The true power of graph databases isn't in storing connected data—it's in querying that data in ways that would be impossibly complex or prohibitively expensive in other paradigms.
Multi-hop traversals, shortest-path searches, and cycle checks are natural expressions of graph algorithms, and they translate directly into queries that can execute in milliseconds even on large datasets, provided the traversal starts from an indexed node and stays bounded.
By the end of this page, you will master advanced graph query patterns—traversal algorithms, path analysis, pattern matching, graph algorithms for analytics, and optimization techniques that distinguish production-grade graph queries from naive implementations.
At the heart of every graph query is traversal—systematically visiting nodes by following relationships. Understanding traversal mechanics is essential for writing efficient queries.
Traversal Components:
```cypher
// ========================================
// DIRECTED TRAVERSAL
// ========================================

// Outgoing relationships only
MATCH (manager:Person)-[:MANAGES]->(report:Person)
RETURN manager.name, report.name

// Incoming relationships only
MATCH (employee:Person)<-[:HIRED]-(company:Company)
RETURN employee.name, company.name

// Either direction (performance note: can be slower)
MATCH (a:Person)-[:KNOWS]-(b:Person)
RETURN a.name, b.name

// ========================================
// MULTI-HOP TRAVERSAL (Fixed Depth)
// ========================================

// Exactly 2 hops
MATCH (a:Person)-[:KNOWS]->()-[:KNOWS]->(c:Person)
WHERE a <> c
RETURN DISTINCT a.name AS person, c.name AS friendOfFriend

// Same as above, more readable with named nodes
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE a <> c
RETURN a.name, b.name AS via, c.name

// ========================================
// VARIABLE-LENGTH TRAVERSAL
// ========================================

// 1 to 3 hops
MATCH path = (start:Person {id: "P001"})-[:KNOWS*1..3]->(end:Person)
RETURN end.name, length(path) AS distance

// 2 to 5 hops (skip immediate neighbors)
MATCH (start:Person {id: "P001"})-[:KNOWS*2..5]-(connected:Person)
RETURN DISTINCT connected.name

// Any number of hops (use with caution!)
MATCH (start:Category {name: "Root"})-[:SUBCATEGORY*]->(leaf:Category)
WHERE NOT (leaf)-[:SUBCATEGORY]->()
RETURN leaf.name AS leafCategory

// Up to N hops (safer than unbounded)
MATCH (a:Node)-[:LINKED*..10]->(b:Node)
RETURN a.id, b.id

// ========================================
// TRAVERSAL WITH FILTERING
// ========================================

// Filter on every node along the path (endpoints included)
MATCH path = (start:Person)-[:KNOWS*1..4]-(end:Person)
WHERE ALL(node IN nodes(path) WHERE node.verified = true)
RETURN path

// Filter on relationships
MATCH (start:Person)-[rels:KNOWS*1..3]->(end:Person)
WHERE ALL(r IN rels WHERE r.strength > 0.5)
RETURN start.name, end.name, [r IN rels | r.strength] AS strengths

// Filter on path properties (depth bounded to keep the search finite)
MATCH path = (a:City)-[:ROAD*1..10]-(b:City)
WHERE REDUCE(total = 0, r IN relationships(path) | total + r.distance) < 500
RETURN [n IN nodes(path) | n.name] AS route,
       REDUCE(d = 0, r IN relationships(path) | d + r.distance) AS totalDistance

// ========================================
// TRAVERSAL UNIQUENESS
// ========================================

// Default: relationships are unique per path (each edge visited once per path)
// This may include the same node multiple times, but not the same edge
MATCH path = (a)-[*1..5]-(b)
RETURN path

// Force unique nodes in path (requires the APOC library)
MATCH path = (a)-[*1..5]-(b)
WHERE size(nodes(path)) = size(apoc.coll.toSet(nodes(path)))
RETURN path
```

Never use unbounded variable-length patterns ([:REL*]) in production without constraints. On a large graph, such a pattern can traverse millions of paths, exhausting memory or timing out. Always specify a maximum depth (*1..6) or use timeout mechanisms. For exploration, use LIMIT to cap the result count.
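The advice on depth caps and LIMIT can be tried on a throwaway graph. This is a minimal sketch: the `HopDemo` label, the `LINKED` relationship type, and the node ids are hypothetical names chosen for illustration and do not come from the examples above.

```cypher
// Hypothetical sample data: a four-node chain a -> b -> c -> d
CREATE (a:HopDemo {id: "a"})-[:LINKED]->(b:HopDemo {id: "b"})
       -[:LINKED]->(c:HopDemo {id: "c"})-[:LINKED]->(d:HopDemo {id: "d"});

// Bounded traversal: follow at most 2 hops from "a" and return at most 100 rows
MATCH path = (start:HopDemo {id: "a"})-[:LINKED*1..2]->(reached:HopDemo)
RETURN reached.id AS node, length(path) AS hops
ORDER BY hops
LIMIT 100;
```

With this data the query reaches b and c but never d, which sits three hops from the start. The depth cap limits how far the search expands, while LIMIT only caps how many rows come back, so the two are complementary rather than interchangeable.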
Paths—ordered sequences of nodes connected by relationships—are first-class citizens in graph queries. Analyzing paths enables route finding, dependency tracking, and connection analysis.
```cypher
// ========================================
// SHORTEST PATH QUERIES
// ========================================

// Single shortest path (any one shortest)
MATCH path = shortestPath(
  (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN path, length(path) AS hops

// All shortest paths (same minimum length)
MATCH paths = allShortestPaths(
  (start:Person {id: "P001"})-[:KNOWS*]-(end:Person {id: "P099"})
)
RETURN paths, length(paths) AS hops

// Shortest path with relationship filter
MATCH path = shortestPath(
  (a:Person)-[:KNOWS|WORKS_WITH*]-(b:Person)
)
WHERE ALL(r IN relationships(path) WHERE r.trust > 0.7)
RETURN path

// Weighted shortest path (using relationship properties)
// Requires GDS library for Dijkstra
MATCH (source:City {name: "NYC"}), (target:City {name: "LA"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN totalCost AS totalDistance,
       [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS cities

// ========================================
// ALL PATHS ENUMERATION
// ========================================

// All paths between two nodes (limited)
MATCH path = (a:Person {id: "P001"})-[:KNOWS*1..5]-(b:Person {id: "P010"})
RETURN path
ORDER BY length(path)
LIMIT 100

// Paths with specific characteristics
MATCH path = (start:Account)-[:TRANSFERRED*2..4]->(end:Account)
WHERE start.id = "suspicious"
  AND end.id = "offshore"
  AND ALL(n IN nodes(path)[1..-1] WHERE n.flagged = false)  // Only clean intermediaries
RETURN path,
       [n IN nodes(path) | n.id] AS accountChain,
       REDUCE(total = 0, r IN relationships(path) | total + r.amount) AS totalTransferred

// ========================================
// PATH EXTRACTION AND MANIPULATION
// ========================================

// Extract nodes from path
MATCH path = (a:Person)-[:KNOWS*1..3]-(b:Person)
WITH path, nodes(path) AS pathNodes, relationships(path) AS pathRels
RETURN [n IN pathNodes | n.name] AS people,
       [r IN pathRels | r.since] AS connectionDates,
       length(path) AS hops

// Path as ordered list
MATCH path = (root:Category {name: "Electronics"})-[:SUBCATEGORY*]->(leaf:Category)
WHERE NOT (leaf)-[:SUBCATEGORY]->()
RETURN [n IN nodes(path) | n.name] AS hierarchy, length(path) AS depth

// ========================================
// CYCLE DETECTION
// ========================================

// Find cycles that leave a starting node and return to it
MATCH path = (start:Node {id: "N001"})-[:LINKED*2..10]->(start)
WHERE ALL(n IN nodes(path)[1..-1] WHERE n <> start)  // No revisiting except return
RETURN path

// Detect circular dependencies
MATCH cycle = (pkg:Package)-[:DEPENDS_ON*]->(pkg)
RETURN [n IN nodes(cycle) | n.name] AS circularDependency

// Find cycles of specific length
MATCH path = (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)-[:KNOWS]->(a)
WHERE a <> b AND b <> c AND a <> c
RETURN a.name, b.name, c.name AS triangle

// ========================================
// PATH COMPARISON AND AGGREGATION
// ========================================

// Group by path length
MATCH path = (start:Person {id: "P001"})-[:KNOWS*1..6]-(end:Person)
WITH length(path) AS distance, count(*) AS reachable
RETURN distance, reachable
ORDER BY distance

// Find nodes reachable by a path of exactly each length
MATCH (start:Person {id: "P001"})
UNWIND range(1, 5) AS distance
MATCH path = (start)-[:KNOWS*1..5]->(reachable:Person)
WHERE length(path) = distance
RETURN distance, collect(DISTINCT reachable.name) AS reachableAtDistance
```

shortestPath() uses bidirectional BFS and terminates upon finding the first path, which makes it highly efficient. allShortestPaths() continues until it has found every path of that minimum length. General path enumeration (-[*]->) explores exhaustively and can be extremely expensive. Choose the most restrictive pattern that answers your question.
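The difference between the three path strategies is easiest to see on a graph small enough to check by hand. The sketch below assumes a hypothetical `PathDemo` label and `LINKED` relationship type; neither appears in the examples above.

```cypher
// Hypothetical sample data: two 2-hop routes and one 3-hop route from s to t
CREATE (s:PathDemo {id: "s"}), (t:PathDemo {id: "t"}),
       (a:PathDemo {id: "a"}), (b:PathDemo {id: "b"}),
       (c:PathDemo {id: "c"}), (d:PathDemo {id: "d"}),
       (s)-[:LINKED]->(a)-[:LINKED]->(t),
       (s)-[:LINKED]->(b)-[:LINKED]->(t),
       (s)-[:LINKED]->(c)-[:LINKED]->(d)-[:LINKED]->(t);

// One shortest path: a single 2-hop route (via a or via b)
MATCH (s:PathDemo {id: "s"}), (t:PathDemo {id: "t"})
MATCH p = shortestPath((s)-[:LINKED*..10]->(t))
RETURN [n IN nodes(p) | n.id] AS route, length(p) AS hops;

// All shortest paths: both 2-hop routes
MATCH (s:PathDemo {id: "s"}), (t:PathDemo {id: "t"})
MATCH p = allShortestPaths((s)-[:LINKED*..10]->(t))
RETURN [n IN nodes(p) | n.id] AS route, length(p) AS hops;

// Full enumeration: all three routes, including the 3-hop one
MATCH p = (s:PathDemo {id: "s"})-[:LINKED*..10]->(t:PathDemo {id: "t"})
RETURN [n IN nodes(p) | n.id] AS route, length(p) AS hops
ORDER BY hops;
```

The row counts grow from one to two to three as the pattern becomes less restrictive. On a real graph that growth can be many orders of magnitude larger.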
Pattern matching is the declarative heart of graph queries. You describe the subgraph shape you're looking for, and the database finds all instances. This is particularly powerful for complex topological queries.
```cypher
// ========================================
// COMPLEX PATTERNS
// ========================================

// Triangle pattern (mutual friends)
MATCH (a:Person)-[:KNOWS]->(b:Person),
      (a)-[:KNOWS]->(c:Person),
      (b)-[:KNOWS]->(c)
WHERE id(a) < id(b) AND id(b) < id(c)  // Avoid duplicates
RETURN a.name, b.name, c.name AS mutualConnections

// Star pattern (hub node)
MATCH (hub:Person)<-[:FOLLOWS]-(follower:Person)
WITH hub, count(follower) AS followerCount
WHERE followerCount > 1000
RETURN hub.name AS influencer, followerCount
ORDER BY followerCount DESC

// Diamond pattern (two paths between same endpoints)
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(d:Person),
      (a)-[:KNOWS]->(c:Person)-[:KNOWS]->(d)
WHERE b <> c
RETURN a.name, b.name, c.name, d.name

// Chain with multiple relationship types
MATCH (customer:Customer)-[:PLACED]->(order:Order)-[:CONTAINS]->(product:Product),
      (product)-[:BELONGS_TO]->(category:Category),
      (customer)-[:LIVES_IN]->(city:City)
RETURN customer.name, product.name, category.name, city.name

// ========================================
// NEGATIVE PATTERNS (NOT EXISTS)
// ========================================

// Find people NOT connected to anyone
MATCH (p:Person)
WHERE NOT (p)-[:KNOWS]-()
RETURN p.name AS isolatedPerson

// Find products never purchased
MATCH (p:Product)
WHERE NOT ()-[:PURCHASED]->(p)
RETURN p.name AS unsoldProduct

// Find users who haven't logged in today
MATCH (u:User)
WHERE NOT (u)-[:LOGGED_IN {date: date()}]-()
RETURN u.email

// Complex negation: Find A connected to B but NOT via C
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(b:Person)
WHERE NOT EXISTS {
  MATCH (a)-[:KNOWS]->(c:Person {blocked: true})-[:KNOWS]->(b)
}
RETURN b.name AS cleanConnection

// ========================================
// OPTIONAL PATTERNS
// ========================================

// Left outer join equivalent
MATCH (e:Employee)
OPTIONAL MATCH (e)-[:MANAGES]->(report:Employee)
WITH e, collect(report.name) AS reports  // collect() skips nulls
RETURN e.name,
       CASE
         WHEN size(reports) = 0 THEN "Individual Contributor"
         ELSE reports
       END AS reports

// Multiple optional patterns
MATCH (p:Person)
OPTIONAL MATCH (p)-[:WORKS_FOR]->(c:Company)
OPTIONAL MATCH (p)-[:STUDIED_AT]->(u:University)
RETURN p.name, c.name AS employer, u.name AS almaMater

// ========================================
// CONDITIONAL PATTERNS (CASE in patterns)
// ========================================

// Different aggregations based on pattern existence
MATCH (p:Product)
OPTIONAL MATCH (p)<-[r:REVIEWED]-()
WITH p, count(r) AS reviewCount
RETURN p.name,
       CASE
         WHEN reviewCount = 0 THEN "No reviews"
         WHEN reviewCount < 10 THEN "Few reviews"
         WHEN reviewCount < 100 THEN "Popular"
         ELSE "Highly reviewed"
       END AS reviewStatus

// ========================================
// EXISTENTIAL PATTERNS
// ========================================

// WHERE EXISTS (subquery)
MATCH (p:Person)
WHERE EXISTS {
  MATCH (p)-[:PURCHASED]->(:Product {category: "Electronics"})
}
RETURN p.name AS electronicsBuyer

// COUNT pattern in WHERE
MATCH (p:Person)
WHERE COUNT {
  MATCH (p)-[:PURCHASED]->(:Product)
} > 5
RETURN p.name AS frequentBuyer

// ========================================
// UNION AND INTERSECTION PATTERNS
// ========================================

// UNION: Either pattern matches
MATCH (p:Person)-[:WORKS_FOR]->(:Company {name: "TechCorp"})
RETURN p.name AS employee, "TechCorp" AS source
UNION
MATCH (p:Person)-[:WORKS_FOR]->(:Company {name: "InnovateCo"})
RETURN p.name AS employee, "InnovateCo" AS source

// INTERSECTION: Both patterns must match
MATCH (p:Person)-[:WORKS_FOR]->(:Company {industry: "Tech"})
MATCH (p)-[:GRADUATED_FROM]->(:University {ranking: "Top 10"})
RETURN p.name AS eliteEngineer
```

Simple patterns (MATCH (a)-[:REL]-(b)) are usually the most efficient choice for basic matching. Use subquery syntax (EXISTS { }, COUNT { }) when you need to aggregate within the condition, express complex negation, or combine multiple independent patterns. The query planner optimizes both forms, but the simpler syntax is often faster.
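To make the choice between a plain pattern and a subquery concrete, here is a small sketch on hypothetical data. The `Shopper` and `Item` labels, the `BOUGHT` relationship type, and the property values are illustrative assumptions.

```cypher
// Hypothetical sample data
CREATE (alice:Shopper {name: "Alice"}), (bob:Shopper {name: "Bob"}),
       (tv:Item {name: "TV", category: "Electronics", price: 900}),
       (pen:Item {name: "Pen", category: "Office", price: 2}),
       (alice)-[:BOUGHT]->(tv),
       (bob)-[:BOUGHT]->(pen);

// Simple pattern predicate: enough for a plain existence check
MATCH (s:Shopper)
WHERE (s)-[:BOUGHT]->(:Item {category: "Electronics"})
RETURN s.name AS buyer;

// EXISTS subquery: useful once the condition needs its own WHERE clause
MATCH (s:Shopper)
WHERE EXISTS {
  MATCH (s)-[:BOUGHT]->(i:Item)
  WHERE i.category = "Electronics" AND i.price > 500
}
RETURN s.name AS buyer;
```

Both queries return only Alice on this data. The subquery form earns its extra syntax only because the price comparison cannot be written as an inline property map.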
While traversal is graph databases' strength, aggregation enables summarizing patterns across the graph—counting connections, measuring path costs, and computing network statistics.
```cypher
// ========================================
// BASIC AGGREGATIONS
// ========================================

// Count and grouping
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company,
       count(p) AS employees,
       collect(p.name) AS employeeNames

// Statistics
MATCH (p:Person)-[r:RATED]->(m:Movie)
RETURN m.title,
       count(r) AS ratingCount,
       avg(r.score) AS avgRating,
       min(r.score) AS minRating,
       max(r.score) AS maxRating,
       stDev(r.score) AS ratingStdDev

// Percentiles (percentileDisc returns a value that occurs in the data)
MATCH (p:Product)
RETURN percentileDisc(p.price, 0.5) AS medianPrice,
       percentileDisc(p.price, 0.9) AS p90Price,
       percentileDisc(p.price, 0.99) AS p99Price

// ========================================
// DEGREE ANALYSIS
// ========================================

// Find degree distribution
// (COUNT { } needs Neo4j 5; older versions wrote size((p)-[:KNOWS]-()))
MATCH (p:Person)
WITH p, COUNT { (p)-[:KNOWS]-() } AS degree
RETURN degree, count(*) AS nodeCount
ORDER BY degree

// Find high-degree nodes (hubs)
MATCH (p:Person)
WITH p,
     COUNT { (p)<-[:FOLLOWS]-() } AS inDegree,
     COUNT { (p)-[:FOLLOWS]->() } AS outDegree
WHERE inDegree > 1000
RETURN p.name, inDegree, outDegree,
       toFloat(outDegree) / inDegree AS followBackRatio
ORDER BY inDegree DESC

// ========================================
// PATH-BASED AGGREGATIONS
// ========================================

// Average path length
MATCH path = (a:Person)-[:KNOWS*1..6]-(b:Person)
WHERE a.id = "P001" AND a <> b
RETURN avg(length(path)) AS avgPathLength,
       min(length(path)) AS minPathLength,
       max(length(path)) AS maxPathLength

// Total path weight
MATCH path = (start:City {name: "NYC"})-[:ROAD*]->(end:City {name: "Chicago"})
WITH path, REDUCE(dist = 0, r IN relationships(path) | dist + r.distance) AS totalDist
WHERE totalDist < 1000
RETURN [n IN nodes(path) | n.name] AS route, totalDist
ORDER BY totalDist
LIMIT 10

// ========================================
// TEMPORAL AGGREGATIONS
// ========================================

// Activity over time (reads the timestamp from the VISITED relationship)
MATCH (u:User)-[v:VISITED]->(p:Page)
WITH u, date(v.timestamp) AS visitDate, count(*) AS pageViews
RETURN u.id, visitDate, pageViews
ORDER BY u.id, visitDate

// Time-windowed aggregation
MATCH (o:Order)
WHERE o.createdAt >= datetime() - duration('P30D')
WITH date(o.createdAt) AS orderDate, sum(o.total) AS dailyRevenue
RETURN orderDate, dailyRevenue
ORDER BY orderDate

// ========================================
// RANKING AND TOP-N
// ========================================

// Top N per group (using subquery)
MATCH (c:Category)
CALL {
  WITH c
  MATCH (c)<-[:IN_CATEGORY]-(p:Product)
  RETURN p
  ORDER BY p.sales DESC
  LIMIT 3
}
RETURN c.name, collect(p.name) AS topProducts

// Running totals (ordered aggregation)
MATCH (o:Order)
WHERE o.customerId = "C001"
WITH o ORDER BY o.createdAt
WITH collect(o) AS orders
UNWIND range(0, size(orders)-1) AS idx
RETURN orders[idx].orderId,
       orders[idx].total AS orderTotal,
       REDUCE(running = 0, i IN range(0, idx) | running + orders[i].total) AS runningTotal

// ========================================
// COHORT ANALYSIS
// ========================================

// User cohorts by signup month
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH u, date.truncate('month', u.createdAt) AS cohortMonth
WITH cohortMonth, count(DISTINCT u) AS cohortSize
RETURN cohortMonth, cohortSize
ORDER BY cohortMonth

// Retention: first purchase vs repeat
MATCH (c:Customer)
OPTIONAL MATCH (c)-[p1:PURCHASED]->(first:Product)
WITH c, min(p1.date) AS firstPurchase
OPTIONAL MATCH (c)-[p2:PURCHASED]->(repeat:Product)
WHERE p2.date > firstPurchase
RETURN date.truncate('month', firstPurchase) AS cohort,
       count(DISTINCT c) AS totalCustomers,
       count(DISTINCT CASE WHEN p2 IS NOT NULL THEN c END) AS repeatCustomers,
       toFloat(count(DISTINCT CASE WHEN p2 IS NOT NULL THEN c END)) / count(DISTINCT c) AS retentionRate
```

collect() materializes results in memory, so use LIMIT if you don't need all items. REDUCE is powerful but cannot use indexes; when possible, aggregate on stored properties rather than computed values. For large-scale analytics, consider the GDS (Graph Data Science) library, which uses optimized in-memory representations.
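The note about collect() can be illustrated with a small comparison. This sketch uses hypothetical `Team` and `Member` labels and an `ON_TEAM` relationship type that are not part of the examples above.

```cypher
// Hypothetical sample data: one team with three members
CREATE (t:Team {name: "Core"}),
       (:Member {name: "Ann"})-[:ON_TEAM]->(t),
       (:Member {name: "Ben"})-[:ON_TEAM]->(t),
       (:Member {name: "Cy"})-[:ON_TEAM]->(t);

// Heavier: builds the full list of names only to measure its size
MATCH (m:Member)-[:ON_TEAM]->(t:Team)
RETURN t.name AS team, size(collect(m.name)) AS members;

// Lighter: count() produces the same number without building a list
MATCH (m:Member)-[:ON_TEAM]->(t:Team)
RETURN t.name AS team, count(m) AS members;

// When a sample is enough, cap the rows per team before collecting
MATCH (t:Team)
CALL {
  WITH t
  MATCH (m:Member)-[:ON_TEAM]->(t)
  RETURN m.name AS name
  ORDER BY name
  LIMIT 2
}
RETURN t.name AS team, collect(name) AS sampleMembers;
```

The first two queries report three members for the team. The last one collects only two names, because LIMIT is applied inside the subquery before the list is built.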
Graph databases natively support classic graph algorithms that would require complex implementations in other systems. Neo4j's Graph Data Science (GDS) library provides production-ready implementations of 60+ algorithms.
| Category | Algorithms | Use Cases |
|---|---|---|
| Centrality | PageRank, Betweenness, Closeness, Degree | Influence ranking, bottleneck detection |
| Community Detection | Louvain, Label Propagation, K-Means | Clustering users, topic modeling |
| Path Finding | Dijkstra, A*, BFS/DFS, All Pairs | Routing, dependency resolution |
| Similarity | Jaccard, Cosine, Euclidean | Recommendations, duplicate detection |
| Link Prediction | Adamic-Adar, Common Neighbors | Friend suggestions, knowledge graphs |
| Embeddings | Node2Vec, FastRP, GraphSAGE | ML feature generation, transfer learning |
```cypher
// ========================================
// CENTRALITY ALGORITHMS
// ========================================

// PageRank - Identify influential nodes
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS person, score
RETURN person.name, score AS influence
ORDER BY score DESC
LIMIT 20

// Betweenness Centrality - Find bridge nodes
CALL gds.betweenness.stream('networkGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE score > 0
RETURN node.name, score AS bridgeScore
ORDER BY score DESC
LIMIT 10

// Degree Centrality - Connection count
CALL gds.degree.stream('socialGraph', {
  orientation: 'UNDIRECTED'
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, toInteger(score) AS connections
ORDER BY score DESC
LIMIT 10

// ========================================
// COMMUNITY DETECTION
// ========================================

// Louvain - Detect communities
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS members
RETURN communityId, size(members) AS size, members[0..5] AS sampleMembers
ORDER BY size DESC
LIMIT 10

// Label Propagation - Fast community detection
CALL gds.labelPropagation.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId, count(*) AS communitySize
ORDER BY communitySize DESC

// ========================================
// PATH FINDING ALGORITHMS
// ========================================

// Dijkstra Shortest Path (weighted)
MATCH (source:City {name: "NYC"}), (target:City {name: "LA"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'distance'
})
YIELD totalCost, path
RETURN totalCost, [n IN nodes(path) | n.name] AS route

// A* with heuristic (for geographic routing)
MATCH (source:City {name: "Seattle"}), (target:City {name: "Miami"})
CALL gds.shortestPath.astar.stream('flightNetwork', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'duration',
  latitudeProperty: 'lat',
  longitudeProperty: 'lon'
})
YIELD totalCost, nodeIds
RETURN totalCost AS totalFlightTime,
       [id IN nodeIds | gds.util.asNode(id).name] AS route

// All Pairs Shortest Path, filtered to hub-to-hub pairs
// (the procedure computes every pair; the hub filter is applied afterwards)
CALL gds.allShortestPaths.stream('roadNetwork', {
  relationshipWeightProperty: 'distance'
})
YIELD sourceNodeId, targetNodeId, distance
WITH gds.util.asNode(sourceNodeId) AS source,
     gds.util.asNode(targetNodeId) AS target,
     distance
WHERE source.isHub = true AND target.isHub = true AND source <> target
RETURN source.name AS source, target.name AS target, distance AS totalCost

// ========================================
// SIMILARITY ALGORITHMS
// ========================================

// Node Similarity (Jaccard)
CALL gds.nodeSimilarity.stream('purchaseGraph')
YIELD node1, node2, similarity
WHERE similarity > 0.5
RETURN gds.util.asNode(node1).name AS customer1,
       gds.util.asNode(node2).name AS customer2,
       similarity
ORDER BY similarity DESC
LIMIT 20

// K-Nearest Neighbors (the metric is set per property in GDS 2.x)
CALL gds.knn.stream('productGraph', {
  topK: 10,
  nodeProperties: {embedding: 'COSINE'}
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS product,
       gds.util.asNode(node2).name AS similarProduct,
       similarity

// ========================================
// NODE EMBEDDINGS (for ML)
// ========================================

// FastRP - Fast Random Projection embeddings
CALL gds.fastRP.stream('knowledgeGraph', {
  embeddingDimension: 128,
  iterationWeights: [0.0, 0.5, 1.0]
})
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS entity,
       embedding[0..5] AS embeddingPreview

// Node2Vec - Random-walk-based embeddings
CALL gds.node2vec.write('socialGraph', {
  embeddingDimension: 64,
  walkLength: 20,
  walksPerNode: 10,
  writeProperty: 'n2vEmbedding'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
```

GDS algorithms operate on projected graphs (in-memory representations). The workflow has three steps:

1. Create the projection: gds.graph.project('myGraph', 'Person', 'KNOWS')
2. Run the algorithm: gds.pageRank.stream('myGraph')
3. Drop the projection when done: gds.graph.drop('myGraph')

Projections use significant memory, so size them appropriately.
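Written out as full calls, the projection workflow might look like the sketch below. It assumes GDS 2.x procedure names and a database that already holds Person nodes with a name property and KNOWS relationships. The optional estimate step is an addition for sizing and is not part of the workflow described above.

```cypher
// Optional: estimate the memory a projection would need before creating it
CALL gds.graph.project.estimate('Person', 'KNOWS')
YIELD nodeCount, relationshipCount, requiredMemory
RETURN nodeCount, relationshipCount, requiredMemory;

// 1) Create the in-memory projection
CALL gds.graph.project('myGraph', 'Person', 'KNOWS')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// 2) Run an algorithm against the projection
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;

// 3) Drop the projection to release its memory
CALL gds.graph.drop('myGraph')
YIELD graphName
RETURN graphName;
```

Running the estimate first gives a rough figure to compare against the memory available to the server. This is one way to act on the sizing advice without trial and error.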
Understanding how Neo4j executes queries is essential for writing performant code. The query planner transforms Cypher into execution plans, and understanding these plans enables optimization.
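The planner can only choose an index-backed operator when a suitable index exists. As a minimal sketch, assuming Neo4j 4.x or later syntax and a hypothetical index name `person_email`, the following creates an index on Person.email and then inspects the resulting plan.

```cypher
// Create an index so the planner can seek on Person.email
// (the index name person_email is a hypothetical choice)
CREATE INDEX person_email IF NOT EXISTS
FOR (p:Person) ON (p.email);

// List indexes; the new one should eventually report state ONLINE
SHOW INDEXES;

// Inspect the plan without executing the query
EXPLAIN
MATCH (p:Person {email: "alice@example.com"})
RETURN p.name;
```

Once the index is online, the plan for the last query would typically begin with an index seek rather than a label scan. The exact operator names depend on the Neo4j version.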
```cypher
// ========================================
// QUERY ANALYSIS TOOLS
// ========================================

// EXPLAIN - Show plan without executing
EXPLAIN
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email = "alice@example.com"
RETURN p.name, c.name

// PROFILE - Execute and show detailed metrics
PROFILE
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email = "alice@example.com"
RETURN p.name, c.name

// Key metrics to watch:
// - "Rows" - number of intermediate results
// - "DbHits" - abstract units of storage-engine work
// - "NodeByLabelScan" - full scan (often slow)
// - "NodeIndexSeek" - indexed lookup (fast)

// ========================================
// OPTIMIZATION PATTERNS
// ========================================

// ANTIPATTERN: Unanchored traversal
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE c.name = "Alice"  // Filter at the END
RETURN a.name
// May scan the entire Person label if the planner cannot seek on name

// OPTIMIZED: Anchor on indexed property first
MATCH (c:Person {name: "Alice"})<-[:KNOWS]-(b:Person)<-[:KNOWS]-(a:Person)
RETURN a.name
// Starts from indexed node, expands backward

// ANTIPATTERN: Repeating patterns
MATCH (p:Person)-[:PURCHASED]->(product:Product)
MATCH (p)-[:LIVES_IN]->(city:City)  // separate clause for p; confirm the plan with PROFILE
RETURN p.name, product.name, city.name

// OPTIMIZED: Single pattern or explicit variable reuse
MATCH (p:Person)-[:PURCHASED]->(product:Product),
      (p)-[:LIVES_IN]->(city:City)
RETURN p.name, product.name, city.name

// ANTIPATTERN: Collect then filter
MATCH (c:Company)
WITH c, [(c)<-[:WORKS_FOR]-(e:Employee) | e] AS employees
WHERE size(employees) > 100
RETURN c.name

// OPTIMIZED: Count without materializing
// (COUNT { } needs Neo4j 5; older versions wrote size((c)<-[:WORKS_FOR]-()))
MATCH (c:Company)
WHERE COUNT { (c)<-[:WORKS_FOR]-() } > 100
RETURN c.name

// ANTIPATTERN: Unnecessary OPTIONAL MATCH
MATCH (p:Person)
OPTIONAL MATCH (p)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 0  // Could have used regular MATCH!
RETURN p.name

// ========================================
// INDEX HINTS
// ========================================

// Force specific index usage
MATCH (p:Person)
USING INDEX p:Person(email)
WHERE p.email STARTS WITH "alice"
RETURN p.name

// Force scan (rare, for testing)
MATCH (p:Person)
USING SCAN p:Person
WHERE p.email STARTS WITH "alice"
RETURN p.name

// ========================================
// JOIN HINTS
// ========================================

// Control join order (expert feature)
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
USING JOIN ON b
WHERE a.country = "USA" AND c.country = "UK"
RETURN a.name, c.name

// ========================================
// BATCH PROCESSING FOR LARGE OPERATIONS
// ========================================

// Instead of loading all at once:
// LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
// CREATE (:Person {id: row.id, name: row.name})  // Memory explosion!

// Use APOC periodic iterate:
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row RETURN row",
  "CREATE (:Person {id: row.id, name: row.name})",
  {batchSize: 1000, parallel: true}
)
YIELD batches, total, errorMessages
RETURN batches, total
```

If your PROFILE shows "Cartesian Product" with large row counts, your query has disconnected patterns being combined via cross-join. The number of rows is the product of the row counts on each side, so cost grows multiplicatively with every disconnected pattern. Ensure that the patterns share a variable, and keep a cross product only when it is intentional.
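A quick way to see the Cartesian Product operator is to compare the plans for a disconnected and a connected version of the same query. The sketch below reuses the Person, Company, and WORKS_FOR names from the examples above and relies on EXPLAIN, so nothing is executed.

```cypher
// Disconnected patterns: every Person row is paired with every Company row,
// so the plan contains a CartesianProduct operator
EXPLAIN
MATCH (p:Person), (c:Company)
RETURN p.name, c.name;

// Connected patterns: the relationship ties p to c, so no cross product is needed
EXPLAIN
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name;
```

With 1,000 Person nodes and 100 Company nodes, the first query would produce 100,000 rows. The second produces one row per WORKS_FOR relationship.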
Let's examine production query patterns for common graph database use cases—demonstrating how the techniques we've learned combine into practical solutions.
```cypher
// ========================================
// FRAUD DETECTION - Suspicious Patterns
// ========================================

// Detect money laundering: rapid circular transfers
MATCH cycle = (origin:Account)-[:TRANSFERRED*3..6]->(origin)
WHERE ALL(r IN relationships(cycle)
          WHERE r.timestamp > datetime() - duration('PT1H'))
WITH origin, cycle,
     REDUCE(total = 0, r IN relationships(cycle) | total + r.amount) AS cycleValue
WHERE cycleValue > 10000
RETURN origin.id,
       length(cycle) AS hops,
       cycleValue,
       [n IN nodes(cycle) | n.id] AS accountChain

// First-time connections to high-risk entities:
// accounts with no transfers older than 30 days that reach a flagged account
MATCH (user:Account)-[:TRANSFERRED]->(:Account)-[:TRANSFERRED*1..2]->(flagged:Account {suspicious: true})
WHERE NOT EXISTS {
  MATCH (user)-[old:TRANSFERRED]->()
  WHERE old.timestamp < datetime() - duration('P30D')
}
RETURN DISTINCT user.id, flagged.id

// ========================================
// RECOMMENDATION ENGINE
// ========================================

// Collaborative filtering: users who bought X also bought Y
MATCH (targetUser:Customer {id: "C001"})-[:PURCHASED]->(product:Product)
WITH targetUser, collect(product) AS purchasedProducts
MATCH (product)<-[:PURCHASED]-(similar:Customer)-[:PURCHASED]->(recommended:Product)
WHERE product IN purchasedProducts
  AND NOT recommended IN purchasedProducts
  AND similar <> targetUser
WITH recommended, count(DISTINCT similar) AS recommenderCount
ORDER BY recommenderCount DESC
LIMIT 10
RETURN recommended.name, recommenderCount

// Content-based + social: friends' purchases in preferred categories
MATCH (me:Customer {id: "C001"})-[:FOLLOWS]->(friend:Customer),
      (me)-[:PURCHASED]->(:Product)-[:IN_CATEGORY]->(preferredCat:Category)
WITH me,
     collect(DISTINCT friend) AS friends,
     collect(DISTINCT preferredCat) AS categories
MATCH (f:Customer)-[:PURCHASED]->(rec:Product)-[:IN_CATEGORY]->(cat:Category)
WHERE f IN friends
  AND cat IN categories
  AND NOT (me)-[:PURCHASED]->(rec)
RETURN rec.name, cat.name, count(DISTINCT f) AS friendsWhoPurchased
ORDER BY friendsWhoPurchased DESC
LIMIT 20

// ========================================
// KNOWLEDGE GRAPH QUERIES
// ========================================

// Entity resolution: find all mentions of same entity
// (collect() skips nulls, so entities without aliases get an empty list)
MATCH (e:Entity)
WHERE e.name CONTAINS "Apple"
OPTIONAL MATCH (e)-[:SAME_AS*1..3]-(alias:Entity)
RETURN e.name, collect(DISTINCT alias.name) AS aliases

// Reasoning chains: evidence for fact
MATCH path = (claim:Claim)-[:SUPPORTED_BY*1..5]->(evidence:Evidence)
WHERE claim.statement = "Climate change is anthropogenic"
RETURN [n IN nodes(path) | n.statement] AS reasoningChain,
       length(path) AS evidenceDepth

// ========================================
// ACCESS CONTROL - RBAC Graph
// ========================================

// Check permission via role inheritance
MATCH (user:User {email: $userEmail})-[:HAS_ROLE*1..3]->(role:Role),
      (role)-[:GRANTS]->(permission:Permission),
      (permission)-[:ON]->(resource:Resource {id: $resourceId})
WHERE permission.action = $requestedAction
RETURN count(*) > 0 AS hasAccess

// Audit: who can access this resource?
MATCH (resource:Resource {id: "FINANCE_REPORT"})
MATCH path = (resource)<-[:ON]-(:Permission)<-[:GRANTS]-(:Role)<-[:HAS_ROLE*1..3]-(user:User)
RETURN DISTINCT user.name,
       [r IN relationships(path) | type(r)] AS accessPath

// ========================================
// NETWORK ANALYSIS
// ========================================

// Find critical nodes (if removed, other servers lose their route to the core)
// Enumerates paths, so this is practical only on small graphs
MATCH (core:Server {id: "CORE_SERVER"})-[:CONNECTS*]-(b:Server)
WHERE b <> core
WITH core, count(DISTINCT b) AS originalReach
MATCH (candidate:Server)
WHERE candidate <> core
CALL {
  WITH core, candidate
  MATCH path = (core)-[:CONNECTS*]-(b:Server)
  WHERE b <> core
    AND NONE(n IN nodes(path) WHERE n = candidate)  // Route must avoid the candidate
  RETURN count(DISTINCT b) AS reachWithoutCandidate
}
WITH candidate, originalReach, reachWithoutCandidate
WHERE reachWithoutCandidate < originalReach - 1  // More than the candidate itself is lost
RETURN candidate.id AS criticalNode,
       originalReach - 1 - reachWithoutCandidate AS impactedNodes
```

Production queries typically combine multiple techniques: indexed anchors → filtered traversals → aggregated results. Start with a simple pattern that returns correct results, then use PROFILE to identify bottlenecks, add indexes, and refine filters until performance meets requirements.
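The three-stage shape described above (indexed anchor, filtered traversal, aggregated result) can be sketched as a single parameterized query. The parameter names `$customerId` and `$minPrice` are hypothetical, and the query assumes an index on Customer.id and a numeric price property on Product.

```cypher
// Assumes an index on Customer.id; $customerId and $minPrice are
// hypothetical parameters supplied by the caller
MATCH (c:Customer {id: $customerId})              // 1) indexed anchor
MATCH (c)-[:PURCHASED]->(p:Product)-[:IN_CATEGORY]->(cat:Category)
WHERE p.price >= $minPrice                        // 2) filtered traversal
RETURN cat.name AS category,                      // 3) aggregated result
       count(p) AS purchases,
       sum(p.price) AS totalSpent
ORDER BY totalSpent DESC
LIMIT 5;
```

Prefixing the query with PROFILE shows whether the anchor is resolved by an index seek. Passing values as parameters rather than literals also lets the database reuse the cached plan across calls.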
We've comprehensively explored graph query techniques—from traversal fundamentals through advanced algorithms. Let's consolidate the key insights:
What's next:
With query mastery established, we'll explore Use Cases in depth—examining the specific domains where graph databases provide transformative advantages, including social networks, fraud detection, recommendations, knowledge graphs, and network analysis.
You now possess advanced graph query capabilities—traversal patterns, path analysis, pattern matching, aggregation, graph algorithms, optimization techniques, and production query patterns. Next, we'll apply these skills to specific industry use cases.