Graph databases aren't universal solutions—they're specialized tools that excel when data is inherently connected and relationships drive business value. The difference between "possible with difficulty" and "trivially elegant" becomes apparent in specific domains where the shape of your questions matches the shape of graphs.
Consider: LinkedIn computes professional network insights across 900+ million members. Facebook surfaces relevant content from billions of posts through social graph analysis. Financial institutions detect fraud by identifying suspicious transaction patterns in real time. Each of these workloads depends on multi-hop relationship queries that are costly to express and run as chains of relational joins.
This page examines where graph databases provide decisive advantages—and equally importantly, where they don't.
By the end of this page, you will understand graph database use cases across major domains—social networks, fraud detection, recommendation engines, knowledge graphs, and network/IT operations—with concrete examples, query patterns, architecture considerations, and guidance on when to choose graphs over alternatives.
Social networks are the canonical graph use case. The data is inherently a graph—users are nodes, relationships (follows, friends, blocks) are edges—and core features require graph traversal:

    // ========================================
    // FRIEND SUGGESTIONS (People You May Know)
    // ========================================

    // Friends of friends, weighted by mutual connections
    MATCH (me:User {id: $userId})-[:FRIEND]->(friend:User)-[:FRIEND]->(suggested:User)
    WHERE NOT (me)-[:FRIEND]-(suggested)
      AND me <> suggested
      AND NOT (me)-[:BLOCKED]-(suggested)
    WITH suggested,
         count(DISTINCT friend) AS mutualFriends,
         collect(friend.name)[0..3] AS sampleMutuals
    WHERE mutualFriends >= 2
    RETURN suggested.id, suggested.name, mutualFriends, sampleMutuals
    ORDER BY mutualFriends DESC
    LIMIT 20

    // Enhanced: include workplace/school overlap
    // Each OPTIONAL MATCH carries its own filter so existing friends are excluded per source
    MATCH (me:User {id: $userId})
    OPTIONAL MATCH (me)-[:WORKS_AT]->(company:Company)<-[:WORKS_AT]-(coworker:User)
    WHERE NOT (me)-[:FRIEND]-(coworker)
    OPTIONAL MATCH (me)-[:STUDIED_AT]->(school:School)<-[:STUDIED_AT]-(classmate:User)
    WHERE NOT (me)-[:FRIEND]-(classmate)
    OPTIONAL MATCH (me)-[:FRIEND]->(:User)-[:FRIEND]->(foaf:User)
    WHERE NOT (me)-[:FRIEND]-(foaf)
    WITH me,
         collect(DISTINCT {user: coworker, source: "work"}) +
         collect(DISTINCT {user: classmate, source: "school"}) +
         collect(DISTINCT {user: foaf, source: "mutual"}) AS suggestions
    UNWIND suggestions AS s
    // WHERE cannot follow UNWIND directly, so filter through WITH
    WITH me, s
    WHERE s.user IS NOT NULL AND s.user <> me
    RETURN s.user.name,
           collect(s.source) AS connectionTypes,
           size(collect(s.source)) AS connectionStrength
    ORDER BY connectionStrength DESC
    LIMIT 15

    // ========================================
    // CONNECTION DEGREE
    // ========================================

    // Find connection path between two users
    MATCH path = shortestPath(
      (me:User {id: $myId})-[:FRIEND*..6]-(target:User {id: $targetId})
    )
    RETURN
      CASE length(path)
        WHEN 1 THEN "1st degree (direct connection)"
        WHEN 2 THEN "2nd degree"
        WHEN 3 THEN "3rd degree"
        ELSE toString(length(path)) + " degrees away"
      END AS connectionDegree,
      [n IN nodes(path) | n.name] AS connectionPath

    // ========================================
    // FEED RANKING (Social Proximity)
    // ========================================

    // Posts from network, scored by social distance
    // Note: Neo4j 5 removed size() over patterns; use COUNT { ... } or EXISTS { ... } there
    MATCH (me:User {id: $userId})-[:FRIEND*1..2]-(author:User)-[:POSTED]->(post:Post)
    WHERE post.createdAt > datetime() - duration('P7D')
    // DISTINCT: the same author can be reached over several 1..2-hop paths
    WITH DISTINCT post, author,
         CASE
           WHEN size((me)-[:FRIEND]-(author)) > 0 THEN 1.0  // Direct friend
           ELSE 0.5                                         // Friend of friend
         END AS socialScore,
         // inSeconds(...).hours gives total hours; between(...).hours drops whole days
         duration.inSeconds(post.createdAt, datetime()).hours AS hoursAgo
    WITH post, author, socialScore,
         1.0 / (1.0 + toFloat(hoursAgo) / 24) AS recencyScore,
         toFloat(post.likes + post.comments * 2) / 100 AS engagementScore
    RETURN post.id, author.name, post.content,
           socialScore * 0.4 + recencyScore * 0.3 + engagementScore * 0.3 AS feedScore
    ORDER BY feedScore DESC
    LIMIT 50

    // ========================================
    // INFLUENCE ANALYSIS
    // ========================================

    // Identify influencers via follower analysis
    MATCH (user:User)
    WITH user,
         size((user)<-[:FOLLOWS]-()) AS followers,
         size((user)-[:FOLLOWS]->()) AS following
    WHERE followers > 10000
    RETURN user.name, followers, following,
           toFloat(followers) / CASE following WHEN 0 THEN 1 ELSE following END AS influenceRatio
    ORDER BY followers DESC
    LIMIT 50

At major social network scale (billions of users), even graph databases require sharding and caching. Common patterns: 1) Cache frequently accessed friendship lists, 2) Pre-compute friend suggestions offline and refresh them periodically, 3) Shard by user ID with cross-shard traversal for inter-shard connections, 4) Use read replicas to scale query throughput.
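The second pattern, pre-computing suggestions offline, can be sketched in plain Python. This is a minimal illustration that assumes the friendship graph fits in an in-memory adjacency map; `precompute_suggestions`, `min_mutual`, and `top_n` are hypothetical names, and a real batch job would read from the graph store and write its results to a cache.

```python
from collections import Counter


def precompute_suggestions(friends, min_mutual=2, top_n=20):
    """Batch job: rank friends-of-friends by mutual-friend count.

    `friends` maps each user id to the set of that user's friends
    (assumed symmetric). Returns {user: [(candidate, mutual_count), ...]}.
    """
    suggestions = {}
    for user, direct in friends.items():
        mutual = Counter()
        for friend in direct:
            for candidate in friends.get(friend, ()):
                # Skip the user themselves and people they already know.
                if candidate != user and candidate not in direct:
                    mutual[candidate] += 1
        ranked = [(c, n) for c, n in mutual.most_common() if n >= min_mutual]
        suggestions[user] = ranked[:top_n]
    return suggestions


# Toy symmetric graph: alice and dave share two friends (bob and carol).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"bob", "carol"},
}
cache = precompute_suggestions(friends)
print(cache["alice"])
```

At request time the application would only look up `cache[user_id]`, which keeps the multi-hop traversal out of the latency-sensitive path.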
Fraud detection is where graph databases provide their most dramatic advantage. Fraudsters operate in networks—sharing accounts, devices, addresses, and payment methods. Patterns invisible in tabular data become obvious in graph form.
Why Graphs Excel at Fraud:

    // ========================================
    // FRAUD RING DETECTION
    // ========================================

    // Find accounts sharing multiple identifiers (classic ring pattern)
    MATCH (a1:Account)-[:USES]->(shared)<-[:USES]-(a2:Account)
    WHERE id(a1) < id(a2)  // report each pair once
      AND (shared:Device OR shared:IPAddress OR shared:Phone OR shared:Address)
    WITH a1, a2,
         count(DISTINCT shared) AS sharedIdentifiers,
         collect(labels(shared)[0]) AS identifierTypes
    WHERE sharedIdentifiers >= 2
    RETURN a1.id, a2.id, sharedIdentifiers, identifierTypes
    ORDER BY sharedIdentifiers DESC

    // Extended: find connected fraud communities
    // The 1..2-hop match already bounds the distance, so no separate shortestPath check is needed
    MATCH (suspicious:Account)-[:USES*1..2]-(connected:Account)
    WHERE suspicious.riskScore > 80
      AND suspicious <> connected
    RETURN suspicious.id AS suspiciousAccount,
           collect(DISTINCT connected.id) AS connectedAccounts,
           size(collect(DISTINCT connected.id)) AS networkSize
    ORDER BY networkSize DESC

    // ========================================
    // MONEY LAUNDERING - CIRCULAR TRANSFERS
    // ========================================

    // Detect circular money flows (structuring)
    MATCH cycle = (origin:Account)-[:TRANSFERRED*3..8]->(origin)
    WHERE ALL(r IN relationships(cycle) WHERE
            r.timestamp > datetime() - duration('P1D')
            AND r.amount < 10000)  // Under reporting threshold
    WITH origin, cycle,
         REDUCE(total = 0, r IN relationships(cycle) | total + r.amount) AS totalFlow,
         length(cycle) AS hops
    WHERE totalFlow > 50000  // Significant total despite small individual transfers
    RETURN origin.id, [n IN nodes(cycle) | n.id] AS flowPath, totalFlow, hops

    // Rapid pass-through (layering)
    MATCH (source:Account)-[t1:TRANSFERRED]->(intermediary:Account)-[t2:TRANSFERRED]->(dest:Account)
    WHERE t1.timestamp > datetime() - duration('PT1H')
      AND t2.timestamp > t1.timestamp
      // inSeconds(...).minutes is the total elapsed minutes
      AND duration.inSeconds(t1.timestamp, t2.timestamp).minutes < 30
      AND t1.amount > 5000
      AND abs(t1.amount - t2.amount) < 100  // Nearly same amount
      AND NOT (source)-[:NORMAL_BUSINESS_WITH]-(dest)
    RETURN source.id AS sourceAccount,
           intermediary.id AS layeringAccount,
           dest.id AS destinationAccount,
           t1.amount, t2.amount

    // ========================================
    // SYNTHETIC IDENTITY DETECTION
    // ========================================

    // Identities with unusual attribute sharing patterns
    MATCH (identity:Person)
    OPTIONAL MATCH (identity)-[:HAS_SSN]->(ssn:SSN)<-[:HAS_SSN]-(other1:Person)
    OPTIONAL MATCH (identity)-[:HAS_ADDRESS]->(addr:Address)<-[:HAS_ADDRESS]-(other2:Person)
    OPTIONAL MATCH (identity)-[:HAS_PHONE]->(phone:Phone)<-[:HAS_PHONE]-(other3:Person)
    WITH identity,
         count(DISTINCT other1) AS ssnSharing,
         count(DISTINCT other2) AS addressSharing,
         count(DISTINCT other3) AS phoneSharing,
         identity.accountAge AS accountAge
    WHERE ssnSharing > 0
       OR (addressSharing > 3 AND accountAge < 365)
       OR (phoneSharing > 5)
    RETURN identity.id, ssnSharing, addressSharing, phoneSharing, accountAge,
           ssnSharing * 10 + addressSharing * 3 + phoneSharing * 2 AS riskScore
    ORDER BY riskScore DESC

    // ========================================
    // REAL-TIME TRANSACTION SCORING
    // ========================================

    // Score transaction at payment time
    WITH $transactionData AS txn
    MATCH (payer:Account {id: txn.payerId})
    MATCH (payee:Account {id: txn.payeeId})

    // Check for shared identifiers with flagged accounts
    OPTIONAL MATCH (payer)-[:USES*1..2]-(flagged:Account {status: "FLAGGED"})
    WITH payer, payee, txn, count(DISTINCT flagged) AS flaggedConnections

    // Check payee's transaction pattern
    OPTIONAL MATCH (payee)<-[recent:TRANSFERRED]-(others:Account)
    WHERE recent.timestamp > datetime() - duration('PT1H')
    WITH payer, payee, txn, flaggedConnections,
         count(DISTINCT others) AS recentUniquePayers

    // Scoring
    RETURN txn.id AS transactionId,
           CASE
             WHEN flaggedConnections > 0 THEN "BLOCK"
             WHEN recentUniquePayers > 20 AND txn.amount > 1000 THEN "REVIEW"
             WHEN flaggedConnections = 0 AND recentUniquePayers < 5 THEN "APPROVE"
             ELSE "REVIEW"
           END AS decision,
           flaggedConnections,
           recentUniquePayers

| Fraud Pattern | Relational Approach | Graph Approach | Advantage |
|---|---|---|---|
| Shared device detection | Complex JOINs, slow at scale | 2-hop traversal, typically milliseconds | Often orders of magnitude faster |
| Circular transactions | Recursive CTEs, often timeout | Native cycle detection | Makes possible what was impractical |
| Ring detection | Multiple self-joins | Community detection algorithm | Algorithmic scalability |
| Real-time scoring | Multiple queries, aggregation | Single traversal query | Low latency at transaction time |
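The "community detection" entry in the table can be illustrated with its simplest form, connected components over accounts that share identifiers. The sketch below uses a union-find structure in plain Python rather than a graph database's built-in algorithms; `find_rings` and the sample identifiers are hypothetical, and production systems would more likely run a library implementation (for example weakly connected components or a modularity-based method) inside the graph platform.

```python
from collections import defaultdict


def find_rings(uses, min_size=2):
    """Group accounts into connected components via shared identifiers.

    `uses` is an iterable of (account_id, identifier) pairs, mirroring
    (:Account)-[:USES]->(identifier) edges. Returns a list of sorted
    account-id lists, largest group first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_user = {}  # identifier -> first account seen using it
    for account, identifier in uses:
        find(account)  # register the account even if it shares nothing
        if identifier in first_user:
            union(first_user[identifier], account)
        else:
            first_user[identifier] = account

    groups = defaultdict(set)
    for account in parent:
        groups[find(account)].add(account)
    rings = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(rings, key=lambda g: (-len(g), g))


uses = [
    ("A1", "device:d1"), ("A2", "device:d1"),  # A1 and A2 share a device
    ("A2", "phone:p1"), ("A3", "phone:p1"),    # A2 and A3 share a phone
    ("A4", "ip:9"), ("A5", "device:d7"),       # unrelated accounts
]
rings = find_rings(uses)
print(rings)
```

A1 and A3 never share an identifier directly, yet they land in the same ring through A2, which is the transitive linkage that pairwise joins tend to miss.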
Production fraud systems typically combine: 1) Rule engines for known patterns, 2) ML models for transaction scoring, 3) Graph databases for relationship analysis. The graph layer often runs in parallel with ML scoring, and decisions combine signals from both. Graph excels at detecting novel fraud patterns that rule systems miss.
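How the three layers might be blended into one decision can be sketched as follows. The weights, thresholds, and the `Signals` and `decide` names are illustrative assumptions rather than values from any production system; the point is only that the graph signal enters the decision alongside the rule and ML signals.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    rule_hit: bool          # a known-pattern rule fired
    ml_score: float         # model's fraud probability, 0.0 to 1.0
    flagged_neighbors: int  # flagged accounts near the payer in the graph


def decide(s: Signals, block_at: float = 0.8, review_at: float = 0.4) -> str:
    """Blend rule, ML, and graph signals into BLOCK / REVIEW / APPROVE."""
    if s.rule_hit:
        return "BLOCK"  # known patterns short-circuit the scoring
    # Assumed scaling: three or more flagged neighbours saturate the graph score.
    graph_score = min(s.flagged_neighbors / 3.0, 1.0)
    combined = 0.6 * s.ml_score + 0.4 * graph_score  # assumed weights
    if combined >= block_at:
        return "BLOCK"
    if combined >= review_at:
        return "REVIEW"
    return "APPROVE"


print(decide(Signals(rule_hit=False, ml_score=0.5, flagged_neighbors=1)))
```

In this sketch a moderate ML score that would pass on its own is pushed into review by a single flagged neighbour.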
Recommendations are inherently graph problems: users connect to items they've purchased, rated, or viewed; items connect to categories, attributes, and other items. The question "What should I recommend?" becomes "What paths lead from this user to items they might like?"
Graph-Based Recommendation Approaches:

    // ========================================
    // COLLABORATIVE FILTERING
    // ========================================

    // "Customers who bought this also bought..."
    MATCH (p:Product {id: $productId})<-[:PURCHASED]-(c:Customer)-[:PURCHASED]->(other:Product)
    WHERE other <> p
      // coalesce keeps products that have no `discontinued` property at all
      AND coalesce(other.discontinued, false) = false
    WITH other, count(DISTINCT c) AS coPurchaseCount
    ORDER BY coPurchaseCount DESC
    LIMIT 10
    RETURN other.id, other.name, coPurchaseCount

    // User-based: find similar users, recommend their favorites
    MATCH (me:Customer {id: $customerId})-[:PURCHASED]->(myProducts:Product)
    WITH me, collect(myProducts) AS myPurchases
    MATCH (similar:Customer)-[:PURCHASED]->(p:Product)
    WHERE p IN myPurchases AND similar <> me
    WITH me, similar, myPurchases, count(p) AS overlapCount
    WHERE overlapCount >= 3  // Minimum overlap for similarity
    // In a WITH clause, WHERE must come after ORDER BY/LIMIT, so sort in a second WITH
    WITH me, similar, myPurchases, overlapCount
    ORDER BY overlapCount DESC
    LIMIT 10
    WITH me, myPurchases, collect(similar) AS topSimilarUsers
    MATCH (s)-[:PURCHASED]->(rec:Product)
    WHERE s IN topSimilarUsers AND NOT rec IN myPurchases
    RETURN rec.name, count(DISTINCT s) AS recommendedBy
    ORDER BY recommendedBy DESC
    LIMIT 20

    // ========================================
    // CONTENT-BASED FILTERING
    // ========================================

    // Similar products by shared attributes
    MATCH (target:Product {id: $productId})-[:HAS_ATTRIBUTE]->(attr:Attribute)
    WITH target, collect(attr) AS targetAttrs
    MATCH (other:Product)-[:HAS_ATTRIBUTE]->(a:Attribute)
    WHERE other <> target AND a IN targetAttrs
    WITH target, other, count(a) AS sharedAttrs, targetAttrs
    WITH target, other, sharedAttrs,
         toFloat(sharedAttrs) / size(targetAttrs) AS similarity
    WHERE similarity > 0.5
    RETURN other.name, similarity, sharedAttrs
    ORDER BY similarity DESC
    LIMIT 10

    // Category + brand + price range similarity
    MATCH (target:Product {id: $productId})
    MATCH (similar:Product)
    WHERE similar <> target
      AND similar.category = target.category
      AND abs(similar.price - target.price) / target.price < 0.3
    OPTIONAL MATCH (target)-[:BY_BRAND]->(b:Brand)<-[:BY_BRAND]-(similar)
    RETURN similar.name, similar.price,
           CASE WHEN b IS NOT NULL THEN 1.5 ELSE 1.0 END AS brandBoost
    ORDER BY brandBoost DESC, abs(similar.price - target.price)
    LIMIT 10

    // ========================================
    // HYBRID RECOMMENDATIONS
    // ========================================

    // Combining social + content + behavioral signals
    MATCH (me:Customer {id: $customerId})

    // Social: what are friends buying?
    OPTIONAL MATCH (me)-[:FRIEND]-(friend:Customer)-[:PURCHASED]->(socialRec:Product)
    WHERE NOT (me)-[:PURCHASED]->(socialRec)
    WITH me, collect({product: socialRec, score: 1.0, source: "social"}) AS socialRecs

    // Behavioral: based on browsing history
    OPTIONAL MATCH (me)-[:VIEWED]->(viewed:Product)-[:SIMILAR_TO]->(behavRec:Product)
    WHERE NOT (me)-[:PURCHASED]->(behavRec)
    WITH me, socialRecs,
         collect({product: behavRec, score: 0.8, source: "behavioral"}) AS behavRecs

    // Content: based on past purchases
    OPTIONAL MATCH (me)-[:PURCHASED]->(:Product)-[:IN_CATEGORY]->(cat:Category)<-[:IN_CATEGORY]-(contentRec:Product)
    WHERE NOT (me)-[:PURCHASED]->(contentRec)
    // Aggregate first, then concatenate, so the grouping keys stay explicit
    WITH me, socialRecs, behavRecs,
         collect({product: contentRec, score: 0.6, source: "content"}) AS contentRecs
    WITH socialRecs + behavRecs + contentRecs AS allRecs
    UNWIND allRecs AS rec
    // WHERE cannot follow UNWIND directly, so filter through WITH
    WITH rec
    WHERE rec.product IS NOT NULL
    WITH rec.product AS product,
         sum(rec.score) AS totalScore,
         collect(rec.source) AS sources
    RETURN product.name, totalScore, sources
    ORDER BY totalScore DESC
    LIMIT 20

    // ========================================
    // SESSION-BASED RECOMMENDATIONS
    // ========================================

    // "Based on your current session..."
    WITH $sessionProducts AS viewedIds
    MATCH (viewed:Product) WHERE viewed.id IN viewedIds

    // Find products commonly viewed together in other sessions
    MATCH (session:Session)-[:CONTAINED]->(viewed),
          (session)-[:CONTAINED]->(nextItem:Product)
    WHERE NOT nextItem.id IN viewedIds
    WITH nextItem, count(DISTINCT session) AS coViewCount
    ORDER BY coViewCount DESC
    LIMIT 10
    RETURN nextItem.name, coViewCount

    // ========================================
    // PRE-COMPUTED SIMILARITY
    // ========================================

    // For production: compute similarity offline, query in real-time
    // Offline job creates SIMILAR_TO relationships:
    MATCH (p1:Product), (p2:Product)
    WHERE id(p1) < id(p2)  // visit each unordered pair once
    MATCH (p1)-[:IN_CATEGORY]->(c:Category)<-[:IN_CATEGORY]-(p2)
    MATCH (p1)<-[:PURCHASED]-(buyer:Customer)-[:PURCHASED]->(p2)
    WITH p1, p2,
         count(DISTINCT c) AS catOverlap,
         count(DISTINCT buyer) AS buyerOverlap
    WHERE buyerOverlap >= 10
    MERGE (p1)-[s:SIMILAR_TO]-(p2)
    SET s.score = catOverlap * 0.3 + buyerOverlap * 0.7,
        s.computedAt = datetime()

    // Real-time query uses pre-computed edges:
    MATCH (p:Product {id: $productId})-[s:SIMILAR_TO]-(rec:Product)
    RETURN rec.name, s.score
    ORDER BY s.score DESC
    LIMIT 10

Real-time collaborative filtering doesn't scale to billions of users. Production systems typically: 1) Pre-compute similarity matrices offline, 2) Store top-N similar items/users per entity, 3) Serve queries as simple lookups of pre-computed relationships, 4) Update periodically (hourly/daily) rather than in real time. Graph databases excel at the pre-computation phase and at storing the resulting similarity graph.
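The offline step, counting co-purchases and keeping only the top-N neighbours per product, can be sketched without a database. This is a minimal in-memory illustration; `top_n_similar`, `min_buyers`, and the sample baskets are hypothetical, and the threshold is lowered to 2 only so the toy data produces output.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_n_similar(purchases, n=2, min_buyers=2):
    """Offline job: count co-purchases and keep the top-n neighbours per product.

    `purchases` maps customer id -> set of product ids.
    Returns {product: [(similar_product, shared_buyers), ...]}.
    """
    co_counts = defaultdict(Counter)
    for basket in purchases.values():
        # Each unordered product pair in a basket is one shared buyer.
        for a, b in combinations(sorted(basket), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    table = {}
    for product, counts in co_counts.items():
        # Highest count first; ties broken by name for stable output.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        table[product] = [(p, c) for p, c in ranked if c >= min_buyers][:n]
    return table


purchases = {
    "c1": {"laptop", "mouse", "dock"},
    "c2": {"laptop", "mouse"},
    "c3": {"laptop", "dock"},
    "c4": {"mouse", "pad"},
}
similar = top_n_similar(purchases)
print(similar["laptop"])
```

The resulting table plays the role of the stored `SIMILAR_TO` edges: serving a recommendation becomes a single keyed lookup.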
Knowledge graphs represent structured knowledge as interconnected entities and relationships—powering search engines (Google Knowledge Graph), virtual assistants (Alexa, Siri), and enterprise data integration.
Knowledge Graph Components:

    // ========================================
    // ENTITY MODELING
    // ========================================

    // Create knowledge graph entities with rich typing
    CREATE (einstein:Person:Scientist:Physicist {
      name: "Albert Einstein",
      birthDate: date("1879-03-14"),
      birthPlace: "Ulm, German Empire",
      knownFor: ["Theory of Relativity", "E=mc²", "Photoelectric Effect"]
    })

    CREATE (relativity:Theory:PhysicsTheory {
      name: "Theory of Relativity",
      published: date("1905-09-26"),
      type: "Special Relativity"
    })

    CREATE (nobelPhysics:Award:NobelPrize {
      name: "Nobel Prize in Physics",
      year: 1921,
      category: "Physics"
    })

    CREATE (eth:Institution:University {
      name: "ETH Zurich",
      location: "Zurich, Switzerland"
    })

    // Create relationships with context
    CREATE (einstein)-[:FORMULATED {year: 1905, context: "annus mirabilis papers"}]->(relativity)
    CREATE (einstein)-[:RECEIVED {year: 1921, citation: "photoelectric effect"}]->(nobelPhysics)
    CREATE (einstein)-[:STUDIED_AT {from: 1896, to: 1900, degree: "diploma"}]->(eth)
    CREATE (einstein)-[:WORKED_AT {from: 1912, to: 1914, role: "Professor"}]->(eth)

    // ========================================
    // KNOWLEDGE QUERIES
    // ========================================

    // Question: "Who developed the Theory of Relativity?"
    MATCH (p:Person)-[:FORMULATED]->(t:Theory {name: "Theory of Relativity"})
    RETURN p.name

    // Question: "What awards did Einstein receive?"
    MATCH (p:Person {name: "Albert Einstein"})-[r:RECEIVED]->(a:Award)
    RETURN a.name, r.year, r.citation

    // Question: "What other theories were developed at ETH Zurich?"
    MATCH (p:Person)-[:STUDIED_AT|WORKED_AT]->(:Institution {name: "ETH Zurich"})
    MATCH (p)-[:FORMULATED]->(theory:Theory)
    RETURN DISTINCT theory.name, collect(p.name) AS developers

    // ========================================
    // INFERENCE AND REASONING
    // ========================================

    // Transitive relationships: advisors chain
    MATCH path = (student:Person)-[:ADVISED_BY*]->(ancestor:Person)
    WHERE student.name = "Current PhD Student"
    RETURN [n IN nodes(path) | n.name] AS academicLineage,
           length(path) AS generations

    // Type inference: derive is_a relationships
    MATCH (e:Entity)
    WHERE e:Scientist
    MATCH (e)-[:WORKS_IN]->(field:Field)
    SET e:Researcher
    RETURN e.name, labels(e)

    // Conflict detection: same entity, different facts
    MATCH (e:Entity)-[r1]->(value1), (e)-[r2]->(value2)
    WHERE type(r1) = type(r2)
      AND value1 <> value2
      AND type(r1) IN ["BORN_IN", "DIED_IN"]
    RETURN e.name, type(r1), value1, value2

    // ========================================
    // ENTITY RESOLUTION
    // ========================================

    // Find duplicate entities (same person, different nodes)
    MATCH (e1:Person), (e2:Person)
    WHERE id(e1) < id(e2)
      AND (e1.name = e2.name
           OR (e1.birthDate = e2.birthDate AND e1.birthPlace = e2.birthPlace))
    RETURN e1.name, e2.name,
           CASE
             WHEN e1.name = e2.name THEN "name_match"
             ELSE "attribute_match"
           END AS matchType

    // Create SAME_AS relationships for linked entities
    MATCH (e1:Person), (e2:Person)
    WHERE e1.externalId = e2.wikiDataId
    MERGE (e1)-[:SAME_AS]-(e2)

    // Query through SAME_AS for unified view
    MATCH (e:Person {name: "Marie Curie"})-[:SAME_AS*0..1]-(alias)
    WITH collect(DISTINCT e) + collect(DISTINCT alias) AS allRepresentations
    UNWIND allRepresentations AS entity
    MATCH (entity)-[r]->(related)
    RETURN type(r) AS relationship,
           collect(DISTINCT related.name) AS relatedEntities

    // ========================================
    // SEMANTIC SEARCH
    // ========================================

    // Find entities by relationship context
    MATCH (p:Person)-[:WON]->(:Award)<-[:WON]-(peer:Person),
          (p)-[:WORKED_IN]->(field:Field)<-[:WORKED_IN]-(peer)
    WHERE p.name = "Richard Feynman" AND p <> peer
    RETURN peer.name, field.name AS sharedField

    // Path-based similarity
    MATCH path = (e1:Concept {name: "Machine Learning"})-[:RELATED_TO*1..3]-(e2:Concept)
    RETURN e2.name, length(path) AS distance
    ORDER BY distance
    LIMIT 10

Knowledge graphs differ fundamentally from traditional databases: they're designed for connecting diverse data sources with semantic relationships. Where a relational DB stores structured records, a KG stores facts that can be reasoned about. Common sources: structured databases, documents (via NLP entity extraction), external APIs, and manual curation.
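The idea of "facts that can be reasoned about" can be made concrete with subject-predicate-object triples. The sketch below is a toy in-memory store, not how any particular knowledge graph product is implemented; `match` and `lineage` are hypothetical helpers, and the advisor names are placeholders invented for the example.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lineage(triples, start, predicate="ADVISED_BY"):
    """Follow one predicate transitively from `start` and return the chain."""
    step = {s: o for s, p, o in triples if p == predicate}
    chain, seen, node = [start], {start}, start
    while node in step and step[node] not in seen:  # `seen` guards against cycles
        node = step[node]
        chain.append(node)
        seen.add(node)
    return chain


triples = [
    ("Albert Einstein", "FORMULATED", "Theory of Relativity"),
    ("Albert Einstein", "RECEIVED", "Nobel Prize in Physics"),
    ("Albert Einstein", "STUDIED_AT", "ETH Zurich"),
    # Placeholder names for the transitive example:
    ("phd_student", "ADVISED_BY", "advisor_a"),
    ("advisor_a", "ADVISED_BY", "advisor_b"),
]

print(match(triples, p="FORMULATED", o="Theory of Relativity"))
print(lineage(triples, "phd_student"))
```

The second call derives a fact that is stored nowhere explicitly (that `advisor_b` is an academic ancestor of `phd_student`), which is the kind of inference the `ADVISED_BY*` traversal performs in the graph.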
IT infrastructure is inherently a graph: servers connect to other servers, applications depend on services, users access resources through permission chains. Graph databases enable powerful capabilities for network management, impact analysis, and security.
IT Operations Graph Use Cases:

    // ========================================
    // DEPENDENCY MAPPING
    // ========================================

    // Model application dependencies
    CREATE (frontend:Application {name: "Web Frontend", tier: "presentation"})
    CREATE (api:Application {name: "API Gateway", tier: "application"})
    CREATE (userSvc:Service {name: "UserService", tier: "service"})
    CREATE (orderSvc:Service {name: "OrderService", tier: "service"})
    CREATE (userDb:Database {name: "UserDB", type: "PostgreSQL"})
    CREATE (orderDb:Database {name: "OrderDB", type: "PostgreSQL"})
    CREATE (cache:Cache {name: "Redis", type: "Redis"})

    CREATE (frontend)-[:CALLS]->(api)
    CREATE (api)-[:CALLS]->(userSvc)
    CREATE (api)-[:CALLS]->(orderSvc)
    CREATE (userSvc)-[:USES]->(userDb)
    CREATE (userSvc)-[:USES]->(cache)
    CREATE (orderSvc)-[:USES]->(orderDb)
    CREATE (orderSvc)-[:CALLS]->(userSvc)

    // What does the frontend depend on (recursively)?
    MATCH (frontend:Application {name: "Web Frontend"})-[:CALLS|USES*]->(dependency)
    RETURN DISTINCT labels(dependency)[0] AS type, dependency.name AS name

    // ========================================
    // IMPACT ANALYSIS
    // ========================================

    // If UserDB goes down, what's affected?
    MATCH (failedComponent {name: "UserDB"})
    MATCH (failedComponent)<-[:USES|CALLS*]-(affected)
    RETURN DISTINCT affected.name AS affectedComponent,
           labels(affected)[0] AS type,
           length(shortestPath((affected)-[:USES|CALLS*]->(failedComponent))) AS dependencyDepth
    ORDER BY dependencyDepth

    // Blast radius: count of affected components by tier
    MATCH (failed:Database {name: "UserDB"})
    MATCH (failed)<-[:USES|CALLS*]-(affected)
    RETURN affected.tier AS tier, count(DISTINCT affected) AS affectedCount
    ORDER BY affectedCount DESC

    // Critical path: components with no redundancy
    // A dependency is a single point of failure when the client has no other
    // called component that reaches the same resource
    MATCH (client:Application)-[:CALLS]->(singleDep)-[:USES]->(resource)
    WHERE NOT EXISTS {
      MATCH (client)-[:CALLS]->(alt)-[:USES]->(resource)
      WHERE alt <> singleDep
    }
    RETURN DISTINCT client.name, singleDep.name AS singlePointOfFailure

    // ========================================
    // ROOT CAUSE ANALYSIS
    // ========================================

    // Trace error propagation path
    WITH $errorServiceId AS errorOrigin
    MATCH (origin {id: errorOrigin})
    MATCH path = (origin)<-[:CALLS*1..10]-(caller)
    WHERE ALL(node IN nodes(path) WHERE node.lastError IS NOT NULL)
    RETURN [n IN nodes(path) | n.name] AS errorPropagationPath,
           [n IN nodes(path) | n.lastError] AS errors,
           origin.name AS rootCause

    // Find correlated failures (likely common cause)
    MATCH (f1:Component)-[:ERROR_AT {time: $errorTime}]->(:ErrorLog)
    MATCH (f2:Component)-[:ERROR_AT {time: $errorTime}]->(:ErrorLog)
    WHERE f1 <> f2
    MATCH (f1)-[:DEPENDS_ON*1..3]->(common)<-[:DEPENDS_ON*1..3]-(f2)
    RETURN common.name AS potentialRootCause,
           collect(DISTINCT f1.name) + collect(DISTINCT f2.name) AS affectedServices

    // ========================================
    // SECURITY - ACCESS PATH ANALYSIS
    // ========================================

    // Can this user access this resource? (RBAC path)
    // OPTIONAL MATCH plus collect() returns hasAccess = false instead of no rows
    OPTIONAL MATCH path = (user:User {email: $userEmail})-[:MEMBER_OF*1..3]->
                          (:Group)-[:HAS_ROLE]->(:Role)-[:PERMITS]->
                          (:Permission)-[:ON]->(resource:Resource {id: $resourceId})
    WITH collect(path) AS paths
    RETURN size(paths) > 0 AS hasAccess,
           [p IN paths | [n IN nodes(p) | coalesce(n.name, n.email)]] AS accessPaths

    // Attack surface: externally accessible paths to sensitive data
    MATCH path = (external:ExternalEndpoint)-[:CONNECTS_TO*0..5]->(sensitive:DataStore)
    WHERE sensitive.classification = "CONFIDENTIAL"
    RETURN external.name AS entryPoint,
           [n IN nodes(path) | n.name] AS attackPath,
           length(path) AS hops
    ORDER BY hops

    // Find overprivileged users
    MATCH (u:User)-[:MEMBER_OF*1..3]->(:Group)-[:HAS_ROLE]->(r:Role)-[:PERMITS]->(p:Permission)
    WITH u, count(DISTINCT p) AS permCount, collect(DISTINCT p.name) AS permissions
    WHERE permCount > 20
    RETURN u.name, permCount, permissions[0..5] AS samplePermissions
    ORDER BY permCount DESC

    // ========================================
    // CONFIGURATION DRIFT DETECTION
    // ========================================

    // Compare current vs baseline configuration
    MATCH (current:Server)-[:HAS_CONFIG {version: "current"}]->(config:Config)
    MATCH (baseline:Server)-[:HAS_CONFIG {version: "baseline"}]->(baseConfig:Config)
    WHERE current.id = baseline.id
      AND config <> baseConfig
      AND config.setting = baseConfig.setting  // compare like with like
    RETURN current.name AS server,
           config.setting AS currentSetting,
           baseConfig.setting AS baselineSetting,
           config.value <> baseConfig.value AS drifted

Common mistakes when building IT graphs: 1) Modeling everything as a generic DEPENDS_ON relationship, which loses semantic value, 2) Not capturing relationship direction (A calls B ≠ B calls A), 3) Ignoring temporal aspects (configurations change), 4) Over-connecting (linking every microservice to every database creates noise). Be specific about relationship types and maintain historical snapshots.
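Why direction matters for impact analysis can be shown with a reverse breadth-first search over the sample topology from the dependency-mapping example. This is a minimal sketch in plain Python; `blast_radius` is a hypothetical helper, and `CALLS` and `USES` are collapsed into one "depends on" edge type purely to keep the example short.

```python
from collections import deque


def blast_radius(edges, failed):
    """Return {component: depth} for everything that depends on `failed`.

    `edges` holds (caller, dependency) pairs. The search walks edges
    backwards, from a dependency to the components that call or use it.
    """
    dependents = {}
    for caller, dependency in edges:
        dependents.setdefault(dependency, []).append(caller)

    depth = {}
    queue = deque([(failed, 0)])
    while queue:
        node, d = queue.popleft()
        for caller in dependents.get(node, []):
            if caller != failed and caller not in depth:
                depth[caller] = d + 1  # BFS reaches each node at its minimum depth
                queue.append((caller, d + 1))
    return depth


edges = [
    ("Web Frontend", "API Gateway"),
    ("API Gateway", "UserService"),
    ("API Gateway", "OrderService"),
    ("UserService", "UserDB"),
    ("UserService", "Redis"),
    ("OrderService", "OrderDB"),
    ("OrderService", "UserService"),
]

print(blast_radius(edges, "UserDB"))
```

Reversing the edge direction would answer a different question (what UserDB itself depends on, which is nothing), so storing direction incorrectly silently empties the impact report.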
Beyond the major domains, graph databases solve diverse problems wherever connected data matters:
| Domain | Use Case | Key Graph Pattern | Example Query |
|---|---|---|---|
| Supply Chain | Track product provenance | Linear paths with timestamps | Path from raw material to finished product |
| Healthcare | Patient journey mapping | Events connected by temporal and causal relationships | Treatment effectiveness paths |
| Pharma | Drug interaction networks | Molecules, pathways, side effects as nodes | Find contraindicated drug combinations |
| Telecom | Network topology | Physical/logical network connections | Shortest path for call routing |
| Logistics | Route optimization | Weighted graph for distances/costs | Optimal delivery sequence |
| Media | Content lineage | Assets, versions, derivatives | Track content licensing chain |
| Legal | Contract relationships | Parties, clauses, obligations | Find conflicting contractual obligations |
| HR | Org hierarchy + skills | Reports-to chains + skills graph | Find succession candidates |
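The weighted-graph pattern behind the telecom and logistics rows can be sketched with Dijkstra's algorithm. This is a generic textbook implementation in plain Python, not the routine any particular graph database uses internally; `shortest_path`, the location names, and the distances are invented for illustration.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra over {node: {neighbor: weight}} with non-negative weights.

    Returns (cost, path), or (inf, []) when the target is unreachable.
    """
    best = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:  # walk predecessors back to the source
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf"), []


# Hypothetical road network with distances as edge weights.
roads = {
    "depot": {"a": 4, "b": 1},
    "b": {"a": 2, "c": 7},
    "a": {"c": 3},
    "c": {},
}

print(shortest_path(roads, "depot", "c"))
```

The cheapest route goes through two intermediate stops rather than taking the direct-looking edges, which is why hop count alone is not enough once edges carry costs.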

    // ========================================
    // SUPPLY CHAIN: Product Traceability
    // ========================================

    // Track ingredient from farm to product
    MATCH path = (ingredient:Ingredient {batchId: $batchId})
                 -[:SOURCED_FROM]->(supplier:Supplier)
                 -[:LOCATED_IN]->(region:Region)
    MATCH (ingredient)<-[:CONTAINS]-(product:Product)-[:SOLD_AT]->(store:Store)
    RETURN ingredient.name, supplier.name, region.name, product.name, store.location

    // Find all products affected by supplier recall
    MATCH (recalled:Supplier {id: $recalledSupplierId})<-[:SOURCED_FROM*1..3]-(material)
    MATCH (material)<-[:CONTAINS*]-(product:Product)
    RETURN DISTINCT product.sku, product.name

    // ========================================
    // HEALTHCARE: Clinical Pathways
    // ========================================

    // Patient treatment journey
    MATCH path = (patient:Patient {id: $patientId})
                 -[:HAD_VISIT]->(visit:Visit)-[:DIAGNOSED_WITH]->(condition:Condition)
    MATCH (visit)-[:PRESCRIBED]->(treatment:Treatment)
    RETURN visit.date, condition.name, treatment.name
    ORDER BY visit.date

    // Treatment effectiveness by outcome paths
    MATCH (treatment:Treatment {name: "Treatment A"})
          <-[:PRESCRIBED]-(visit:Visit)<-[:HAD_VISIT]-(patient:Patient)
          -[:HAD_OUTCOME]->(outcome:Outcome)
    RETURN outcome.name, count(DISTINCT patient) AS patients
    ORDER BY patients DESC

    // ========================================
    // LOGISTICS: Route Optimization
    // ========================================

    // Find optimal delivery route (TSP approximation)
    // Outline only: a complete GDS A* call also needs a target node, latitude/longitude
    // properties for the heuristic, and a YIELD clause before RETURN
    MATCH (depot:Location {type: "WAREHOUSE"})
    MATCH (stop:Location) WHERE stop.id IN $deliveryStops
    WITH depot, collect(stop) AS stops
    CALL gds.shortestPath.astar.mutate('roadNetwork', {
      sourceNode: depot,
      relationshipWeightProperty: 'distance'
    })
    // Additional logic for multi-stop optimization...
    RETURN *

Ask yourself: 1) Do the key questions involve relationships between entities? 2) Are queries recursive or multi-hop? 3) Does performance degrade as relationships increase in relational models? 4) Is the schema connection-heavy (many join tables)? If you answer yes to two or more of these, consider a graph database.
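The four-question heuristic can be written down as a small helper, which is sometimes useful in design reviews. `CHECKLIST` and `consider_graph` are hypothetical names, and the threshold of two simply mirrors the rule of thumb above; it is a prompt for discussion, not a decision procedure.

```python
CHECKLIST = (
    "key questions involve relationships between entities",
    "queries are recursive or multi-hop",
    "relational performance degrades as relationships grow",
    "schema is connection-heavy (many join tables)",
)


def consider_graph(answers, threshold=2):
    """Return True when at least `threshold` checklist answers are yes."""
    if len(answers) != len(CHECKLIST):
        raise ValueError(f"expected {len(CHECKLIST)} answers, got {len(answers)}")
    return sum(bool(a) for a in answers) >= threshold


# Example: multi-hop queries and a join-heavy schema, but no measured slowdown yet.
print(consider_graph([False, True, False, True]))
```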
Graph databases are powerful but not universal. Understanding the anti-patterns is as important as understanding the use cases.
| Scenario | Best Choice | Reasoning |
|---|---|---|
| User sessions, caching | Key-Value (Redis) | Simple lookup by key, no relationships |
| Product catalog with nested data | Document (MongoDB) | Hierarchical data, single-entity queries |
| Financial transactions, ACID compliance | Relational (PostgreSQL) | Strong consistency, complex transactions |
| Log analysis, time-series | Columnar (ClickHouse) | Append-only, aggregate queries |
| Social network, fraud detection | Graph (Neo4j) | Relationship traversal, pattern matching |
| Full-text search | Search Engine (Elastic) | Inverted indexes, relevance scoring |
| Multi-model requirements | Multi-Model (ArangoDB) | Combines document + graph + key-value |
Modern architectures commonly use multiple databases for different workloads. A single application might use: PostgreSQL for transactions, Redis for sessions, Elasticsearch for search, and Neo4j for recommendations. Choose each tool for its strengths rather than forcing one tool to do everything.
We've explored the domains where graph databases provide transformative advantages. Let's consolidate the key insights:
Module Complete:
You have now comprehensively explored graph databases—from the fundamental property graph model through Neo4j implementation, advanced query techniques, and real-world use cases. You understand when graph databases provide decisive advantages and when alternative solutions are more appropriate.
Graph databases represent a paradigm shift for connected data—one that enables previously impractical analyses and unlocks new business capabilities. As data connections become increasingly central to modern applications, graph thinking will only grow in importance.