Graph Databases - Learning Module

Use Cases

Where Graphs Transform the Impossible into Trivial

Graph databases aren't universal solutions—they're specialized tools that excel when data is inherently connected and relationships drive business value. The difference between "possible with difficulty" and "trivially elegant" becomes apparent in specific domains where the shape of your questions matches the shape of graphs.

Consider: LinkedIn computes professional network insights across 900+ million members. Facebook surfaces relevant content from billions of posts through social graph analysis. Financial institutions detect fraud by identifying suspicious transaction patterns in real time. At that scale, answering these questions with repeated joins over non-graph storage becomes prohibitively expensive.

This page examines where graph databases provide decisive advantages—and equally importantly, where they don't.

What You Will Learn

By the end of this page, you will understand graph database use cases across major domains—social networks, fraud detection, recommendation engines, knowledge graphs, and network/IT operations—with concrete examples, query patterns, architecture considerations, and guidance on when to choose graphs over alternatives.

Social Networks

Social networks are the canonical graph use case. The data is inherently a graph—users are nodes, relationships (follows, friends, blocks) are edges—and core features require graph traversal:

Friend suggestions require traversing connection paths
Feed ranking considers social proximity and engagement patterns
Connection degree ("2nd-degree connection") is a path length
Mutual friends are the intersection of traversed neighbor sets
Influence analysis is centrality computation
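
To make the list above concrete outside of any database, here is a minimal Python sketch over a hypothetical in-memory adjacency map (the `friends` dict and the user names are invented for illustration). It treats mutual friends as a set intersection, connection degree as a breadth-first path length, and friend suggestions as friends-of-friends ranked by mutual count.

```python
from collections import deque

# Hypothetical undirected friendship graph: user -> set of direct friends.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "eli"},
    "dee": {"ben"},
    "eli": {"cy"},
    "fay": set(),
}

def mutual_friends(a, b):
    """Mutual friends are the intersection of the two neighbor sets."""
    return friends[a] & friends[b]

def connection_degree(start, target):
    """Connection degree is the shortest path length, found by BFS.

    Returns None when the two users are not connected at all.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for nxt in friends[user]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def friend_suggestions(user):
    """Friends of friends who are not yet friends, ranked by mutual-friend count."""
    counts = {}
    for f in friends[user]:
        for fof in friends[f]:
            if fof != user and fof not in friends[user]:
                counts[fof] = counts.get(fof, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A graph database performs the same neighbor expansion, but against indexed adjacency on disk rather than a Python dict.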

social-network-patterns.cypher

Cypher

// ========================================
// FRIEND SUGGESTIONS (People You May Know)
// ========================================
 
// Friends of friends, weighted by mutual connections
MATCH (me:User {id: $userId})-[:FRIEND]->(friend:User)-[:FRIEND]->(suggested:User)
WHERE NOT (me)-[:FRIEND]-(suggested)
  AND me <> suggested
  AND NOT (me)-[:BLOCKED]-(suggested)
WITH suggested, count(DISTINCT friend) AS mutualFriends,
     collect(friend.name)[0..3] AS sampleMutuals
WHERE mutualFriends >= 2
RETURN suggested.id, suggested.name, 
       mutualFriends, sampleMutuals
ORDER BY mutualFriends DESC
LIMIT 20
 
// Enhanced: include workplace/school overlap
// Each source is collected before the next OPTIONAL MATCH to avoid a cross product
MATCH (me:User {id: $userId})
OPTIONAL MATCH (me)-[:WORKS_AT]->(:Company)<-[:WORKS_AT]-(coworker:User)
WHERE NOT (me)-[:FRIEND]-(coworker)
WITH me, collect(DISTINCT {user: coworker, source: "work"}) AS workRecs
OPTIONAL MATCH (me)-[:STUDIED_AT]->(:School)<-[:STUDIED_AT]-(classmate:User)
WHERE NOT (me)-[:FRIEND]-(classmate)
WITH me, workRecs,
     collect(DISTINCT {user: classmate, source: "school"}) AS schoolRecs
OPTIONAL MATCH (me)-[:FRIEND]->(:User)-[:FRIEND]->(foaf:User)
WHERE NOT (me)-[:FRIEND]-(foaf)
WITH me, workRecs + schoolRecs +
     collect(DISTINCT {user: foaf, source: "mutual"}) AS suggestions
UNWIND suggestions AS s
WITH me, s
WHERE s.user IS NOT NULL AND s.user <> me   // UNWIND cannot take WHERE directly
WITH s.user AS suggested, collect(DISTINCT s.source) AS connectionTypes
RETURN suggested.name, connectionTypes,
       size(connectionTypes) AS connectionStrength
ORDER BY connectionStrength DESC
LIMIT 15
 
// ========================================
// CONNECTION DEGREE
// ========================================
 
// Find connection path between two users
MATCH path = shortestPath(
    (me:User {id: $myId})-[:FRIEND*..6]-(target:User {id: $targetId})
)
RETURN 
    CASE length(path)
        WHEN 1 THEN "1st degree (direct connection)"
        WHEN 2 THEN "2nd degree"
        WHEN 3 THEN "3rd degree"
        ELSE toString(length(path)) + " degrees away"
    END AS connectionDegree,
    [n IN nodes(path) | n.name] AS connectionPath
 
// ========================================
// FEED RANKING (Social Proximity)
// ========================================
 
// Posts from network, scored by social distance
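MATCH (me:User {id: $userId})-[:FRIEND*1..2]-(author:User)-[:POSTED]->(post:Post)
WHERE author <> me
  AND post.createdAt > datetime() - duration('P7D')
WITH DISTINCT me, post, author   // an author can be reached by several paths
WITH post, author,
     CASE WHEN EXISTS { MATCH (me)-[:FRIEND]-(author) }
         THEN 1.0   // Direct friend
         ELSE 0.5   // Friend of friend
     END AS socialScore,
     // inSeconds yields total elapsed hours; duration.between(...).hours excludes the days component
     duration.inSeconds(post.createdAt, datetime()).hours AS hoursAgo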
WITH post, author, socialScore,
     1.0 / (1.0 + toFloat(hoursAgo) / 24) AS recencyScore,
     toFloat(post.likes + post.comments * 2) / 100 AS engagementScore
RETURN post.id, author.name, post.content,
       socialScore * 0.4 + recencyScore * 0.3 + engagementScore * 0.3 AS feedScore
ORDER BY feedScore DESC
LIMIT 50
 
// ========================================
// INFLUENCE ANALYSIS
// ========================================
 
// Identify influencers via follower analysis
// (COUNT { } needs Neo4j 5+; on 4.x use size((user)<-[:FOLLOWS]-()) instead)
MATCH (user:User)
WITH user,
     COUNT { (user)<-[:FOLLOWS]-() } AS followers,
     COUNT { (user)-[:FOLLOWS]->() } AS following
WHERE followers > 10000
RETURN user.name, followers, following,
       toFloat(followers) / CASE following WHEN 0 THEN 1 ELSE following END AS influenceRatio
ORDER BY followers DESC
LIMIT 50

Scale Considerations for Social Networks

At major social network scale (billions of users), even graph databases require sharding and caching. Common patterns: 1) Cache frequently accessed friendship lists, 2) Pre-compute friend suggestions offline, update periodically, 3) Shard by user ID with cross-shard traversal for inter-shard connections, 4) Use read replicas to scale query throughput.
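
Pattern 3 (sharding by user ID) can be sketched in a few lines of Python. This is an illustrative assumption, not how any particular social network partitions its data: the shard count, the `shard_for` helper, and the sample edges are all hypothetical. The point is that any edge whose endpoints hash to different shards forces a cross-shard hop during traversal.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a user ID to a shard with a stable hash (CRC32, not Python's salted hash())."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS

def split_edges(edges):
    """Partition friendship edges into shard-local and cross-shard lists."""
    local, cross = [], []
    for a, b in edges:
        (local if shard_for(a) == shard_for(b) else cross).append((a, b))
    return local, cross

# Hypothetical friendship edges.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u1", "u4")]
local, cross = split_edges(edges)
print(f"{len(local)} local edges, {len(cross)} cross-shard edges")
```

The share of cross-shard edges is a rough proxy for how much network traffic a multi-hop traversal will generate, which is why caching and offline pre-computation appear alongside sharding in the list above.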

Fraud Detection

Fraud detection is where graph databases provide their most dramatic advantage. Fraudsters operate in networks—sharing accounts, devices, addresses, and payment methods. Patterns invisible in tabular data become obvious in graph form.

Why Graphs Excel at Fraud:

Fraud rings share resources (addresses, devices, IPs)
Money laundering creates circular transaction paths
Synthetic identity fraud exhibits unusual entity linkages
Graph algorithms detect anomalous connectivity patterns
Real-time traversal catches fraud before transactions complete
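
The first point, that fraud rings share resources, amounts to finding connected components over accounts and the identifiers they use. The following Python sketch uses a union-find structure on hypothetical `(account, identifier)` records; the data and the `find_rings` helper are invented for illustration and stand in for what a graph database's community detection would do at scale.

```python
from collections import defaultdict

# Hypothetical (account, identifier) usage records.
usage = [
    ("acct1", "device:A"), ("acct2", "device:A"),
    ("acct2", "phone:555"), ("acct3", "phone:555"),
    ("acct4", "device:B"),
    ("acct5", "ip:10.0.0.9"), ("acct6", "ip:10.0.0.9"),
]

def find_rings(records, min_size=2):
    """Group accounts that are linked, directly or transitively, by shared identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Linking each account to its identifier merges accounts that share one.
    for account, identifier in records:
        union(account, identifier)

    groups = defaultdict(set)
    for account, _ in records:
        groups[find(account)].add(account)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

rings = find_rings(usage)
```

Here `acct1` and `acct3` never share an identifier directly, yet they land in the same ring through `acct2`, which is the kind of transitive link that is hard to see in tabular form.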

fraud-detection.cypher

Cypher

// ========================================
// FRAUD RING DETECTION
// ========================================
 
// Find accounts sharing multiple identifiers (classic ring pattern)
MATCH (a1:Account)-[:USES]->(shared)<-[:USES]-(a2:Account)
WHERE a1.id < a2.id   // report each pair once, not as both (a1, a2) and (a2, a1)
  AND (shared:Device OR shared:IPAddress OR shared:Phone OR shared:Address)
WITH a1, a2, count(DISTINCT shared) AS sharedIdentifiers,
     collect(DISTINCT labels(shared)[0]) AS identifierTypes
WHERE sharedIdentifiers >= 2
RETURN a1.id, a2.id, sharedIdentifiers, identifierTypes
ORDER BY sharedIdentifiers DESC
 
// Extended: find connected fraud communities
// (the variable-length pattern already bounds the distance, so no shortestPath is needed)
MATCH (suspicious:Account)-[:USES*1..2]-(connected:Account)
WHERE suspicious.riskScore > 80
  AND suspicious <> connected
WITH suspicious, collect(DISTINCT connected.id) AS connectedAccounts
RETURN suspicious.id AS suspiciousAccount,
       connectedAccounts,
       size(connectedAccounts) AS networkSize
ORDER BY networkSize DESC
 
// ========================================
// MONEY LAUNDERING - CIRCULAR TRANSFERS
// ========================================
 
// Detect circular money flows (structuring)
MATCH cycle = (origin:Account)-[:TRANSFERRED*3..8]->(origin)
WHERE ALL(r IN relationships(cycle) 
          WHERE r.timestamp > datetime() - duration('P1D')
          AND r.amount < 10000)  // Under reporting threshold
WITH origin, cycle,
     REDUCE(total = 0, r IN relationships(cycle) | total + r.amount) AS totalFlow,
     length(cycle) AS hops
WHERE totalFlow > 50000  // Significant total despite small individual transfers
RETURN origin.id,
       [n IN nodes(cycle) | n.id] AS flowPath,
       totalFlow, hops
 
// Rapid pass-through (layering)
MATCH (source:Account)-[t1:TRANSFERRED]->(intermediary:Account)-[t2:TRANSFERRED]->(dest:Account)
WHERE t1.timestamp > datetime() - duration('PT1H')
  AND t2.timestamp > t1.timestamp
  AND duration.inSeconds(t1.timestamp, t2.timestamp).minutes < 30  // total elapsed minutes
  AND t1.amount > 5000
  AND abs(t1.amount - t2.amount) < 100  // Nearly same amount
  AND NOT (source)-[:NORMAL_BUSINESS_WITH]-(dest)
RETURN source.id AS sourceAccount,
       intermediary.id AS layeringAccount,
       dest.id AS destinationAccount,
       t1.amount, t2.amount
 
// ========================================
// SYNTHETIC IDENTITY DETECTION
// ========================================
 
// Identities with unusual attribute sharing patterns
MATCH (identity:Person)
OPTIONAL MATCH (identity)-[:HAS_SSN]->(ssn:SSN)<-[:HAS_SSN]-(other1:Person)
OPTIONAL MATCH (identity)-[:HAS_ADDRESS]->(addr:Address)<-[:HAS_ADDRESS]-(other2:Person)
OPTIONAL MATCH (identity)-[:HAS_PHONE]->(phone:Phone)<-[:HAS_PHONE]-(other3:Person)
WITH identity,
     count(DISTINCT other1) AS ssnSharing,
     count(DISTINCT other2) AS addressSharing,
     count(DISTINCT other3) AS phoneSharing,
     identity.accountAge AS accountAge
WHERE ssnSharing > 0 
   OR (addressSharing > 3 AND accountAge < 365)
   OR (phoneSharing > 5)
RETURN identity.id,
       ssnSharing, addressSharing, phoneSharing, accountAge,
       ssnSharing * 10 + addressSharing * 3 + phoneSharing * 2 AS riskScore
ORDER BY riskScore DESC
 
// ========================================
// REAL-TIME TRANSACTION SCORING
// ========================================
 
// Score transaction at payment time
WITH $transactionData AS txn
MATCH (payer:Account {id: txn.payerId})
MATCH (payee:Account {id: txn.payeeId})
 
// Check for shared identifiers with flagged accounts
OPTIONAL MATCH (payer)-[:USES*1..2]-(flagged:Account {status: "FLAGGED"})
WITH payer, payee, txn, count(DISTINCT flagged) AS flaggedConnections
 
// Check payee's transaction pattern
OPTIONAL MATCH (payee)<-[recent:TRANSFERRED]-(others:Account)
WHERE recent.timestamp > datetime() - duration('PT1H')
WITH payer, payee, txn, flaggedConnections,
     count(DISTINCT others) AS recentUniquePayers
 
// Scoring
RETURN txn.id AS transactionId,
       CASE 
           WHEN flaggedConnections > 0 THEN "BLOCK"
           WHEN recentUniquePayers > 20 AND txn.amount > 1000 THEN "REVIEW"
           WHEN flaggedConnections = 0 AND recentUniquePayers < 5 THEN "APPROVE"
           ELSE "REVIEW"
       END AS decision,
       flaggedConnections,
       recentUniquePayers

Graph vs. Relational for Fraud Detection
Fraud Pattern	Relational Approach	Graph Approach	Advantage
Shared device detection	Complex JOINs, slow at scale	2-hop traversal, typically milliseconds	Often orders of magnitude faster on large datasets
Circular transactions	Recursive CTEs that often time out	Native cycle detection	Makes possible what was impractical
Ring detection	Multiple self-joins	Community detection algorithm	Algorithmic scalability
Real-time scoring	Multiple queries, aggregation	Single traversal query	Low latency at transaction time

Hybrid Fraud Systems

Production fraud systems typically combine: 1) Rule engines for known patterns, 2) ML models for transaction scoring, 3) Graph databases for relationship analysis. The graph layer often runs in parallel with ML scoring, and the final decision combines signals from both. Graph analysis excels at detecting novel fraud patterns that rule systems miss.
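
One way to picture how the three layers feed a single decision is the Python sketch below. Everything in it is an assumption made for illustration: the `Signals` fields, the 0.6/0.4 weights, and the thresholds are hypothetical and untuned, not values from any production system.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    rule_hit: bool            # a known-pattern rule fired
    ml_score: float           # model probability of fraud, 0.0 to 1.0
    flagged_neighbors: int    # flagged accounts within a few hops (graph signal)

def decide(s: Signals) -> str:
    """Combine rule, ML, and graph signals; thresholds are illustrative only."""
    # Hard stops: a rule hit or a dense cluster of flagged neighbors.
    if s.rule_hit or s.flagged_neighbors >= 3:
        return "BLOCK"
    # Soft signals: blend the ML score with a capped graph score.
    graph_score = min(s.flagged_neighbors / 3.0, 1.0)
    combined = 0.6 * s.ml_score + 0.4 * graph_score
    return "REVIEW" if combined >= 0.5 else "APPROVE"
```

For example, `decide(Signals(rule_hit=False, ml_score=0.5, flagged_neighbors=2))` returns `"REVIEW"`: neither signal alone crosses the threshold, but together they do.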

Recommendation Engines

Recommendations are inherently graph problems: users connect to items they've purchased, rated, or viewed; items connect to categories, attributes, and other items. The question "What should I recommend?" becomes "What paths lead from this user to items they might like?"

Graph-Based Recommendation Approaches:

Collaborative Filtering: Users who bought X also bought Y
Content-Based: Items similar to what user liked
Hybrid: Combining social + content + behavioral signals
Session-Based: Current browsing path predicts next action
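
The collaborative filtering idea ("users who bought X also bought Y") reduces to counting co-occurrences across customers. A minimal Python sketch, using a hypothetical `purchases` map invented for illustration:

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of product IDs.
purchases = {
    "c1": {"laptop", "mouse", "bag"},
    "c2": {"laptop", "mouse"},
    "c3": {"laptop", "bag", "dock"},
    "c4": {"phone", "case"},
}

def also_bought(product, top_n=3):
    """Count how many distinct customers bought each other product alongside `product`."""
    counts = Counter()
    for basket in purchases.values():
        if product in basket:
            counts.update(basket - {product})
    # Sort by count descending, then by name, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

In graph terms, each basket lookup is a two-hop traversal: product to customer, then customer to other products.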

recommendations.cypher

Cypher

// ========================================
// COLLABORATIVE FILTERING
// ========================================
 
// "Customers who bought this also bought..."
MATCH (p:Product {id: $productId})<-[:PURCHASED]-(c:Customer)-[:PURCHASED]->(other:Product)
WHERE other <> p
  AND coalesce(other.discontinued, false) = false  // keeps products that have no discontinued flag
WITH other, count(DISTINCT c) AS coPurchaseCount
ORDER BY coPurchaseCount DESC
LIMIT 10
RETURN other.id, other.name, coPurchaseCount
 
// User-based: find similar users, recommend their favorites
MATCH (me:Customer {id: $customerId})-[:PURCHASED]->(myProducts:Product)
WITH me, collect(myProducts) AS myPurchases
MATCH (similar:Customer)-[:PURCHASED]->(p:Product)
WHERE p IN myPurchases AND similar <> me
WITH me, similar, myPurchases, count(p) AS overlapCount
WHERE overlapCount >= 3  // Minimum overlap for similarity
ORDER BY overlapCount DESC
LIMIT 10
WITH me, myPurchases, collect(similar) AS topSimilarUsers
UNWIND topSimilarUsers AS s   // start from the 10 similar users, not a scan of all nodes
MATCH (s)-[:PURCHASED]->(rec:Product)
WHERE NOT rec IN myPurchases
RETURN rec.name, count(DISTINCT s) AS recommendedBy
ORDER BY recommendedBy DESC
LIMIT 20
 
// ========================================
// CONTENT-BASED FILTERING
// ========================================
 
// Similar products by shared attributes
MATCH (target:Product {id: $productId})-[:HAS_ATTRIBUTE]->(attr:Attribute)
WITH target, collect(attr) AS targetAttrs
MATCH (other:Product)-[:HAS_ATTRIBUTE]->(a:Attribute)
WHERE other <> target AND a IN targetAttrs
WITH target, other, count(a) AS sharedAttrs, targetAttrs
WITH target, other, sharedAttrs, 
     toFloat(sharedAttrs) / size(targetAttrs) AS similarity
WHERE similarity > 0.5
RETURN other.name, similarity, sharedAttrs
ORDER BY similarity DESC
LIMIT 10
 
// Category + brand + price range similarity
MATCH (target:Product {id: $productId})
MATCH (similar:Product)
WHERE similar <> target
  AND similar.category = target.category
  AND abs(similar.price - target.price) / toFloat(target.price) < 0.3  // toFloat avoids integer division
OPTIONAL MATCH (target)-[:BY_BRAND]->(b:Brand)<-[:BY_BRAND]-(similar)
RETURN similar.name, similar.price,
       CASE WHEN b IS NOT NULL THEN 1.5 ELSE 1.0 END AS brandBoost
ORDER BY brandBoost DESC, abs(similar.price - target.price)
LIMIT 10
 
// ========================================
// HYBRID RECOMMENDATIONS
// ========================================
 
// Combining social + content + behavioral signals
MATCH (me:Customer {id: $customerId})
 
// Social: what are friends buying?
OPTIONAL MATCH (me)-[:FRIEND]-(friend:Customer)-[:PURCHASED]->(socialRec:Product)
WHERE NOT (me)-[:PURCHASED]->(socialRec)
WITH me, collect({product: socialRec, score: 1.0, source: "social"}) AS socialRecs
 
// Behavioral: based on browsing history
OPTIONAL MATCH (me)-[:VIEWED]->(viewed:Product)-[:SIMILAR_TO]-(behavRec:Product)
WHERE NOT (me)-[:PURCHASED]->(behavRec)
WITH me, socialRecs, 
     collect({product: behavRec, score: 0.8, source: "behavioral"}) AS behavRecs
 
// Content: based on past purchases
OPTIONAL MATCH (me)-[:PURCHASED]->(:Product)-[:IN_CATEGORY]->(cat:Category)<-[:IN_CATEGORY]-(contentRec:Product)
WHERE NOT (me)-[:PURCHASED]->(contentRec)
WITH me, socialRecs + behavRecs + 
     collect({product: contentRec, score: 0.6, source: "content"}) AS allRecs
UNWIND allRecs AS rec
WITH rec   // WHERE must follow WITH or MATCH; it cannot follow UNWIND directly
WHERE rec.product IS NOT NULL
WITH rec.product AS product, 
     sum(rec.score) AS totalScore,
     collect(rec.source) AS sources
RETURN product.name, totalScore, sources
ORDER BY totalScore DESC
LIMIT 20
 
// ========================================
// SESSION-BASED RECOMMENDATIONS
// ========================================
 
// "Based on your current session..."
WITH $sessionProducts AS viewedIds
MATCH (viewed:Product) WHERE viewed.id IN viewedIds
 
// Find products commonly viewed together in other sessions
MATCH (session:Session)-[:CONTAINED]->(viewed),
      (session)-[:CONTAINED]->(nextItem:Product)
WHERE NOT nextItem.id IN viewedIds
WITH nextItem, count(DISTINCT session) AS coViewCount
ORDER BY coViewCount DESC
LIMIT 10
RETURN nextItem.name, coViewCount
 
// ========================================
// PRE-COMPUTED SIMILARITY
// ========================================
 
// For production: compute similarity offline, query in real-time
// Offline job creates SIMILAR_TO relationships:
MATCH (p1:Product)<-[:PURCHASED]-(buyer:Customer)-[:PURCHASED]->(p2:Product)
WHERE p1.id < p2.id   // each unordered pair once; avoids a full Product x Product scan
WITH p1, p2, count(DISTINCT buyer) AS buyerOverlap
WHERE buyerOverlap >= 10
MATCH (p1)-[:IN_CATEGORY]->(c:Category)<-[:IN_CATEGORY]-(p2)
WITH p1, p2, buyerOverlap, count(DISTINCT c) AS catOverlap
MERGE (p1)-[s:SIMILAR_TO]-(p2)
SET s.score = catOverlap * 0.3 + buyerOverlap * 0.7,
    s.computedAt = datetime()
 
// Real-time query uses pre-computed edges:
MATCH (p:Product {id: $productId})-[s:SIMILAR_TO]-(rec:Product)
RETURN rec.name, s.score
ORDER BY s.score DESC
LIMIT 10

Scaling Recommendations

Real-time collaborative filtering doesn't scale to billions of users. Production systems typically: 1) Pre-compute similarity matrices offline, 2) Store top-N similar items/users per entity, 3) Query uses simple lookups of pre-computed relationships, 4) Update periodically (hourly/daily) rather than real-time. Graph databases excel at the pre-computation phase and storing the resulting similarity graph.
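
The offline/online split described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `buyers` data is hypothetical, Jaccard similarity is one possible scoring choice among many, and a real system would persist the result (for example as `SIMILAR_TO` edges) rather than keep it in a dict.

```python
from itertools import combinations

# Hypothetical buyers per product.
buyers = {
    "laptop": {"c1", "c2", "c3"},
    "mouse": {"c1", "c2"},
    "bag": {"c1", "c3"},
    "phone": {"c4"},
}

def jaccard(a, b):
    """Shared buyers divided by all buyers of either product."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def precompute_top_n(buyers, n=2, min_score=0.0):
    """Offline step: score every product pair once, keep the best n per product."""
    scored = {p: [] for p in buyers}
    for p1, p2 in combinations(sorted(buyers), 2):
        score = jaccard(buyers[p1], buyers[p2])
        if score > min_score:
            scored[p1].append((p2, score))
            scored[p2].append((p1, score))
    return {
        p: sorted(pairs, key=lambda kv: (-kv[1], kv[0]))[:n]
        for p, pairs in scored.items()
    }

SIMILAR = precompute_top_n(buyers)  # rebuilt periodically by a batch job

def recommend(product):
    """Online step: a plain lookup, with no traversal at request time."""
    return [name for name, _ in SIMILAR.get(product, [])]
```

The expensive pairwise scoring runs once per refresh; each request then costs a single lookup.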

Knowledge Graphs

Knowledge graphs represent structured knowledge as interconnected entities and relationships—powering search engines (Google Knowledge Graph), virtual assistants (Alexa, Siri), and enterprise data integration.

Knowledge Graph Components:

Entities: Real-world objects (people, places, concepts)
Relationships: Connections with semantic meaning
Attributes: Properties of entities
Types/Classes: Categorization hierarchy
Inferences: Derived knowledge from explicit facts

knowledge-graph.cypher

Cypher

// ========================================
// ENTITY MODELING
// ========================================
 
// Create knowledge graph entities with rich typing
CREATE (einstein:Person:Scientist:Physicist {
    name: "Albert Einstein",
    birthDate: date("1879-03-14"),
    birthPlace: "Ulm, German Empire",
    knownFor: ["Theory of Relativity", "E=mc²", "Photoelectric Effect"]
})
 
CREATE (relativity:Theory:PhysicsTheory {
    name: "Theory of Relativity",
    published: date("1905-09-26"),
    type: "Special Relativity"
})
 
CREATE (nobelPhysics:Award:NobelPrize {
    name: "Nobel Prize in Physics",
    year: 1921,
    category: "Physics"
})
 
CREATE (eth:Institution:University {
    name: "ETH Zurich",
    location: "Zurich, Switzerland"
})
 
// Create relationships with context
CREATE (einstein)-[:FORMULATED {year: 1905, context: "annus mirabilis papers"}]->(relativity)
CREATE (einstein)-[:RECEIVED {year: 1921, citation: "photoelectric effect"}]->(nobelPhysics)
CREATE (einstein)-[:STUDIED_AT {from: 1896, to: 1900, degree: "diploma"}]->(eth)
CREATE (einstein)-[:WORKED_AT {from: 1912, to: 1914, role: "Professor"}]->(eth)
 
// ========================================
// KNOWLEDGE QUERIES
// ========================================
 
// Question: "Who developed the Theory of Relativity?"
MATCH (p:Person)-[:FORMULATED]->(t:Theory {name: "Theory of Relativity"})
RETURN p.name
 
// Question: "What awards did Einstein receive?"
MATCH (p:Person {name: "Albert Einstein"})-[r:RECEIVED]->(a:Award)
RETURN a.name, r.year, r.citation
 
// Question: "What theories were formulated by people affiliated with ETH Zurich?"
MATCH (p:Person)-[:STUDIED_AT|WORKED_AT]->(:Institution {name: "ETH Zurich"})
MATCH (p)-[:FORMULATED]->(theory:Theory)
RETURN theory.name, collect(DISTINCT p.name) AS developers  // DISTINCT: a person may have both studied and worked there
 
// ========================================
// INFERENCE AND REASONING
// ========================================
 
// Transitive relationships: advisors chain
MATCH path = (student:Person)-[:ADVISED_BY*]->(ancestor:Person)
WHERE student.name = "Current PhD Student"
RETURN [n IN nodes(path) | n.name] AS academicLineage,
       length(path) AS generations
 
// Type inference: derive is_a relationships
MATCH (e:Entity)
WHERE e:Scientist
MATCH (e)-[:WORKS_IN]->(field:Field)
SET e:Researcher
RETURN e.name, labels(e)
 
// Conflict detection: same entity, different facts
MATCH (e:Entity)-[r1]->(value1),
      (e)-[r2]->(value2)
WHERE type(r1) = type(r2) 
  AND value1 <> value2
  AND type(r1) IN ["BORN_IN", "DIED_IN"]
RETURN e.name, type(r1), value1, value2
 
// ========================================
// ENTITY RESOLUTION
// ========================================
 
// Find duplicate entities (same person, different nodes)
MATCH (e1:Person), (e2:Person)
WHERE id(e1) < id(e2)
  AND (e1.name = e2.name 
       OR (e1.birthDate = e2.birthDate AND e1.birthPlace = e2.birthPlace))
RETURN e1.name, e2.name, 
       CASE WHEN e1.name = e2.name THEN "name_match" ELSE "attribute_match" END AS matchType
 
// Create SAME_AS relationships for linked entities
MATCH (e1:Person), (e2:Person)
WHERE e1.externalId = e2.wikiDataId
MERGE (e1)-[:SAME_AS]-(e2)
 
// Query through SAME_AS for unified view
MATCH (e:Person {name: "Marie Curie"})-[:SAME_AS*0..1]-(alias)
WITH collect(DISTINCT e) + collect(DISTINCT alias) AS allRepresentations
UNWIND allRepresentations AS entity
MATCH (entity)-[r]->(related)
RETURN type(r) AS relationship, collect(DISTINCT related.name) AS relatedEntities
 
// ========================================
// SEMANTIC SEARCH
// ========================================
 
// Find entities by relationship context
MATCH (p:Person)-[:RECEIVED]->(:Award)<-[:RECEIVED]-(peer:Person),
      (p)-[:WORKS_IN]->(field:Field)<-[:WORKS_IN]-(peer)
WHERE p.name = "Richard Feynman"
  AND p <> peer
RETURN peer.name, field.name AS sharedField
 
// Path-based similarity (closest concepts first)
MATCH path = (e1:Concept {name: "Machine Learning"})-[:RELATED_TO*1..3]-(e2:Concept)
WHERE e1 <> e2
RETURN e2.name, min(length(path)) AS distance  // one row per concept, at its shortest distance
ORDER BY distance
LIMIT 10

Knowledge Graph vs Traditional DB

Knowledge graphs differ fundamentally from traditional databases: they're designed for connecting diverse data sources with semantic relationships. Where a relational DB stores structured records, a KG stores facts that can be reasoned about. Common sources: structured databases, documents (via NLP entity extraction), external APIs, and manual curation.
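
The phrase "facts that can be reasoned about" can be illustrated with a small Python sketch. It stores hypothetical facts as (subject, predicate, object) triples and derives types that were never stated explicitly by following `subclass_of` links. The predicate names and the class hierarchy are assumptions made for this example.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
triples = {
    ("Albert Einstein", "is_a", "Physicist"),
    ("Physicist", "subclass_of", "Scientist"),
    ("Scientist", "subclass_of", "Person"),
    ("Albert Einstein", "studied_at", "ETH Zurich"),
}

def superclasses(cls):
    """Follow subclass_of edges transitively from cls."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subclass_of" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_types(entity):
    """Explicit is_a types plus every superclass: derived facts, not stored ones."""
    direct = {o for s, p, o in triples if s == entity and p == "is_a"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls)
    return inferred
```

Only "Physicist" is stated for Einstein; "Scientist" and "Person" are inferred from the class hierarchy.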

Network and IT Operations

IT infrastructure is inherently a graph: servers connect to other servers, applications depend on services, users access resources through permission chains. Graph databases enable powerful capabilities for network management, impact analysis, and security.

IT Operations Graph Use Cases:

Dependency mapping: What depends on this service?
Impact analysis: If this fails, what's affected?
Root cause analysis: Trace errors through call chains
Security: Access paths, attack surface mapping
Configuration management: Version and relationship tracking

it-operations.cypher

Cypher

// ========================================
// DEPENDENCY MAPPING
// ========================================
 
// Model application dependencies
CREATE (frontend:Application {name: "Web Frontend", tier: "presentation"})
CREATE (api:Application {name: "API Gateway", tier: "application"})
CREATE (userSvc:Service {name: "UserService", tier: "service"})
CREATE (orderSvc:Service {name: "OrderService", tier: "service"})
CREATE (userDb:Database {name: "UserDB", type: "PostgreSQL"})
CREATE (orderDb:Database {name: "OrderDB", type: "PostgreSQL"})
CREATE (cache:Cache {name: "Redis", type: "Redis"})
 
CREATE (frontend)-[:CALLS]->(api)
CREATE (api)-[:CALLS]->(userSvc)
CREATE (api)-[:CALLS]->(orderSvc)
CREATE (userSvc)-[:USES]->(userDb)
CREATE (userSvc)-[:USES]->(cache)
CREATE (orderSvc)-[:USES]->(orderDb)
CREATE (orderSvc)-[:CALLS]->(userSvc)
 
// What does the frontend depend on (recursively)?
MATCH (frontend:Application {name: "Web Frontend"})-[:CALLS|USES*]->(dependency)
RETURN DISTINCT labels(dependency)[0] AS type, dependency.name AS name
 
// ========================================
// IMPACT ANALYSIS
// ========================================
 
// If UserDB goes down, what's affected?
MATCH (failedComponent {name: "UserDB"})
MATCH path = (failedComponent)<-[:USES|CALLS*]-(affected)
RETURN affected.name AS affectedComponent,
       labels(affected)[0] AS type,
       min(length(path)) AS dependencyDepth  // shortest dependency chain to the failed component
ORDER BY dependencyDepth
 
// Blast radius: count of affected components by tier
MATCH (failed:Database {name: "UserDB"})
MATCH (failed)<-[:USES|CALLS*]-(affected)
RETURN affected.tier AS tier, count(DISTINCT affected) AS affectedCount
ORDER BY affectedCount DESC
 
// Critical path: resources a client can reach through only one component
MATCH (client:Application)-[:CALLS]->(singleDep)-[:USES]->(resource)
WHERE NOT EXISTS {
    MATCH (client)-[:CALLS]->(alt)-[:USES]->(resource)
    WHERE alt <> singleDep
}
RETURN client.name, singleDep.name AS singlePointOfFailure, resource.name AS resource
 
// ========================================
// ROOT CAUSE ANALYSIS
// ========================================
 
// Trace error propagation path
WITH $errorServiceId AS errorOrigin
MATCH (origin {id: errorOrigin})
MATCH path = (origin)<-[:CALLS*1..10]-(caller)
WHERE ALL(node IN nodes(path) WHERE node.lastError IS NOT NULL)
RETURN [n IN nodes(path) | n.name] AS errorPropagationPath,
       [n IN nodes(path) | n.lastError] AS errors,
       origin.name AS rootCause
 
// Find correlated failures (likely common cause)
MATCH (f1:Component)-[:ERROR_AT {time: $errorTime}]->(:ErrorLog)
MATCH (f2:Component)-[:ERROR_AT {time: $errorTime}]->(:ErrorLog)
WHERE f1 <> f2
MATCH (f1)-[:DEPENDS_ON*1..3]->(common)<-[:DEPENDS_ON*1..3]-(f2)
RETURN common.name AS potentialRootCause,
       collect(DISTINCT f1.name) + collect(DISTINCT f2.name) AS affectedServices
 
// ========================================
// SECURITY - ACCESS PATH ANALYSIS
// ========================================
 
// Can this user access this resource? (RBAC path)
// OPTIONAL MATCH + collect returns a row (hasAccess = false) even when no path exists
OPTIONAL MATCH path = (user:User {email: $userEmail})-[:MEMBER_OF*0..3]->
             (:Group)-[:HAS_ROLE]->(:Role)-[:PERMITS]->
             (:Permission)-[:ON]->(resource:Resource {id: $resourceId})
WITH collect(path) AS paths
RETURN size(paths) > 0 AS hasAccess,
       [p IN paths | [n IN nodes(p) | coalesce(n.name, n.email)]] AS accessPaths
 
// Attack surface: externally accessible paths to sensitive data
MATCH path = (external:ExternalEndpoint)-[:CONNECTS_TO*1..5]->(sensitive:DataStore)
WHERE sensitive.classification = "CONFIDENTIAL"
RETURN external.name AS entryPoint,
       [n IN nodes(path) | n.name] AS attackPath,
       length(path) AS hops
ORDER BY hops
 
// Find overprivileged users
MATCH (u:User)-[:MEMBER_OF*1..3]->(:Group)-[:HAS_ROLE]->(r:Role)-[:PERMITS]->(p:Permission)
WITH u, count(DISTINCT p) AS permCount, collect(DISTINCT p.name) AS permissions
WHERE permCount > 20
RETURN u.name, permCount, permissions[0..5] AS samplePermissions
ORDER BY permCount DESC
 
// ========================================
// CONFIGURATION DRIFT DETECTION
// ========================================
 
// Compare current vs baseline configuration, setting by setting
MATCH (server:Server)-[:HAS_CONFIG {version: "current"}]->(config:Config)
MATCH (server)-[:HAS_CONFIG {version: "baseline"}]->(baseConfig:Config)
WHERE config.setting = baseConfig.setting   // compare like with like
  AND config.value <> baseConfig.value      // keep only drifted settings
RETURN server.name AS server,
       config.setting AS setting,
       baseConfig.value AS baselineValue,
       config.value AS currentValue

CMDB Graph Anti-Patterns

Common mistakes when building IT graphs: 1) Modeling everything as generic DEPENDS_ON (lose semantic value), 2) Not capturing relationship direction (A calls B ≠ B calls A), 3) Ignoring temporal aspects (configurations change), 4) Over-connecting (every microservice to every database creates noise). Be specific about relationship types and maintain historical snapshots.
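
Point 2 (direction matters) is easy to demonstrate with the sample dependency model from the Cypher above. In this Python sketch, following edges forward answers "what does this component need?", while following them backward answers "what breaks if this component fails?". The helper names are hypothetical; the edges mirror the `CREATE` statements in the dependency-mapping example.

```python
from collections import defaultdict

# Directed edges from the sample model: (source, type, target).
edges = [
    ("Web Frontend", "CALLS", "API Gateway"),
    ("API Gateway", "CALLS", "UserService"),
    ("API Gateway", "CALLS", "OrderService"),
    ("UserService", "USES", "UserDB"),
    ("UserService", "USES", "Redis"),
    ("OrderService", "USES", "OrderDB"),
    ("OrderService", "CALLS", "UserService"),
]

outgoing, incoming = defaultdict(set), defaultdict(set)
for src, _type, dst in edges:
    outgoing[src].add(dst)
    incoming[dst].add(src)

def reachable(start, adjacency):
    """All nodes reachable from start by following adjacency transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def dependencies_of(component):
    """Forward direction: everything this component relies on."""
    return reachable(component, outgoing)

def impacted_by(component):
    """Reverse direction: everything that relies on this component."""
    return reachable(component, incoming)
```

`dependencies_of("UserDB")` is empty while `impacted_by("UserDB")` covers four components, so an undirected model would conflate two very different answers.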

Other Notable Use Cases

Beyond the major domains, graph databases solve diverse problems wherever connected data matters:

Additional Graph Database Use Cases
Domain	Use Case	Key Graph Pattern	Example Query
Supply Chain	Track product provenance	Linear paths with timestamps	Path from raw material to finished product
Healthcare	Patient journey mapping	Events connected by temporal and causal relationships	Treatment effectiveness paths
Pharma	Drug interaction networks	Molecules, pathways, side effects as nodes	Find contraindicated drug combinations
Telecom	Network topology	Physical/logical network connections	Shortest path for call routing
Logistics	Route optimization	Weighted graph for distances/costs	Optimal delivery sequence
Media	Content lineage	Assets, versions, derivatives	Track content licensing chain
Legal	Contract relationships	Parties, clauses, obligations	Find conflicting contractual obligations
HR	Org hierarchy + skills	Reports-to chains + skills graph	Find succession candidates

additional-use-cases.cypher

Cypher

// ========================================
// SUPPLY CHAIN: Product Traceability
// ========================================
 
// Track ingredient from farm to product
MATCH path = (ingredient:Ingredient {batchId: $batchId})
              -[:SOURCED_FROM]->(supplier:Supplier)
              -[:LOCATED_IN]->(region:Region)
MATCH (ingredient)<-[:CONTAINS]-(product:Product)-[:SOLD_AT]->(store:Store)
RETURN ingredient.name, supplier.name, region.name,
       product.name, store.location
 
// Find all products affected by supplier recall
MATCH (recalled:Supplier {id: $recalledSupplierId})<-[:SOURCED_FROM*1..3]-(material)
MATCH (material)<-[:CONTAINS*]-(product:Product)
RETURN DISTINCT product.sku, product.name
 
// ========================================
// HEALTHCARE: Clinical Pathways
// ========================================
 
// Patient treatment journey
MATCH path = (patient:Patient {id: $patientId})
              -[:HAD_VISIT]->(visit:Visit)-[:DIAGNOSED_WITH]->(condition:Condition)
MATCH (visit)-[:PRESCRIBED]->(treatment:Treatment)
RETURN visit.date, condition.name, treatment.name
ORDER BY visit.date
 
// Treatment effectiveness by outcome paths
MATCH (treatment:Treatment {name: "Treatment A"})
      <-[:PRESCRIBED]-(visit:Visit)<-[:HAD_VISIT]-(patient:Patient)
      -[:HAD_OUTCOME]->(outcome:Outcome)
RETURN outcome.name, count(DISTINCT patient) AS patients
ORDER BY patients DESC
 
// ========================================
// LOGISTICS: Route Optimization
// ========================================
 
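// Distances from the depot to each delivery stop (input for a TSP heuristic)
// Assumes GDS 2.x and a projected graph named 'roadNetwork' with a 'distance' weight
MATCH (depot:Location {type: "WAREHOUSE"})
CALL gds.allShortestPaths.dijkstra.stream('roadNetwork', {
    sourceNode: depot,
    relationshipWeightProperty: 'distance'
})
YIELD targetNode, totalCost
WITH gds.util.asNode(targetNode) AS stop, totalCost
WHERE stop.id IN $deliveryStops
RETURN stop.id AS stopId, totalCost AS distanceFromDepot
ORDER BY distanceFromDepot
// Ordering the stops into a full tour needs additional multi-stop logic...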

Identifying Graph-Friendly Problems

Ask yourself: 1) Do the key questions involve relationships between entities? 2) Are queries recursive or multi-hop? 3) Does performance degrade as relationships increase in relational models? 4) Is the schema connection-heavy (many join tables)? If yes to 2+ of these, consider graphs.
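
The checklist above is simple enough to write down as a function. This Python sketch is only a mnemonic for the "yes to 2+" rule of thumb; the parameter names are paraphrases of the four questions and the function itself is hypothetical.

```python
def graph_fit(relationship_questions, multi_hop_queries,
              joins_degrade_performance, many_join_tables):
    """Count 'yes' answers to the four questions; two or more suggests evaluating a graph."""
    yes_count = sum([relationship_questions, multi_hop_queries,
                     joins_degrade_performance, many_join_tables])
    return yes_count, yes_count >= 2
```

For instance, a workload with relationship-centric questions and multi-hop queries, but no join-performance problems yet, gives `graph_fit(True, True, False, False)`, which returns `(2, True)`.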

When NOT to Use Graph Databases

Graph databases are powerful but not universal. Understanding the anti-patterns is as important as understanding the use cases.

Poor Fit for Graphs

• High-volume simple writes: Event logging, time-series data, and click streams perform better in column stores or time-series DBs
• Aggregate-heavy analytics: Sums, averages, and counts across entire datasets favor columnar databases (Snowflake, BigQuery)
• Simple CRUD without relationships: If entities are independent, relational or document DBs suffice
• Blob/binary storage: Images, videos, and documents belong in object stores
• Full-text search: While graphs can index text, Elasticsearch/Solr are purpose-built
• Sparse relationships: If most entities have 0-2 connections, graphs add overhead without benefit

Strong Fit for Graphs

• Relationship-centric queries: Questions about connections, paths, and patterns
• Multi-hop traversals: Queries crossing 2+ relationship boundaries
• Variable-length paths: Unknown depth explorations
• Pattern matching: Finding subgraph structures
• Graph algorithms: Centrality, community detection, similarity
• Dense interconnection: Entities with many diverse relationships

Database Selection Guide
Scenario	Best Choice	Reasoning
User sessions, caching	Key-Value (Redis)	Simple lookup by key, no relationships
Product catalog with nested data	Document (MongoDB)	Hierarchical data, single-entity queries
Financial transactions, ACID compliance	Relational (PostgreSQL)	Strong consistency, complex transactions
Log analysis, time-series	Columnar (ClickHouse)	Append-only, aggregate queries
Social network, fraud detection	Graph (Neo4j)	Relationship traversal, pattern matching
Full-text search	Search Engine (Elasticsearch)	Inverted indexes, relevance scoring
Multi-model requirements	Multi-Model (ArangoDB)	Combines document + graph + key-value

Polyglot Persistence

Modern architectures commonly use multiple databases for different workloads. A single application might use PostgreSQL for transactions, Redis for sessions, Elasticsearch for search, and Neo4j for recommendations. Choose each tool for its strengths rather than forcing one tool to do everything.
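
In a setup like this, the graph usually holds only the identifiers and relationships it needs, fed from the system of record. The sketch below shows one possible sync step. The $purchases parameter and its field names are hypothetical; the batch is assumed to be exported from the transactional store by some external job.

```cypher
// $purchases is a hypothetical batch of rows, for example:
// [{customerId: "c1", productId: "p9", purchasedAt: "2024-05-01T10:00:00Z"}]
UNWIND $purchases AS row
MERGE (c:Customer {id: row.customerId})
MERGE (p:Product {id: row.productId})
MERGE (c)-[r:PURCHASED]->(p)
SET r.lastPurchasedAt = datetime(row.purchasedAt)
```

Because MERGE matches existing nodes and relationships before creating new ones, replaying the same batch does not create duplicates. Uniqueness constraints on Customer.id and Product.id would be a sensible companion to this pattern.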

Summary: Graph Database Use Cases

We've explored the domains where graph databases provide transformative advantages. Let's consolidate the key insights:

Key Takeaways

•Social networks are natural graph territory — Friend suggestions, feed ranking, influence analysis all require multi-hop traversal that graphs handle elegantly.
•Fraud detection gains superpowers from graphs — Fraud rings, money laundering, and synthetic identities become visible patterns when viewed as connected structures.
•Recommendations thrive on connection data — Collaborative filtering, content similarity, and hybrid approaches leverage purchase/view graphs for personalization.
•Knowledge graphs capture semantic relationships — Entities, facts, and inferences form queryable knowledge bases powering search and AI applications.
•IT operations benefit from dependency graphs — Impact analysis, root cause detection, and security path analysis require traversing infrastructure connections.
•Not everything needs graphs — High-volume writes, aggregate analytics, and sparse-relationship data are better served by specialized databases.

Module Complete:

You have now comprehensively explored graph databases—from the fundamental property graph model through Neo4j implementation, advanced query techniques, and real-world use cases. You understand when graph databases provide decisive advantages and when alternative solutions are more appropriate.

Graph databases represent a paradigm shift for connected data—one that enables previously impractical analyses and unlocks new business capabilities. As data connections become increasingly central to modern applications, graph thinking will only grow in importance.

Module Complete

Congratulations! You've mastered graph databases—the property graph model, nodes and edges, Neo4j and Cypher, advanced query patterns, and use cases from social networks to fraud detection. You can now evaluate when graph databases provide advantages and implement graph-based solutions for connected data problems.
