Two application domains have propelled graph databases from academic curiosity to essential infrastructure: social networks and recommendation engines. These aren't niche applications; they power features that billions of people use daily.
Both domains share a common characteristic: the value lies in the connections, not just the entities. Users matter, but their friendships, follows, purchases, and ratings form the true data asset. This makes graph databases the natural fit.
This page explores social network and recommendation engine architectures in depth. You'll learn graph models for social features, feed algorithms, friend suggestion engines, collaborative filtering on graphs, content-based recommendations, and hybrid approaches. We'll examine real production patterns used by leading platforms.
A social network is, at its core, a graph of people connected through relationships. But real social platforms have evolved sophisticated models that capture nuanced interactions.
Core Entities (Nodes):
Core Relationships (Edges):
// User connections - various relationship types
(user1:User)-[:FOLLOWS {since: datetime(), muted: false}]->(user2:User)
(user1:User)-[:FRIENDS {since: datetime(), source: 'request'}]->(user2:User)
(user1:User)-[:BLOCKED {since: datetime()}]->(user2:User)
(user:User)-[:MEMBER_OF {role: 'admin', since: datetime()}]->(group:Group)

// Content creation and engagement
(user:User)-[:POSTED {at: datetime()}]->(post:Post)
(user:User)-[:LIKED {at: datetime()}]->(post:Post)
(user:User)-[:COMMENTED {at: datetime()}]->(comment:Comment)-[:ON]->(post:Post)
(user:User)-[:SHARED {at: datetime(), via: 'story'}]->(post:Post)
(user:User)-[:MENTIONED_IN]->(post:Post)
(user:User)-[:TAGGED_IN]->(media:Media)

// Content relationships
(post:Post)-[:REPLY_TO]->(parentPost:Post)  // Thread structure
(post:Post)-[:QUOTE_OF]->(originalPost:Post)
(post:Post)-[:TAGGED_WITH]->(hashtag:Hashtag)
(post:Post)-[:AT_LOCATION]->(location:Location)
(post:Post)-[:CONTAINS]->(media:Media)

// User attributes
(user:User)-[:WORKS_AT {title: 'Engineer', since: date()}]->(company:Company)
(user:User)-[:STUDIED_AT {degree: 'BS CS', year: 2018}]->(school:School)
(user:User)-[:LIVES_IN]->(city:City)
(user:User)-[:INTERESTED_IN]->(topic:Topic)

Symmetric vs. Asymmetric Relationships:
| Relationship | Symmetry | Example |
|---|---|---|
| FRIENDS | Symmetric | Facebook mutual friendship |
| FOLLOWS | Asymmetric | Twitter, Instagram |
| BLOCKED | Asymmetric | One-way block |
| KNOWS | Often symmetric | LinkedIn connections |
For symmetric relationships, you typically store both directions:
// When Alice friends Bob, create both directions
CREATE (alice)-[:FRIENDS {since: $date}]->(bob)
CREATE (bob)-[:FRIENDS {since: $date}]->(alice)
Or query bidirectionally:
// Match friends regardless of direction
MATCH (user:User {id: $userId})-[:FRIENDS]-(friend:User)
RETURN friend
Social platforms embed privacy controls into the graph model. Relationship properties track visibility (public/friends/custom). Queries must filter based on viewer permissions—a user's blocked list, privacy settings, and group memberships all affect what data can be returned.
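The viewer-dependent filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any platform's actual rules: the `Post` class, the `can_view` helper, and the visibility values `public`, `friends`, and `private` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    author: str
    visibility: str  # assumed values: "public", "friends", "private"


def can_view(viewer, post, friends, blocked):
    """Return True if `viewer` may see `post` under the assumed rules."""
    author = post.author
    # A block in either direction hides the content.
    if author in blocked.get(viewer, set()) or viewer in blocked.get(author, set()):
        return False
    if viewer == author or post.visibility == "public":
        return True
    if post.visibility == "friends":
        return viewer in friends.get(author, set())
    return False  # "private" or unknown visibility: hide by default


# Hypothetical data: alice and bob are friends, carol has blocked alice.
friends = {"alice": {"bob"}, "bob": {"alice"}}
blocked = {"carol": {"alice"}}
posts = [
    Post("p1", "alice", "public"),
    Post("p2", "alice", "friends"),
    Post("p3", "alice", "private"),
]


def visible_ids(viewer):
    return [p.post_id for p in posts if can_view(viewer, p, friends, blocked)]


visible_to_bob = visible_ids("bob")      # friend of the author
visible_to_carol = visible_ids("carol")  # has blocked the author
visible_to_dave = visible_ids("dave")    # stranger
```

In a graph query the same checks would typically appear as `WHERE` predicates on the traversal, so that hidden content is never returned to the application at all.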
"People You May Know" (PYMK) is one of the most impactful features in social platforms—LinkedIn reports it drives half of all connection requests. The core insight: people with mutual connections are likely to know each other in real life.
Basic Algorithm: Mutual Friends
The simplest PYMK algorithm counts mutual connections:
// Basic PYMK: friends of friends not already connected
MATCH (me:User {id: $userId})-[:FRIENDS]-(friend:User)-[:FRIENDS]-(suggestion:User)
WHERE NOT (me)-[:FRIENDS]-(suggestion)
  AND NOT (me)-[:BLOCKED]-(suggestion)
  AND me <> suggestion
// DISTINCT guards against double counting when both directions of FRIENDS are stored
RETURN suggestion.id, suggestion.name, suggestion.avatar,
       count(DISTINCT friend) AS mutual_count,
       collect(DISTINCT friend.name)[..3] AS sample_mutuals
ORDER BY mutual_count DESC
LIMIT 20

// With quality scoring
MATCH (me:User {id: $userId})-[:FRIENDS]-(friend:User)-[:FRIENDS]-(suggestion:User)
WHERE NOT (me)-[:FRIENDS]-(suggestion) AND me <> suggestion
WITH suggestion, count(DISTINCT friend) AS mutual_count, collect(DISTINCT friend) AS mutuals
// Weight recently active mutuals higher
WITH suggestion, mutual_count,
     size([f IN mutuals WHERE f.lastActive > datetime() - duration('P7D')]) AS active_mutuals
RETURN suggestion.id, suggestion.name, mutual_count, active_mutuals,
       // Combined score: mutuals + recency boost
       mutual_count + (active_mutuals * 0.5) AS score
ORDER BY score DESC
LIMIT 20

Enhanced Signals:
Production PYMK systems incorporate multiple signals beyond mutual friends:
// Combine multiple connection signals
MATCH (me:User {id: $userId})

// Signal 1: Mutual friends (strongest signal)
OPTIONAL MATCH (me)-[:FRIENDS]-(mutual)-[:FRIENDS]-(s1:User)
WHERE NOT (me)-[:FRIENDS]-(s1) AND me <> s1
WITH me, s1, count(DISTINCT mutual) AS mutual_count

// Signal 2: Same workplace
OPTIONAL MATCH (me)-[:WORKS_AT]->(company)<-[:WORKS_AT]-(s2:User)
WHERE NOT (me)-[:FRIENDS]-(s2) AND me <> s2
WITH me,
     collect(DISTINCT {user: s1, mutuals: mutual_count}) AS mutual_suggestions,
     collect(DISTINCT s2) AS coworkers

// Signal 3: Same school/year
OPTIONAL MATCH (me)-[:STUDIED_AT]->(school)<-[:STUDIED_AT]-(s3:User)
WHERE NOT (me)-[:FRIENDS]-(s3) AND me <> s3
WITH me, mutual_suggestions, coworkers, collect(DISTINCT s3) AS classmates

// Signal 4: Same groups
OPTIONAL MATCH (me)-[:MEMBER_OF]->(group)<-[:MEMBER_OF]-(s4:User)
WHERE NOT (me)-[:FRIENDS]-(s4) AND me <> s4
WITH me, mutual_suggestions, coworkers, classmates, collect(DISTINCT s4) AS group_members

// Combine and score all signals (only candidates with at least one mutual friend are ranked)
UNWIND mutual_suggestions AS ms
// An alias cannot be referenced in the WITH that defines it, so the CASE expressions use ms.user
WITH ms.user AS suggestion,
     ms.mutuals * 10 AS mutual_score,  // Highest weight
     CASE WHEN ms.user IN coworkers THEN 5 ELSE 0 END AS work_score,
     CASE WHEN ms.user IN classmates THEN 4 ELSE 0 END AS school_score,
     CASE WHEN ms.user IN group_members THEN 2 ELSE 0 END AS group_score
WHERE suggestion IS NOT NULL  // Drop the placeholder row produced when there are no mutuals
RETURN suggestion.id, suggestion.name,
       mutual_score + work_score + school_score + group_score AS total_score
ORDER BY total_score DESC
LIMIT 20

Real-time PYMK computation is expensive for high-connection users. Production systems pre-compute suggestions periodically (hourly/daily), storing results in materialized views. Real-time queries fetch from cache, with background jobs refreshing the suggestions.
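To make the mutual-friend counting and the pre-computation step concrete, here is a small Python sketch over an in-memory adjacency map. The function name `suggest_friends`, the sample data, and the `pymk_cache` dictionary are assumptions for illustration; a real batch job would read from the graph store and write to a cache or table.

```python
from collections import Counter


def suggest_friends(user, friends, limit=20):
    """Rank friends-of-friends by their number of mutual friends with `user`."""
    direct = friends.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by count descending, then by name for a deterministic order.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Hypothetical symmetric friendship data.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}

# Batch step: pre-compute suggestions for every user, as a periodic job might.
pymk_cache = {user: suggest_friends(user, friends) for user in friends}
```

For `alice`, `dave` is reachable through both `bob` and `carol` and so ranks above `erin`, who is reachable only through `bob`. Request handlers would then read `pymk_cache` instead of traversing the graph.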
The social feed—showing users relevant content from their network—is the core product of most social platforms. Feed algorithms balance recency, relevance, and engagement to maximize user value.
Feed Types:
| Feed Type | Algorithm | Use Case |
|---|---|---|
| Chronological | Time-sorted | Twitter's "Latest" |
| Ranked | ML-based scoring | Facebook's main feed |
| Interest-based | Topic-weighted | TikTok's "For You" |
| Hybrid | Recency + ranking | Most modern platforms |
Basic Chronological Feed:
// Simple time-ordered feed from followed accounts
MATCH (me:User {id: $userId})-[:FOLLOWS]->(following:User)-[:POSTED]->(post:Post)
WHERE post.createdAt > datetime() - duration('P7D')
  AND NOT (me)-[:BLOCKED]-(following)
  // IN keeps both visibility values inside the AND chain (a bare OR would bypass the filters above)
  AND post.visibility IN ['public', 'followers']
RETURN post.id, post.content, post.createdAt,
       following.name AS author,
       following.avatar AS authorAvatar
ORDER BY post.createdAt DESC
LIMIT 50

// With engagement counts (likes, comments, shares)
MATCH (me:User {id: $userId})-[:FOLLOWS]->(author:User)-[:POSTED]->(post:Post)
WHERE post.createdAt > datetime() - duration('P7D')
OPTIONAL MATCH (post)<-[like:LIKED]-()
OPTIONAL MATCH (post)<-[:ON]-(comment:Comment)
OPTIONAL MATCH (post)<-[share:SHARED]-()
RETURN post.id, post.content, post.createdAt, author.name,
       count(DISTINCT like) AS likes,
       count(DISTINCT comment) AS comments,
       count(DISTINCT share) AS shares
ORDER BY post.createdAt DESC
LIMIT 50

Ranked Feed with Engagement Scoring:
Ranked feeds use multiple signals to surface "better" content:
// Multi-factor ranked feed
MATCH (me:User {id: $userId})-[follow:FOLLOWS]->(author:User)-[:POSTED]->(post:Post)
WHERE post.createdAt > datetime() - duration('P3D')

// Collect engagement metrics
OPTIONAL MATCH (post)<-[:LIKED]-(liker)
OPTIONAL MATCH (post)<-[:ON]-(comment)
OPTIONAL MATCH (post)<-[:SHARED]-(sharer)

WITH me, author, post, follow,
     count(DISTINCT liker) AS likes,
     count(DISTINCT comment) AS comments,
     count(DISTINCT sharer) AS shares

// Calculate engagement score
WITH me, author, post, follow, likes, comments, shares,
     likes + (comments * 2) + (shares * 3) AS engagement_score

// Factor in relationship strength
OPTIONAL MATCH (me)-[interact:LIKED|COMMENTED|SHARED]->(:Post)<-[:POSTED]-(author)
WITH me, author, post, engagement_score, count(interact) AS interaction_history

// Factor in recency: age is "now minus createdAt", so createdAt is the first argument
WITH post, author, engagement_score, interaction_history,
     duration.inSeconds(post.createdAt, datetime()).seconds AS age_seconds

// Combined score: engagement + relationship + recency boost
WITH post, author, engagement_score, interaction_history, age_seconds,
     engagement_score * 0.4 +
     interaction_history * 2 +
     (1.0 / (1 + age_seconds / 86400.0)) * 10 AS feed_score  // Boost halves after 24h

RETURN post.id, post.content, author.name,
       round(feed_score * 100) / 100 AS score
ORDER BY feed_score DESC
LIMIT 50

Diversification:
Pure engagement-ranked feeds create filter bubbles. Production systems inject diversity:
// Ensure variety: no more than 3 posts per author
MATCH (me:User {id: $userId})-[:FOLLOWS]->(author:User)-[:POSTED]->(post:Post)
WHERE post.createdAt > datetime() - duration('P3D')

WITH author, post
ORDER BY post.engagementScore DESC

// Collect posts per author, take top 3
WITH author, collect(post)[..3] AS author_posts
UNWIND author_posts AS post

// Return diversified feed
RETURN post.id, post.content, author.name
ORDER BY post.engagementScore DESC
LIMIT 50

Production feed systems are far more complex, incorporating ML models trained on millions of interactions, A/B testing frameworks, real-time feature stores, and caching layers. The graph provides the relationship substrate; ML models provide personalized scoring.
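The scoring arithmetic and the per-author cap are easy to check outside the database. The Python sketch below reuses the weights from the ranked-feed query (0.4 for engagement, 2 for interaction history, and a recency term that starts at 10 and halves after one day); the `Candidate` class and the function names are hypothetical, and the weights are illustrative rather than tuned values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    author: str
    likes: int
    comments: int
    shares: int
    age_seconds: float
    interaction_history: int  # past interactions between viewer and author


def feed_score(c):
    """Engagement + relationship strength + a recency boost that decays with age."""
    engagement = c.likes + 2 * c.comments + 3 * c.shares
    recency = 10.0 / (1.0 + c.age_seconds / 86400.0)  # 10 when new, 5 after 24h
    return 0.4 * engagement + 2.0 * c.interaction_history + recency


def diversify(candidates, max_per_author=3, limit=50):
    """Walk candidates in score order, keeping at most `max_per_author` per author."""
    per_author = {}
    feed = []
    for c in sorted(candidates, key=feed_score, reverse=True):
        if per_author.get(c.author, 0) >= max_per_author:
            continue
        per_author[c.author] = per_author.get(c.author, 0) + 1
        feed.append(c)
        if len(feed) == limit:
            break
    return feed


fresh = Candidate("p1", "ann", likes=10, comments=2, shares=1,
                  age_seconds=0.0, interaction_history=0)
day_old = Candidate("p2", "ann", likes=10, comments=2, shares=1,
                    age_seconds=86400.0, interaction_history=0)

# Five strong posts by one author and one weak post by another.
candidates = [
    Candidate(f"a{i}", "ann", likes=100 - i, comments=0, shares=0,
              age_seconds=3600.0, interaction_history=0)
    for i in range(5)
]
candidates.append(Candidate("b0", "bob", likes=1, comments=0, shares=0,
                            age_seconds=3600.0, interaction_history=0))
feed = diversify(candidates, max_per_author=2)
```

With identical engagement, the fresh post scores about 16.8 and the day-old one about 11.8, so the recency term alone accounts for the gap. With a cap of two, the feed keeps ann's two best posts and still leaves room for bob's.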
Recommendation engines predict what users will like based on their behavior and the behavior of similar users. Graphs provide a natural model for these relationships.
Core Approaches:
| Approach | Description | Graph Pattern |
|---|---|---|
| Collaborative Filtering | Users with similar behavior like similar things | User-Item bipartite graph |
| Content-Based | Items similar to what you liked | Item-Feature graph |
| Knowledge-Based | Domain rules and constraints | Knowledge graph |
| Hybrid | Combine multiple approaches | Multi-type graph |
The Bipartite User-Item Graph:
Most recommendations start with a bipartite graph—users on one side, items on the other:
// Core interaction patterns
(user:User)-[:VIEWED {at: datetime(), duration: 120}]->(item:Product)
(user:User)-[:PURCHASED {at: datetime(), price: 49.99}]->(item:Product)
(user:User)-[:RATED {score: 4.5, at: datetime()}]->(item:Movie)
(user:User)-[:ADDED_TO_CART]->(item:Product)
(user:User)-[:WISHLISTED]->(item:Product)
(user:User)-[:REVIEWED {rating: 5, text: '...'}]->(item:Restaurant)

// Item metadata
(item:Product)-[:IN_CATEGORY]->(category:Category)
(item:Product)-[:HAS_TAG]->(tag:Tag)
(item:Movie)-[:HAS_GENRE]->(genre:Genre)
(item:Movie)-[:STARRING]->(actor:Actor)
(item:Movie)-[:DIRECTED_BY]->(director:Director)
(item:Product)-[:MADE_BY]->(brand:Brand)

// User preferences
(user:User)-[:PREFERS]->(category:Category)
(user:User)-[:FOLLOWS]->(brand:Brand)
(user:User)-[:DISLIKES]->(genre:Genre)

Implicit vs. Explicit Signals:
| Signal Type | Examples | Strength |
|---|---|---|
| Explicit | Ratings, reviews, likes | Strong but sparse |
| Implicit | Views, purchases, time spent | Weaker but abundant |
| Negative | Skip, hide, dislike | Important for filtering |
Production systems weight signals appropriately:
// Weighted interaction score
WITH user, item,
     CASE type(r)
       WHEN 'PURCHASED' THEN 10
       WHEN 'RATED' THEN r.score * 2
       WHEN 'ADDED_TO_CART' THEN 5
       WHEN 'VIEWED' THEN 1
       ELSE 0
     END AS weight
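As a worked example of this weighting, the Python sketch below applies the same weights to a short event log and sums them per user-item pair. The event tuple layout and the names `interaction_weight` and `affinity` are assumptions made for the illustration.

```python
from collections import defaultdict
from typing import Optional

# Same weights as the CASE expression above; RATED is handled separately.
INTERACTION_WEIGHTS = {"PURCHASED": 10.0, "ADDED_TO_CART": 5.0, "VIEWED": 1.0}


def interaction_weight(kind: str, score: Optional[float] = None) -> float:
    """Map one interaction to a numeric weight; unknown kinds count as 0."""
    if kind == "RATED":
        return 2.0 * score if score is not None else 0.0
    return INTERACTION_WEIGHTS.get(kind, 0.0)


def affinity(events):
    """Sum weights per (user, item) from (user, item, kind, score) tuples."""
    totals = defaultdict(float)
    for user, item, kind, score in events:
        totals[(user, item)] += interaction_weight(kind, score)
    return dict(totals)


# Hypothetical event log.
events = [
    ("u1", "book", "VIEWED", None),
    ("u1", "book", "ADDED_TO_CART", None),
    ("u1", "book", "PURCHASED", None),
    ("u1", "film", "RATED", 4.5),
    ("u1", "film", "SKIPPED", None),  # not in the table, so weight 0
]
scores = affinity(events)
```

Here the book accumulates 1 + 5 + 10 = 16 and the film 4.5 × 2 = 9. Negative signals such as skips would need their own (negative) weights, which this sketch does not model.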
New users and new items lack interaction history. Solutions include: onboarding questionnaires (explicit preferences), content-based fallbacks (item features), popularity-based defaults, and demographic clustering. Graphs help by enabling transitive inference through shared attributes.
Collaborative filtering (CF) is based on a powerful insight: users who agreed in the past tend to agree in the future. On graphs, this translates to finding paths through shared items or shared users.
User-Based Collaborative Filtering:
"Find users similar to me, recommend what they liked that I haven't seen."
// Find users with similar purchase history
MATCH (me:User {id: $userId})-[:PURCHASED]->(shared:Product)<-[:PURCHASED]-(similar:User)
WHERE me <> similar
// Carry `me` forward so the exclusion filter below can still reference it
WITH me, similar, count(shared) AS shared_purchases
ORDER BY shared_purchases DESC
LIMIT 50  // Top 50 similar users

// Get their purchases that I don't have
MATCH (similar)-[:PURCHASED]->(recommendation:Product)
WHERE NOT (me)-[:PURCHASED]->(recommendation)
WITH recommendation, count(similar) AS recommender_count
RETURN recommendation.id, recommendation.name, recommender_count
ORDER BY recommender_count DESC
LIMIT 20

// With weighted similarity (Jaccard)
MATCH (me:User {id: $userId})-[:PURCHASED]->(myProducts:Product)
WITH me, collect(DISTINCT myProducts) AS my_purchases

MATCH (similar:User)-[:PURCHASED]->(theirProducts:Product)
WHERE similar <> me
WITH me, my_purchases, similar, collect(DISTINCT theirProducts) AS their_purchases

// Jaccard similarity = intersection / union
// (UNION is a reserved word in Cypher, so the alias is union_size)
WITH me, similar,
     size([p IN their_purchases WHERE p IN my_purchases]) AS intersection,
     size(my_purchases) + size(their_purchases) -
       size([p IN their_purchases WHERE p IN my_purchases]) AS union_size
WHERE intersection > 2  // Minimum overlap
WITH me, similar, intersection * 1.0 / union_size AS jaccard_similarity
ORDER BY jaccard_similarity DESC
LIMIT 30

// Get recommendations from similar users
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
RETURN rec.id, rec.name, sum(jaccard_similarity) AS weighted_score
ORDER BY weighted_score DESC
LIMIT 10

Item-Based Collaborative Filtering:
"Find items similar to what I liked, recommend those."
Item-based CF is often preferred because item similarity is more stable than user similarity (items don't change behavior like users do).
// Find items frequently co-purchased with items I bought
MATCH (me:User {id: $userId})-[:PURCHASED]->(myItem:Product)
      <-[:PURCHASED]-(other:User)-[:PURCHASED]->(coItem:Product)
WHERE NOT (me)-[:PURCHASED]->(coItem)
  AND myItem <> coItem

// Count co-purchase frequency
WITH coItem, count(DISTINCT other) AS co_purchase_count
RETURN coItem.id, coItem.name, co_purchase_count AS frequently_bought_together
ORDER BY co_purchase_count DESC
LIMIT 10

// With lift calculation (co-occurrence beyond random chance)
MATCH (me:User {id: $userId})-[:PURCHASED]->(myItem:Product)
MATCH (myItem)<-[:PURCHASED]-(copurchaser:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec) AND myItem <> rec

WITH rec, myItem, count(DISTINCT copurchaser) AS co_buyers
MATCH (anyBuyer:User)-[:PURCHASED]->(rec)
WITH rec, myItem, co_buyers, count(DISTINCT anyBuyer) AS rec_total_buyers
MATCH (anyBuyer2:User)-[:PURCHASED]->(myItem)
// Keep myItem as a grouping key so each (myItem, rec) pair gets its own buyer count
WITH rec, myItem, co_buyers, rec_total_buyers, count(DISTINCT anyBuyer2) AS my_item_buyers
MATCH (allUsers:User)
WITH rec, myItem, co_buyers, rec_total_buyers, my_item_buyers, count(allUsers) AS total_users

// Lift = P(A and B) / (P(A) * P(B))
WITH rec,
     (co_buyers * 1.0 / total_users) /
     ((rec_total_buyers * 1.0 / total_users) * (my_item_buyers * 1.0 / total_users)) AS lift
WHERE lift > 1.5  // Lift above 1 means positive association; 1.5 keeps only the stronger pairs
RETURN rec.name, round(lift * 100) / 100 AS lift_score
ORDER BY lift_score DESC
LIMIT 10

Raw co-occurrence counts bias toward popular items: bestsellers co-occur with everything. Use lift, pointwise mutual information (PMI), or log-likelihood ratio to surface genuinely associated items, not just popular ones.
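A small worked example shows why lift corrects for popularity. The Python sketch below uses invented counts (1,000 users, a bestseller, a tent, and a camping stove); the function names `lift` and `pmi` are illustrative, and PMI is taken here as the base-2 logarithm of lift.

```python
import math


def lift(co_buyers, a_buyers, b_buyers, total_users):
    """Lift = P(A and B) / (P(A) * P(B)); returns 0.0 when a count is empty."""
    if min(a_buyers, b_buyers, total_users) <= 0:
        return 0.0
    p_ab = co_buyers / total_users
    p_a = a_buyers / total_users
    p_b = b_buyers / total_users
    return p_ab / (p_a * p_b)


def pmi(co_buyers, a_buyers, b_buyers, total_users):
    """Pointwise mutual information in bits; undefined (None) without co-purchases."""
    value = lift(co_buyers, a_buyers, b_buyers, total_users)
    return math.log2(value) if value > 0 else None


TOTAL_USERS = 1000
TENT_BUYERS = 40

# The bestseller is bought by half of all users, the stove by only 50.
# Both were co-purchased with the tent by the same number of users: 20.
tent_bestseller = lift(co_buyers=20, a_buyers=TENT_BUYERS, b_buyers=500,
                       total_users=TOTAL_USERS)
tent_stove = lift(co_buyers=20, a_buyers=TENT_BUYERS, b_buyers=50,
                  total_users=TOTAL_USERS)
```

Both pairs have the same raw co-occurrence count, yet the bestseller's lift is about 1.0 (no more than chance, since half of any group buys it) while the stove's is about 10. Ranking by lift therefore surfaces the stove, which a raw count could not distinguish from the bestseller.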
Content-based filtering recommends items with similar attributes to items a user has liked. Unlike collaborative filtering, it works for new items with no interaction history (solving the item cold-start problem).
The Item Feature Graph:
// Movie with features
(movie:Movie {
  title: 'Inception',
  releaseYear: 2010,
  runtime: 148,
  budget: 160000000
})
(movie)-[:HAS_GENRE]->(genre:Genre {name: 'Sci-Fi'})
(movie)-[:HAS_GENRE]->(genre2:Genre {name: 'Thriller'})
(movie)-[:DIRECTED_BY]->(director:Person {name: 'Nolan'})
(movie)-[:STARRING]->(actor:Person {name: 'DiCaprio'})
(movie)-[:HAS_TAG]->(tag:Tag {name: 'mind-bending'})
(movie)-[:FROM_STUDIO]->(studio:Studio {name: 'Warner Bros'})

// Product with features
(product:Product {
  name: 'iPhone 15',
  price: 999,
  releaseDate: date('2023-09-22')
})
(product)-[:IN_CATEGORY]->(category:Category {name: 'Smartphones'})
(product)-[:MADE_BY]->(brand:Brand {name: 'Apple'})
(product)-[:HAS_FEATURE]->(feature:Feature {name: 'Face ID'})
(product)-[:HAS_SPEC {value: '48MP'}]->(spec:Spec {name: 'Camera'})
// Find movies similar to ones I've liked based on shared features
MATCH (me:User {id: $userId})-[r:RATED]->(liked:Movie)
WHERE r.score >= 4.0

// Extract features from liked movies
MATCH (liked)-[:HAS_GENRE]->(genre:Genre)
MATCH (liked)-[:DIRECTED_BY]->(director:Person)
MATCH (liked)-[:STARRING]->(actor:Person)
WITH me,
     collect(DISTINCT genre) AS preferred_genres,
     collect(DISTINCT director) AS preferred_directors,
     collect(DISTINCT actor) AS preferred_actors,
     collect(DISTINCT liked) AS already_seen

// Find movies matching those features
MATCH (rec:Movie)
WHERE NOT rec IN already_seen
OPTIONAL MATCH (rec)-[:HAS_GENRE]->(g:Genre) WHERE g IN preferred_genres
OPTIONAL MATCH (rec)-[:DIRECTED_BY]->(d:Person) WHERE d IN preferred_directors
OPTIONAL MATCH (rec)-[:STARRING]->(a:Person) WHERE a IN preferred_actors

WITH rec,
     count(DISTINCT g) AS genre_matches,
     count(DISTINCT d) AS director_matches,
     count(DISTINCT a) AS actor_matches

// Weighted feature matching
WITH rec,
     genre_matches * 2 + director_matches * 5 + actor_matches * 3 AS content_score
WHERE content_score > 3
RETURN rec.title, content_score
ORDER BY content_score DESC
LIMIT 20

// Jaccard similarity on feature sets
MATCH (me:User {id: $userId})-[:RATED {score: 5}]->(loved:Movie)
MATCH (loved)-[:HAS_GENRE|DIRECTED_BY|STARRING]->(feature)
WITH me, loved, collect(feature) AS loved_features

MATCH (rec:Movie)
WHERE rec <> loved
MATCH (rec)-[:HAS_GENRE|DIRECTED_BY|STARRING]->(rec_feature)
WITH loved, rec, loved_features, collect(rec_feature) AS rec_features

// size_sum - intersection equals the size of the union of the two feature sets
WITH rec,
     size([f IN rec_features WHERE f IN loved_features]) AS intersection,
     size(loved_features) + size(rec_features) AS size_sum
WITH rec, intersection * 1.0 / (size_sum - intersection) AS jaccard
WHERE jaccard > 0.3
RETURN rec.title, round(jaccard * 100) / 100 AS similarity
ORDER BY similarity DESC
LIMIT 10

Content-based quality depends on feature richness. Netflix uses thousands of microgenres ('Mind-bending Sci-Fi', 'Visually-striking Dramas'). Rich feature graphs enable nuanced similarity. Consider: explicit features, derived tags, and embeddings-as-nodes.
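The Jaccard computation on feature sets can be checked by hand with a few lines of Python. The feature set for Inception mirrors the feature graph shown earlier; the other catalog entries and their features are a small illustrative sample chosen for this sketch, and the 0.3 threshold is the one used in the query.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Features of a movie the user loved (genres, director, lead actor).
loved_features = {"Sci-Fi", "Thriller", "Nolan", "DiCaprio"}

# Illustrative catalog: title -> feature set.
catalog = {
    "Interstellar": {"Sci-Fi", "Drama", "Nolan", "McConaughey"},
    "Shutter Island": {"Thriller", "Mystery", "Scorsese", "DiCaprio"},
    "Notting Hill": {"Romance", "Comedy", "Michell", "Roberts"},
}

THRESHOLD = 0.3
similar = sorted(
    (
        (title, jaccard(loved_features, features))
        for title, features in catalog.items()
        if jaccard(loved_features, features) > THRESHOLD
    ),
    key=lambda pair: (-pair[1], pair[0]),  # best first, ties broken by title
)
```

Each of the first two titles shares 2 of 6 distinct features with the loved movie, giving a similarity of about 0.33 and clearing the threshold; the third shares none. Because every feature counts equally, richer feature sets (tags, microgenres) shift these scores noticeably, which is the point made above about feature richness.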
The most effective recommendation systems combine multiple approaches. Graphs naturally enable hybrid recommendations by connecting users, items, and features in a unified model.
Hybrid Pattern: Multi-Path Recommendations
// Combine collaborative and content-based signals
// Each path is aggregated into its own list before the next one runs,
// so rows from different paths do not multiply each other's scores
MATCH (me:User {id: $userId})

// Path 1: Collaborative - what similar users bought
OPTIONAL MATCH (me)-[:PURCHASED]->(shared:Product)<-[:PURCHASED]-(similar:User)
               -[:PURCHASED]->(collab_rec:Product)
WHERE NOT (me)-[:PURCHASED]->(collab_rec)
WITH me, collab_rec, count(DISTINCT similar) AS collab_score
WITH me, collect({item: collab_rec, score: collab_score * 3, source: 'collab'}) AS collab_recs

// Path 2: Content-based - similar to what I bought
OPTIONAL MATCH (me)-[:PURCHASED]->(bought:Product)-[:IN_CATEGORY]->(cat:Category)
               <-[:IN_CATEGORY]-(content_rec:Product)
WHERE NOT (me)-[:PURCHASED]->(content_rec)
WITH me, collab_recs, content_rec, count(DISTINCT bought) AS content_score
WITH me, collab_recs,
     collect({item: content_rec, score: content_score * 2, source: 'content'}) AS content_recs

// Path 3: Brand affinity - from brands I buy
OPTIONAL MATCH (me)-[:PURCHASED]->(:Product)-[:MADE_BY]->(brand:Brand)
               <-[:MADE_BY]-(brand_rec:Product)
WHERE NOT (me)-[:PURCHASED]->(brand_rec)
WITH collab_recs, content_recs, brand_rec, count(*) AS brand_score
WITH collab_recs, content_recs,
     collect({item: brand_rec, score: brand_score * 1.5, source: 'brand'}) AS brand_recs

// Combine all recommendations (a WHERE cannot follow UNWIND directly, hence the WITH)
UNWIND collab_recs + content_recs + brand_recs AS rec
WITH rec
WHERE rec.item IS NOT NULL

// Aggregate scores per item
WITH rec.item AS item, sum(rec.score) AS total_score, collect(DISTINCT rec.source) AS sources
RETURN item.name, total_score, sources
ORDER BY total_score DESC
LIMIT 15

Graph-Native: Leveraging Graph Algorithms
Graph algorithms provide powerful recommendation signals unavailable to other approaches:
// Each CALL below is run as a separate statement.

// 1. Node Similarity (GDS): items with similar buyer profiles
// Node Similarity compares nodes by their outgoing relationships, so PURCHASED
// is projected in REVERSE (Product -> User) to compare products by their buyers
CALL gds.graph.project(
  'productBuyers',
  ['User', 'Product'],
  {PURCHASED: {type: 'PURCHASED', orientation: 'REVERSE'}}
)

CALL gds.nodeSimilarity.stream('productBuyers', {topK: 10})
YIELD node1, node2, similarity
WHERE gds.util.asNode(node1):Product AND gds.util.asNode(node2):Product
RETURN gds.util.asNode(node1).name AS product1,
       gds.util.asNode(node2).name AS product2,
       similarity

// 2. PageRank for item authority, on the natural User -> Product direction
CALL gds.graph.project(
  'purchaseGraph',
  ['User', 'Product'],
  {PURCHASED: {type: 'PURCHASED', orientation: 'NATURAL'}}
)

// Users have no incoming edges here, so every buyer casts an equal vote that is split
// across everything they bought; the score behaves like a damped popularity measure
CALL gds.pageRank.stream('purchaseGraph', {
  relationshipWeightProperty: null
})
YIELD nodeId, score
WHERE gds.util.asNode(nodeId):Product
RETURN gds.util.asNode(nodeId).name AS product, score AS authority
ORDER BY authority DESC
LIMIT 20

// 3. Community detection for user segments
CALL gds.louvain.stream('purchaseGraph')
YIELD nodeId, communityId
WHERE gds.util.asNode(nodeId):User
WITH communityId, collect(gds.util.asNode(nodeId)) AS users
MATCH (u:User)-[:PURCHASED]->(popular:Product)
WHERE u IN users
WITH communityId, popular, count(*) AS purchases
ORDER BY communityId, purchases DESC
RETURN communityId, collect(popular.name)[..5] AS segment_favorites

Pre-compute expensive graph algorithms (PageRank, community detection) in batch. Store results as node properties. Real-time queries use these pre-computed features combined with live interaction data. This balance provides rich signals with low latency.
Deploying social and recommendation features at scale requires careful architecture beyond just graph queries.
Caching Strategy:
┌─────────────────────────────────────────────────────────────────┐
│                          REQUEST FLOW                           │
└─────────────────────────────────┬───────────────────────────────┘
                                  │
        ┌─────────────────────────▼──────────────────────────┐
        │             APPLICATION CACHE (Redis)              │
        │    Personalized feeds, PYMK results, user prefs    │
        │     TTL: 5-15 minutes for feeds, 1hr for PYMK      │
        └─────────────────────────┬──────────────────────────┘
                                  │ Cache Miss
        ┌─────────────────────────▼──────────────────────────┐
        │          PRE-COMPUTED RESULTS (Redis/DB)           │
        │   Batch-generated recommendations, similarities    │
        │  Refreshed: hourly for active users, daily others  │
        └─────────────────────────┬──────────────────────────┘
                                  │ Not Pre-computed
        ┌─────────────────────────▼──────────────────────────┐
        │               REAL-TIME GRAPH QUERY                │
        │      Neo4j with bounded traversals, timeouts       │
        │        Fallback: popularity-based defaults         │
        └────────────────────────────────────────────────────┘

Handling High-Degree Nodes:
Celebrities with millions of followers ("superconnectors") break naive graph algorithms. Solutions:
LIMIT mid-query)

// Skip high-degree nodes in traversal
MATCH (me:User {id: $userId})-[:FOLLOWS]->(following:User)
WHERE following.followerCount < 1000000  // Exclude celebrities
MATCH (following)-[:FOLLOWS]->(suggestion:User)
WHERE NOT (me)-[:FOLLOWS]->(suggestion)
RETURN suggestion.name, count(*) AS via_count
ORDER BY via_count DESC
LIMIT 20

// Sample relationships from supernodes
MATCH (me:User {id: $userId})-[:FOLLOWS]->(celebrity:User)
WHERE celebrity.followerCount > 1000000
WITH me, celebrity, rand() AS r
ORDER BY r
LIMIT 10  // Sample 10 random celebrity follows
MATCH (celebrity)-[:POSTED]->(post:Post)
WHERE post.createdAt > datetime() - duration('P1D')
RETURN post.id
LIMIT 5

| Feature | Latency Target | Refresh Rate | Fallback |
|---|---|---|---|
| Social Feed | < 100ms | Real-time + 5min cache | Chronological |
| PYMK | < 200ms | Every 1-6 hours | Popular users |
| Product Recommendations | < 150ms | Daily batch + real-time signals | Bestsellers |
| Similar Items | < 50ms | Daily pre-compute | Same category items |
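The tiered lookup from the caching diagram, together with the fallbacks listed in the table above, can be sketched as a single function. This is an in-process illustration only: `TTLCache` stands in for an application cache such as Redis, and `get_recommendations`, `live_query`, and `slow_query` are hypothetical names, not part of any library.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for an application cache with expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, key: str) -> Optional[list]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: list) -> None:
        self._entries[key] = (value, self.clock() + self.ttl_seconds)


def get_recommendations(user_id, app_cache, precomputed, graph_query, fallback):
    """Try cache, then batch results, then a live query; fall back on timeout."""
    cached = app_cache.get(user_id)
    if cached is not None:
        return cached, "cache"
    if user_id in precomputed:
        result, source = precomputed[user_id], "precomputed"
    else:
        try:
            result, source = graph_query(user_id), "graph"
        except TimeoutError:
            # Serve defaults but do not cache them, so the next request retries.
            return fallback, "fallback"
    app_cache.set(user_id, result)
    return result, source


now = [0.0]  # fake clock so the example is deterministic
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
precomputed = {"u1": ["item-a", "item-b"]}
bestsellers = ["item-top"]


def live_query(user_id):
    return [f"live-for-{user_id}"]


def slow_query(user_id):
    raise TimeoutError("graph query exceeded its time budget")


first = get_recommendations("u1", cache, precomputed, live_query, bestsellers)
second = get_recommendations("u1", cache, precomputed, live_query, bestsellers)
third = get_recommendations("u2", cache, precomputed, slow_query, bestsellers)
```

The first call for `u1` is served from the batch results and populates the cache, the second is a cache hit, and the request for `u2` falls back to bestsellers because the live query times out. Not caching the fallback is a design choice in this sketch; a production system might cache it briefly to protect an overloaded graph database.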
Track recommendation quality with metrics: click-through rate (CTR), conversion, engagement time, and diversity. A/B test algorithm changes. Use feedback loops—clicks and purchases generate signals that improve future recommendations.
Social networking and recommendation engines represent the flagship applications of graph databases—domains where the relationship-centric model provides decisive advantages. Let's consolidate the key insights:
What's Next:
We'll complete the graph database module by examining use cases and trade-offs—when to choose graph databases, their limitations, and how they fit into polyglot persistence architectures. This will give you the decision framework for your own systems.
You now understand how graph databases power social networks and recommendation engines—the applications that made graph databases mainstream. These patterns apply across domains wherever human connections and item preferences drive value.