LinkedIn Connections: Designing Professional Social Graphs

Level: Advanced

Duration: 90 mins

Topic: LinkedIn Connections

Recommendation Algorithms

The Engine of Network Growth

LinkedIn's "People You May Know" (PYMK) feature generates billions of recommendations daily and is a major driver of network growth. A well-tuned recommendation system does more than suggest connections: it surfaces the people most likely to advance your career, introduce you to opportunities, or benefit from your help.

Behind this seemingly simple feature lies a sophisticated multi-stage recommendation pipeline that must:

Process hundreds of features per candidate pair
Evaluate millions of potential connections per user
Return results in under 500ms
Balance relevance, diversity, and freshness
Respect privacy and prevent spam

In this page, we'll dissect the recommendation algorithms that power professional network suggestions, from classical graph-based approaches to modern machine learning systems.

What You Will Learn

By the end of this page, you will understand candidate generation via graph proximity, feature engineering for professional relevance, machine learning ranking models, real-time personalization, explore-exploit trade-offs, and evaluation metrics for connection recommendations.

Recommendation System Architecture

Connection recommendation systems follow a multi-stage pipeline architecture that progressively filters and ranks candidates. This funnel approach is necessary because evaluating all 900 million users for each recommendation request is infeasible.

Pipeline Stage Responsibilities
Stage	Purpose	Scale	Latency Budget
Candidate Generation	Find potentially relevant users	900M → ~100K	Precomputed
Filtering	Remove invalid/unwanted candidates	~100K → ~50K	<50ms
Ranking	Score and order by relevance	~50K → ~1K	<200ms
Post-Processing	Apply business rules, diversity	~1K → 10-50	<50ms

Why Multi-Stage?

The key insight is that different techniques excel at different scales:

Candidate Generation uses fast, approximate methods (graph traversal, Bloom filters) to reduce the search space by roughly 10,000x
Filtering applies boolean rules that eliminate clearly wrong candidates
Ranking uses expensive ML models that can score only tens of thousands of candidates within the latency budget
Post-Processing applies business logic and diversity constraints

This architecture allows LinkedIn to evaluate complex features for ranking while maintaining sub-second response times.
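
The funnel idea can be sketched as a chain of stages, each of which shrinks the pool handed to the next. The sketch below is illustrative only: `Candidate`, `Stage`, `runFunnel`, and the toy stages are hypothetical names, and small in-memory arrays stand in for the precomputed stores and model services a real system would use.

```typescript
// Hypothetical types for illustration; not LinkedIn's actual interfaces.
interface Candidate {
  id: string;
  score: number;
  blocked: boolean;
}

interface Stage {
  name: string;
  run: (input: Candidate[]) => Candidate[];
}

interface StageTrace {
  stage: string;
  inputCount: number;
  outputCount: number;
}

// Run the stages in order and record how much each one shrinks the pool.
function runFunnel(
  stages: Stage[],
  initial: Candidate[]
): { output: Candidate[]; trace: StageTrace[] } {
  const trace: StageTrace[] = [];
  let current = initial;
  for (let i = 0; i < stages.length; i++) {
    const next = stages[i].run(current);
    trace.push({
      stage: stages[i].name,
      inputCount: current.length,
      outputCount: next.length,
    });
    current = next;
  }
  return { output: current, trace };
}

// Toy stages: a cheap boolean filter first, then a sort, then a top-N cut.
const toyStages: Stage[] = [
  { name: 'filtering', run: (cs) => cs.filter((c) => !c.blocked) },
  { name: 'ranking', run: (cs) => cs.slice().sort((a, b) => b.score - a.score) },
  { name: 'post-processing', run: (cs) => cs.slice(0, 2) },
];

const toyPool: Candidate[] = [
  { id: 'a', score: 0.2, blocked: false },
  { id: 'b', score: 0.9, blocked: true },
  { id: 'c', score: 0.7, blocked: false },
  { id: 'd', score: 0.5, blocked: false },
];

const funnelResult = runFunnel(toyStages, toyPool);
```

Here the blocked candidate `b` is dropped before any scoring happens, which is the point of ordering cheap stages ahead of expensive ones. The `trace` array makes each stage's reduction visible, which helps when checking the funnel against a latency budget.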

Candidate Generation Algorithms

Candidate generation is the most critical stage—if a valuable connection isn't in the candidate pool, no amount of sophisticated ranking will surface it. Multiple sources contribute candidates, each capturing different types of relevant connections.

Candidate Generation Sources

•2nd-Degree Network (Friends of Friends) — Most PYMK recommendations come from your 2nd-degree network. If two of your connections both know someone, they're likely relevant to you.
•Shared Professional Context — People at your current/past companies, alumni from your schools, and members of your groups.
•Contact Import Matches — Email addresses and phone numbers from user-uploaded contacts matched to LinkedIn profiles.
•Similar Profiles — Users with similar job titles, skills, and career trajectories.
•Collaborative Filtering — Users who connected with similar sets of people often benefit from connecting with each other.
•Content Engagement — People who engaged with the same posts, articles, or events.

Candidate Generation Implementation
TypeScript
interface CandidateScore {
  memberId: string;
  sources: CandidateSource[];
  aggregateScore: number;
}
 
type CandidateSource = {
  type: 'second_degree' | 'same_company' | 'same_school' | 
        'contact_import' | 'similar_profile' | 'collaborative' | 
        'content_engagement';
  score: number;
  metadata: Record<string, any>;
};
 
class CandidateGenerator {
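  private graphStore: GraphStore;
  private memberStore: MemberStore;  // used by getSameCompanyCandidates below
  private contextStore: ContextStore;
  private contactStore: ContactStore;
  private cfModel: CollaborativeFilteringModel;
 
  // Note: the school, contact-import, similar-profile, and content-engagement
  // generators and the scoreByOverlap helper are not shown in this listing.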
 
  // Maximum candidates from each source
  private LIMITS = {
    secondDegree: 50000,
    sameCompany: 10000,
    sameSchool: 10000,
    contactImport: 5000,
    similarProfile: 5000,
    collaborative: 5000,
    contentEngagement: 5000,
  };
 
  async generateCandidates(
    memberId: string,
    existingConnections: Set<string>
  ): Promise<CandidateScore[]> {
    // Parallel generation from all sources
    const [
      secondDegree,
      sameCompany,
      sameSchool,
      contacts,
      similar,
      collaborative,
      engagement,
    ] = await Promise.all([
      this.getSecondDegreeCandidates(memberId, existingConnections),
      this.getSameCompanyCandidates(memberId, existingConnections),
      this.getSameSchoolCandidates(memberId, existingConnections),
      this.getContactImportCandidates(memberId, existingConnections),
      this.getSimilarProfileCandidates(memberId, existingConnections),
      this.getCollaborativeCandidates(memberId, existingConnections),
      this.getContentEngagementCandidates(memberId, existingConnections),
    ]);
 
    // Merge and aggregate scores
    return this.aggregateCandidates([
      ...secondDegree,
      ...sameCompany,
      ...sameSchool,
      ...contacts,
      ...similar,
      ...collaborative,
      ...engagement,
    ]);
  }
 
  // Primary source: Friends of friends
  private async getSecondDegreeCandidates(
    memberId: string,
    existing: Set<string>
  ): Promise<CandidateSource[]> {
    const connections = await this.graphStore.getConnections(memberId);
    const mutualCounts = new Map<string, number>();
    const mutualList = new Map<string, string[]>();
 
    // Count how many mutual connections each 2nd-degree has
    for (const conn of connections) {
      const theirConnections = await this.graphStore.getConnections(conn);
      
      for (const fof of theirConnections) {
        // Skip if already connected or is the user
        if (existing.has(fof) || fof === memberId) continue;
        
        mutualCounts.set(fof, (mutualCounts.get(fof) || 0) + 1);
        
        if (!mutualList.has(fof)) mutualList.set(fof, []);
        mutualList.get(fof)!.push(conn);
      }
    }
 
    // Sort by mutual count and take top candidates
    const sorted = Array.from(mutualCounts.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, this.LIMITS.secondDegree);
 
    return sorted.map(([fofId, mutualCount]) => ({
      type: 'second_degree' as const,
      score: this.scoreByMutualCount(mutualCount),
      metadata: {
        candidateId: fofId,
        mutualCount,
        sampleMutuals: mutualList.get(fofId)!.slice(0, 5),
      },
    }));
  }
 
  // Colleagues: Same company, may not be connected
  private async getSameCompanyCandidates(
    memberId: string,
    existing: Set<string>
  ): Promise<CandidateSource[]> {
    const member = await this.memberStore.getMember(memberId);
    const candidates: CandidateSource[] = [];
 
    // Current company (highest priority)
    if (member.currentCompanyId) {
      const colleagues = await this.contextStore.getCompanyMembers(
        member.currentCompanyId,
        this.LIMITS.sameCompany
      );
 
      for (const colleague of colleagues) {
        if (existing.has(colleague.id) || colleague.id === memberId) continue;
 
        candidates.push({
          type: 'same_company',
          score: this.scoreByOverlap(member, colleague, 'current'),
          metadata: {
            candidateId: colleague.id,
            companyId: member.currentCompanyId,
            companyName: member.currentCompanyName,
            isCurrent: true,
          },
        });
      }
    }
 
    // Past companies (lower priority)
    for (const pastPosition of member.pastPositions || []) {
      const formerColleagues = await this.contextStore.getCompanyMembers(
        pastPosition.companyId,
        1000  // Smaller limit for past companies
      );
 
      for (const colleague of formerColleagues) {
        if (existing.has(colleague.id) || colleague.id === memberId) continue;
 
        candidates.push({
          type: 'same_company',
          score: this.scoreByOverlap(member, colleague, 'past') * 0.7,
          metadata: {
            candidateId: colleague.id,
            companyId: pastPosition.companyId,
            companyName: pastPosition.companyName,
            isCurrent: false,
          },
        });
      }
    }
 
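    // Keep the highest-scoring colleagues; without the sort, slice() would
    // keep whichever candidates happened to be pushed first
    return candidates
      .sort((a, b) => b.score - a.score)
      .slice(0, this.LIMITS.sameCompany);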
  }
 
  // Collaborative filtering: users who connected with similar people
  private async getCollaborativeCandidates(
    memberId: string,
    existing: Set<string>
  ): Promise<CandidateSource[]> {
    // Get similar users via item-based collaborative filtering
    // "Users who connected with your connections also connected with..."
    const similarUsers = await this.cfModel.getSimilarUsers(
      memberId,
      this.LIMITS.collaborative
    );
 
    return similarUsers
      .filter(u => !existing.has(u.userId) && u.userId !== memberId)
      .map(u => ({
        type: 'collaborative' as const,
        score: u.similarity,
        metadata: {
          candidateId: u.userId,
          similarity: u.similarity,
          sharedConnectionPattern: u.explanation,
        },
      }));
  }
 
  // Aggregate candidates from multiple sources
  private aggregateCandidates(sources: CandidateSource[]): CandidateScore[] {
    const byMember = new Map<string, CandidateSource[]>();
 
    for (const source of sources) {
      const candidateId = source.metadata.candidateId;
      if (!byMember.has(candidateId)) {
        byMember.set(candidateId, []);
      }
      byMember.get(candidateId)!.push(source);
    }
 
    // Aggregate scores with diminishing returns for multiple sources
    const aggregated: CandidateScore[] = [];
 
    for (const [memberId, memberSources] of byMember) {
      // Sort by score
      memberSources.sort((a, b) => b.score - a.score);
 
      // Primary source gets full score, others get diminishing weight
      let aggregateScore = 0;
      for (let i = 0; i < memberSources.length; i++) {
        const weight = 1 / (i + 1);  // 1, 0.5, 0.33, ...
        aggregateScore += memberSources[i].score * weight;
      }
 
      aggregated.push({
        memberId,
        sources: memberSources,
        aggregateScore,
      });
    }
 
    return aggregated.sort((a, b) => b.aggregateScore - a.aggregateScore);
  }
 
  private scoreByMutualCount(count: number): number {
    // Logarithmic scaling: diminishing returns for many mutuals
    // 1 mutual: ~0.18, 5 mutuals: ~0.46, 20 mutuals: ~0.78, 49+: 1.0 (capped)
    return Math.min(1.0, Math.log(count + 1) / Math.log(50));
  }
}
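
To make the aggregation arithmetic concrete, here is a small standalone sketch of the two scoring rules used in the listing above: the harmonic weighting across sources and the log-scaled mutual-connection score. The function `aggregateSourceScores` and the sample numbers are illustrative assumptions; only the formulas come from the listing.

```typescript
// Combine one candidate's per-source scores with weights 1, 1/2, 1/3, ...
// applied in descending score order (the rule used by aggregateCandidates).
function aggregateSourceScores(scores: number[]): number {
  const sorted = scores.slice().sort((a, b) => b - a);
  return sorted.reduce((sum, score, i) => sum + score / (i + 1), 0);
}

// Log-scaled mutual-connection score, capped at 1.0
// (the same formula as scoreByMutualCount).
function mutualCountScore(count: number): number {
  return Math.min(1.0, Math.log(count + 1) / Math.log(50));
}

// A candidate surfaced by three sources: 0.8 + 0.6 / 2 + 0.3 / 3 = 1.2
const multiSourceScore = aggregateSourceScores([0.3, 0.8, 0.6]);

// A candidate surfaced only by its strongest source: 0.8
const singleSourceScore = aggregateSourceScores([0.8]);

// Scores for 1, 5, 20, and 49 mutual connections
const mutualScores = [1, 5, 20, 49].map(mutualCountScore);
```

Note that the aggregate can exceed 1.0 when several sources agree, so it works as a sort key for candidate generation but should not be read as a probability.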

Feature Engineering for Professional Networks

The ranking model's effectiveness depends heavily on feature engineering. Professional networks have unique feature categories that capture both topological and semantic signals.

Feature Categories for PYMK Ranking
Category	Example Features	Signal Type
Graph Proximity	Mutual connection count, path length, Jaccard similarity	Structural
Professional Context	Same company, overlapping tenure, same school, shared groups	Semantic
Profile Similarity	Title similarity, skill overlap, industry match, seniority match	Content
Behavioral	Profile views, content engagement, message history	Interaction
Temporal	Recency of connection, activity freshness, account age	Time-based
User-level	Connection count, activity level, network quality	Quality

Feature Extraction
TypeScript
interface PYMKFeatures {
  // Graph proximity features (12 features)
  mutualConnectionCount: number;
  mutualConnectionRatio: number;  // mutual / max(connections_a, connections_b)
  jaccardSimilarity: number;      // |A ∩ B| / |A ∪ B|
  adamicAdarScore: number;        // Sum of 1/log(degree) for mutual friends
  commonNeighbors2ndDegree: number;
  shortestPathLength: number;
  pathCount: number;              // Number of shortest paths
  triangleClosure: number;        // Would this connection close triangles?
  clusteringCoefficient: number;
  localBridgeScore: number;       // Does this connect different communities?
  preferentialAttachment: number; // degree_a * degree_b
  resourceAllocation: number;     // Sum of 1/degree for mutual friends
 
  // Professional context features (20+ features)
  sameCurrentCompany: boolean;
  samePastCompany: boolean;
  companyOverlapDays: number;
  sameSchool: boolean;
  schoolOverlapYears: number;
  sameIndustry: boolean;
  sameFunction: boolean;          // Engineering, Sales, Marketing, etc.
  sharedGroupCount: number;
  sharedEventCount: number;
  sharedInterestCount: number;
  geographicDistance: number;
  sameLocation: boolean;
 
  // Profile similarity features (15+ features)
  titleSimilarity: number;        // Semantic similarity of job titles
  skillOverlapRatio: number;
  skillWeightedOverlap: number;   // Weighted by skill rarity
  experienceYearsDiff: number;
  senioritySimilarity: number;    // Entry, mid, senior, exec
  industryPath: string[];         // Shared career industries
  careerTrajectoryMatch: number;
  educationLevelMatch: boolean;
  languageOverlap: number;
  contentTopicSimilarity: number;
 
  // Behavioral features (10+ features)
  viewerViewedCandidate: boolean;
  candidateViewedViewer: boolean;
  daysSinceProfileView: number;
  sharedContentEngagements: number;
  messagedInPast: boolean;
  connectionRequestHistory: 'none' | 'sent' | 'received' | 'rejected';
  searchedForCandidate: boolean;
  
  // Temporal features
  viewerAccountAgeMonths: number;
  candidateAccountAgeMonths: number;
  viewerConnectionRate: number;   // New connections per month
  candidateConnectionRate: number;
  daysSinceCandidateActive: number;
  
  // User quality features
  viewerConnectionCount: number;
  candidateConnectionCount: number;
  candidateProfileCompleteness: number;
  candidateEndorsementCount: number;
  candidateRecommendationCount: number;
  candidateContentEngagementRate: number;
  candidateResponseRate: number;  // Reply rate to messages/requests
}
 
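class FeatureExtractor {
  // Dependencies used below. The set helpers (intersection, union) and the
  // context, behavior, temporal, and quality extractors are not shown in this listing.
  private memberStore: MemberStore;
  private graphStore: GraphStore;
  private skillIDF: Map<string, number>;  // skill -> inverse document frequency
 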
  async extractFeatures(
    viewerId: string,
    candidateId: string,
    candidateScore: CandidateScore
  ): Promise<PYMKFeatures> {
    const [
      viewer,
      candidate,
      graphFeatures,
      contextFeatures,
      behaviorFeatures,
    ] = await Promise.all([
      this.memberStore.getMember(viewerId),
      this.memberStore.getMember(candidateId),
      this.extractGraphFeatures(viewerId, candidateId, candidateScore),
      this.extractContextFeatures(viewerId, candidateId),
      this.extractBehaviorFeatures(viewerId, candidateId),
    ]);
 
    const similarityFeatures = await this.extractSimilarityFeatures(
      viewer, candidate
    );
 
    const temporalFeatures = this.extractTemporalFeatures(viewer, candidate);
    const qualityFeatures = this.extractQualityFeatures(viewer, candidate);
 
    return {
      ...graphFeatures,
      ...contextFeatures,
      ...behaviorFeatures,
      ...similarityFeatures,
      ...temporalFeatures,
      ...qualityFeatures,
    };
  }
 
  private async extractGraphFeatures(
    viewerId: string,
    candidateId: string,
    candidateScore: CandidateScore
  ): Promise<Partial<PYMKFeatures>> {
    const [viewerConns, candidateConns] = await Promise.all([
      this.graphStore.getConnectionSet(viewerId),
      this.graphStore.getConnectionSet(candidateId),
    ]);
 
    // Mutual connections
    const mutuals = this.intersection(viewerConns, candidateConns);
    const union = this.union(viewerConns, candidateConns);
 
    // Adamic-Adar and Resource Allocation both need each mutual's degree,
    // so fetch it once per mutual and update both scores in the same pass.
    // Adamic-Adar: sum of 1/log(degree); low-degree mutuals are a more specific signal.
    // Resource Allocation: sum of 1/degree; penalizes high-degree mutuals more strongly.
    let adamicAdar = 0;
    let resourceAllocation = 0;
    for (const mutual of mutuals) {
      const mutualDegree = await this.graphStore.getConnectionCount(mutual);
      if (mutualDegree > 1) {
        adamicAdar += 1 / Math.log(mutualDegree);  // log(1) = 0, so skip degree 1
      }
      if (mutualDegree > 0) {
        resourceAllocation += 1 / mutualDegree;
      }
    }
 
    // Guard the ratios: a member with no connections would otherwise yield NaN (0 / 0)
    const maxDegree = Math.max(viewerConns.size, candidateConns.size);
 
    return {
      mutualConnectionCount: mutuals.size,
      mutualConnectionRatio: maxDegree > 0 ? mutuals.size / maxDegree : 0,
      jaccardSimilarity: union.size > 0 ? mutuals.size / union.size : 0,
      adamicAdarScore: adamicAdar,
      resourceAllocation,
      preferentialAttachment: viewerConns.size * candidateConns.size,
      // Approximation: 2 if the candidate came from the 2nd-degree source, otherwise assume 3
      shortestPathLength: candidateScore.sources.find(s => s.type === 'second_degree')
        ? 2 : 3,
    };
  }
 
  private extractSimilarityFeatures(
    viewer: Member,
    candidate: Member
  ): Partial<PYMKFeatures> {
    // Title similarity using TF-IDF or embeddings
    const titleSimilarity = this.computeTitleSimilarity(
      viewer.headline,
      candidate.headline
    );
 
    // Skill overlap
    const viewerSkills = new Set(viewer.skills || []);
    const candidateSkills = new Set(candidate.skills || []);
    const skillOverlap = this.intersection(viewerSkills, candidateSkills);
    
    // Weight by skill rarity (rare shared skills are more meaningful)
    const skillWeightedOverlap = this.computeWeightedSkillOverlap(
      viewer.skills || [],
      candidate.skills || []
    );
 
    // Seniority matching
    const senioritySimilarity = this.computeSenioritySimilarity(
      viewer.seniority,
      candidate.seniority
    );
 
    return {
      titleSimilarity,
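      // Guard against 0 / 0 when neither profile lists any skills
      skillOverlapRatio: Math.max(viewerSkills.size, candidateSkills.size) > 0
        ? skillOverlap.size / Math.max(viewerSkills.size, candidateSkills.size)
        : 0,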
      skillWeightedOverlap,
      experienceYearsDiff: Math.abs(
        (viewer.totalExperienceYears || 0) - (candidate.totalExperienceYears || 0)
      ),
      senioritySimilarity,
      sameIndustry: viewer.industry === candidate.industry,
    };
  }
 
  private computeWeightedSkillOverlap(
    skills1: string[],
    skills2: string[]
  ): number {
    let score = 0;
    const skills2Set = new Set(skills2);
 
    for (const skill of skills1) {
      if (skills2Set.has(skill)) {
        // Weight by inverse document frequency (rarer skills = higher weight)
        const idf = this.skillIDF.get(skill) || 1.0;
        score += idf;
      }
    }
 
    return score;
  }
}
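
The graph proximity features are easier to check on a toy graph. The sketch below computes Jaccard similarity, Adamic-Adar, and Resource Allocation over a plain adjacency list. The `Graph` type, the helper names, and the six-node graph are assumptions made for illustration, and the adjacency lists are assumed to contain no duplicates.

```typescript
// Undirected graph as an adjacency list (hypothetical in-memory stand-in
// for a graph store).
type Graph = Record<string, string[]>;

function neighbors(graph: Graph, id: string): string[] {
  return graph[id] || [];
}

function mutualConnections(graph: Graph, a: string, b: string): string[] {
  const bNeighbors = neighbors(graph, b);
  return neighbors(graph, a).filter((n) => bNeighbors.indexOf(n) !== -1);
}

// |A ∩ B| / |A ∪ B|, using |A ∪ B| = |A| + |B| - |A ∩ B|
function jaccard(graph: Graph, a: string, b: string): number {
  const mutualCount = mutualConnections(graph, a, b).length;
  const unionSize =
    neighbors(graph, a).length + neighbors(graph, b).length - mutualCount;
  return unionSize === 0 ? 0 : mutualCount / unionSize;
}

// Sum of 1 / ln(degree) over mutual connections
function adamicAdar(graph: Graph, a: string, b: string): number {
  return mutualConnections(graph, a, b).reduce((sum, m) => {
    const degree = neighbors(graph, m).length;
    return degree > 1 ? sum + 1 / Math.log(degree) : sum;
  }, 0);
}

// Sum of 1 / degree over mutual connections
function resourceAllocation(graph: Graph, a: string, b: string): number {
  return mutualConnections(graph, a, b).reduce((sum, m) => {
    const degree = neighbors(graph, m).length;
    return degree > 0 ? sum + 1 / degree : sum;
  }, 0);
}

// viewer and candidate share m1 (degree 2) and m2 (degree 4)
const toyGraph: Graph = {
  viewer: ['m1', 'm2', 'x'],
  candidate: ['m1', 'm2', 'y'],
  m1: ['viewer', 'candidate'],
  m2: ['viewer', 'candidate', 'p', 'q'],
  x: ['viewer'],
  y: ['candidate'],
  p: ['m2'],
  q: ['m2'],
};

const toyJaccard = jaccard(toyGraph, 'viewer', 'candidate');           // 2 / 4
const toyAdamicAdar = adamicAdar(toyGraph, 'viewer', 'candidate');     // 1/ln 2 + 1/ln 4
const toyResourceAllocation = resourceAllocation(toyGraph, 'viewer', 'candidate'); // 1/2 + 1/4
```

The well-connected mutual `m2` contributes less than `m1` to both Adamic-Adar and Resource Allocation, and Resource Allocation discounts it more sharply (1/4 versus roughly 0.72).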

Feature Importance Insight

In practice, mutual connection count is the single most predictive feature for PYMK. However, for users with few connections (cold start), professional context features like same company/school become dominant. The model must handle this heterogeneity gracefully.

Ranking Models

The ranking model takes extracted features and predicts the probability that showing a recommendation will result in a desired action (connection request, acceptance, or meaningful interaction).

Evolution of PYMK Ranking Models:

Generation 1: Handcrafted Rules

Simple weighted combination of mutual count, same company, etc.
Easy to interpret but misses complex interactions
Example: score = 0.5 * mutualCount + 0.3 * sameCompany + 0.2 * schoolMatch

Generation 2: Linear Models (Logistic Regression)

Learned weights from historical data
Still interpretable, but captures feature interactions only through hand-built cross features
Struggles with non-linear patterns

Generation 3: Tree-Based Models (GBDT, XGBoost)

Captures non-linear relationships and feature interactions
Robust to feature scaling, handles missing values
LinkedIn's production model for many years

Generation 4: Deep Learning

Neural networks with embeddings for categorical features
Can learn complex patterns from raw inputs
Requires more data and compute, harder to interpret
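
The step from Generation 1 to Generation 2 can be illustrated with a small sketch. The first function is the handcrafted formula from the example above; the second applies logistic regression to the same three features. The weights in `exampleWeights` are made up for illustration, where a real model would learn them from historical connection data.

```typescript
interface SimpleFeatures {
  mutualCount: number;
  sameCompany: boolean;
  schoolMatch: boolean;
}

// Generation 1: fixed, hand-picked weights. The output is unbounded.
function handcraftedScore(f: SimpleFeatures): number {
  return (
    0.5 * f.mutualCount +
    0.3 * (f.sameCompany ? 1 : 0) +
    0.2 * (f.schoolMatch ? 1 : 0)
  );
}

// Generation 2: logistic regression. Same linear form, but the weights are
// learned and the sigmoid maps the result to a probability in (0, 1).
interface LogisticWeights {
  bias: number;
  mutualCount: number;
  sameCompany: number;
  schoolMatch: number;
}

function logisticScore(f: SimpleFeatures, w: LogisticWeights): number {
  const z =
    w.bias +
    w.mutualCount * f.mutualCount +
    w.sameCompany * (f.sameCompany ? 1 : 0) +
    w.schoolMatch * (f.schoolMatch ? 1 : 0);
  return 1 / (1 + Math.exp(-z));
}

// Made-up weights, for illustration only.
const exampleWeights: LogisticWeights = {
  bias: -3,
  mutualCount: 0.2,
  sameCompany: 1.0,
  schoolMatch: 0.5,
};

const exampleFeatures: SimpleFeatures = {
  mutualCount: 10,
  sameCompany: true,
  schoolMatch: false,
};

const ruleScore = handcraftedScore(exampleFeatures);                  // 5.3
const learnedScore = logisticScore(exampleFeatures, exampleWeights);  // sigmoid(0) = 0.5
```

Both models are linear in the features, which is why neither captures an interaction such as "same company matters more when mutual count is low" unless a cross feature is added by hand. Tree-based and neural models learn such interactions directly.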

Ranking Model Implementation
TypeScript
// Gradient Boosted Decision Tree model for PYMK ranking
interface RankingModel {
  predict(features: PYMKFeatures): number;  // Returns probability [0, 1]
  predictBatch(features: PYMKFeatures[]): number[];
}
 
class GBDTRankingModel implements RankingModel {
  private model: XGBoostModel;
  private featureNames: string[];
 
  constructor(modelPath: string) {
    this.model = XGBoostModel.load(modelPath);
    this.featureNames = this.model.getFeatureNames();
  }
 
  predict(features: PYMKFeatures): number {
    const featureVector = this.toVector(features);
    const logit = this.model.predictLogit(featureVector);
    return this.sigmoid(logit);
  }
 
  predictBatch(features: PYMKFeatures[]): number[] {
    const vectors = features.map(f => this.toVector(f));
    const logits = this.model.predictLogitBatch(vectors);
    return logits.map(l => this.sigmoid(l));
  }
 
  private toVector(features: PYMKFeatures): number[] {
    return this.featureNames.map(name => {
      const value = features[name as keyof PYMKFeatures];
      if (typeof value === 'boolean') return value ? 1 : 0;
      if (typeof value === 'number') return value;
      return 0;  // Handle missing
    });
  }
 
  private sigmoid(x: number): number {
    return 1 / (1 + Math.exp(-x));
  }
}
 
// Deep learning model with embeddings
class NeuralRankingModel implements RankingModel {
  private model: TensorFlowModel;
  private embeddings: {
    company: Embedding;
    school: Embedding;
    title: Embedding;
    skills: MultiHotEmbedding;
  };
 
  predict(features: PYMKFeatures): number {
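    // Embed categorical features. candidateCompanyId, candidateSchoolId,
    // candidateTitle, and candidateSkills are assumed to be additional raw
    // fields on PYMKFeatures; the interface above does not list them.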
    const companyEmb = this.embeddings.company.lookup(features.candidateCompanyId);
    const schoolEmb = this.embeddings.school.lookup(features.candidateSchoolId);
    const titleEmb = this.embeddings.title.lookup(features.candidateTitle);
    const skillsEmb = this.embeddings.skills.lookup(features.candidateSkills);
 
    // Numeric features
    const numericFeatures = this.extractNumericFeatures(features);
 
    // Concatenate and feed through network
    const input = tf.concat([
      companyEmb,
      schoolEmb,
      titleEmb,
      skillsEmb,
      numericFeatures,
    ]);
 
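    const output = this.model.predict(input);
    return output.dataSync()[0];
  }
 
  // Required by the RankingModel interface. Scores one candidate at a time for
  // simplicity; a production model would stack the inputs into a single tensor
  // and run one forward pass.
  predictBatch(features: PYMKFeatures[]): number[] {
    return features.map(f => this.predict(f));
  }
}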
 
// Ranking service that orchestrates model inference
class RankingService {
  private model: RankingModel;
  private featureExtractor: FeatureExtractor;
  private cache: RankingCache;
 
  async rankCandidates(
    viewerId: string,
    candidates: CandidateScore[],
    limit: number = 100
  ): Promise<RankedCandidate[]> {
    // Extract features in parallel batches
    const batchSize = 100;
    const allFeatures: PYMKFeatures[] = [];
 
    for (let i = 0; i < candidates.length; i += batchSize) {
      const batch = candidates.slice(i, i + batchSize);
      const batchFeatures = await Promise.all(
        batch.map(c => this.featureExtractor.extractFeatures(viewerId, c.memberId, c))
      );
      allFeatures.push(...batchFeatures);
    }
 
    // Batch prediction
    const scores = this.model.predictBatch(allFeatures);
 
    // Combine with candidates and sort
    const ranked: RankedCandidate[] = candidates.map((c, i) => ({
      candidateId: c.memberId,
      sources: c.sources,
      modelScore: scores[i],
      features: allFeatures[i],
    }));
 
    ranked.sort((a, b) => b.modelScore - a.modelScore);
 
    return ranked.slice(0, limit);
  }
}
 
interface RankedCandidate {
  candidateId: string;
  sources: CandidateSource[];
  modelScore: number;
  features: PYMKFeatures;
}

Multi-Objective Optimization

The model optimizes for a composite objective: P(click) × P(request | click) × P(accept | request) × P(valuable | accept). Simply optimizing for clicks leads to clickbait recommendations; optimizing for accepts alone ignores user engagement. The target balances immediate engagement with long-term network value.
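
A quick worked example shows why the composite objective ranks differently from CTR alone. The probabilities below are invented for illustration, and `FunnelProbabilities` and `compositeValue` are hypothetical names rather than part of any production system.

```typescript
interface FunnelProbabilities {
  click: number;               // P(click)
  requestGivenClick: number;   // P(request | click)
  acceptGivenRequest: number;  // P(accept | request)
  valuableGivenAccept: number; // P(valuable | accept)
}

// Chain rule: the product is the probability that an impression ends in a
// valuable connection.
function compositeValue(p: FunnelProbabilities): number {
  return (
    p.click *
    p.requestGivenClick *
    p.acceptGivenRequest *
    p.valuableGivenAccept
  );
}

// An eye-catching profile: clicked often, but rarely leads anywhere.
const clickbaitCandidate: FunnelProbabilities = {
  click: 0.3,
  requestGivenClick: 0.1,
  acceptGivenRequest: 0.2,
  valuableGivenAccept: 0.1,
};

// A former colleague: clicked less often, but most clicks convert.
const colleagueCandidate: FunnelProbabilities = {
  click: 0.1,
  requestGivenClick: 0.5,
  acceptGivenRequest: 0.6,
  valuableGivenAccept: 0.5,
};

const clickbaitValue = compositeValue(clickbaitCandidate); // 0.0006
const colleagueValue = compositeValue(colleagueCandidate); // 0.015
```

Ranking by `click` alone would put the clickbait candidate first. Ranking by the composite value puts the colleague first by a factor of 25.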

Diversity and Exploration

Pure relevance-based ranking creates filter bubbles—users only see recommendations from their existing professional circle. Diversity injection and exploration strategies broaden networks and prevent stagnation.

Diversity Dimensions

•Source Diversity — Don't show only 2nd-degree; include colleagues, alumni, similar profiles.
•Company Diversity — Recommendations from multiple companies, not just current employer.
•Connection Count Diversity — Mix of popular users (credibility) and less-connected (reciprocity).
•Recency Diversity — Include both recent joiners and established members.
•Industry Diversity — Cross-industry connections for career pivots.
•Seniority Diversity — Peers, mentors, and mentees.

Diversity Optimization
TypeScript
class DiversityOptimizer {
  // Maximal Marginal Relevance (MMR) for diverse recommendations
  selectWithDiversity(
    candidates: RankedCandidate[],
    count: number,
    lambda: number = 0.7  // Relevance vs diversity trade-off
  ): RankedCandidate[] {
    const selected: RankedCandidate[] = [];
    const remaining = new Set(candidates);
 
    while (selected.length < count && remaining.size > 0) {
      let bestCandidate: RankedCandidate | null = null;
      let bestMMR = -Infinity;
 
      for (const candidate of remaining) {
        // MMR = λ * relevance - (1 - λ) * max_similarity_to_selected
        const relevance = candidate.modelScore;
        const maxSimilarity = selected.length > 0
          ? Math.max(...selected.map(s => this.similarity(candidate, s)))
          : 0;
 
        const mmr = lambda * relevance - (1 - lambda) * maxSimilarity;
 
        if (mmr > bestMMR) {
          bestMMR = mmr;
          bestCandidate = candidate;
        }
      }
 
      if (bestCandidate) {
        selected.push(bestCandidate);
        remaining.delete(bestCandidate);
      }
    }
 
    return selected;
  }
 
  // Similarity based on multiple dimensions; the weights sum to 1.0, so the result stays in [0, 1]
  private similarity(a: RankedCandidate, b: RankedCandidate): number {
    const sameCompany = a.features.candidateCompanyId === b.features.candidateCompanyId ? 0.3 : 0;
    const sameIndustry = a.features.sameIndustry === b.features.sameIndustry ? 0.2 : 0;
    // Connection-count similarity: 1 when equal, falling to 0 at a gap of 1000 or more.
    // Clamped so that large gaps cannot push the total similarity negative.
    const connectionCountSimilarity = Math.max(0, 1 - Math.abs(
      a.features.candidateConnectionCount - b.features.candidateConnectionCount
    ) / 1000);
    const sourceOverlap = this.sourceOverlap(a.sources, b.sources);
 
    return sameCompany + sameIndustry + connectionCountSimilarity * 0.2 + sourceOverlap * 0.3;
  }
 
  private sourceOverlap(a: CandidateSource[], b: CandidateSource[]): number {
    const aTypes = new Set(a.map(s => s.type));
    const bTypes = new Set(b.map(s => s.type));
    const intersection = [...aTypes].filter(t => bTypes.has(t)).length;
    return intersection / Math.max(aTypes.size, bTypes.size);
  }
}
 
// Exploration via Thompson Sampling
class ExplorationStrategy {
  // Each candidate has a Beta distribution for CTR
  private beta: Map<string, { alpha: number; beta: number }> = new Map();
 
  // Thompson Sampling: Sample from posterior, select highest
  selectWithExploration(
    candidates: RankedCandidate[],
    count: number
  ): RankedCandidate[] {
    const sampled = candidates.map(c => ({
      candidate: c,
      sample: this.thompsonSample(c.candidateId, c.modelScore),
    }));
 
    sampled.sort((a, b) => b.sample - a.sample);
 
    return sampled.slice(0, count).map(s => s.candidate);
  }
 
  private thompsonSample(candidateId: string, prior: number): number {
    // Get or initialize Beta distribution
    let params = this.beta.get(candidateId);
    if (!params) {
      // Initialize with weak prior based on model score
      const pseudoSuccess = prior * 10;
      const pseudoFailure = (1 - prior) * 10;
      params = { alpha: 1 + pseudoSuccess, beta: 1 + pseudoFailure };
      this.beta.set(candidateId, params);
    }
 
    // Sample from Beta(alpha, beta)
    return this.sampleBeta(params.alpha, params.beta);
  }
 
  // Record observation to update posterior
  recordOutcome(candidateId: string, wasPositive: boolean): void {
    const params = this.beta.get(candidateId);
    if (params) {
      if (wasPositive) {
        params.alpha += 1;
      } else {
        params.beta += 1;
      }
    }
  }
 
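  private sampleBeta(alpha: number, beta: number): number {
    // If X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1), then X / (X + Y) ~ Beta(alpha, beta)
    const x = this.sampleGamma(alpha, 1);
    const y = this.sampleGamma(beta, 1);
    return x / (x + y);
  }
 
  private sampleGamma(shape: number, scale: number): number {
    // Marsaglia-Tsang method. For shape < 1, sample Gamma(shape + 1)
    // and multiply by U^(1/shape).
    if (shape < 1) {
      return this.sampleGamma(shape + 1, scale) * Math.pow(Math.random(), 1 / shape);
    }
    const d = shape - 1 / 3;
    const c = 1 / Math.sqrt(9 * d);
    while (true) {
      let x: number;
      let v: number;
      do {
        x = this.sampleStandardNormal();
        v = 1 + c * x;
      } while (v <= 0);
      v = v * v * v;
      const u = Math.random();
      if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
        return d * v * scale;
      }
    }
  }
 
  private sampleStandardNormal(): number {
    // Box-Muller transform; 1 - Math.random() lies in (0, 1], so the log is finite
    const u1 = 1 - Math.random();
    const u2 = Math.random();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  }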
}
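
The posterior update behind Thompson Sampling can be traced by hand. The sketch below reuses the initialization from `thompsonSample` (10 pseudo-observations split according to the model score) and follows one candidate through four observed outcomes. The function names and the outcome sequence are illustrative assumptions.

```typescript
interface BetaParams {
  alpha: number;
  beta: number;
}

// Same initialization as thompsonSample: a uniform Beta(1, 1) plus `strength`
// pseudo-observations split according to the model score.
function priorFromModelScore(modelScore: number, strength: number = 10): BetaParams {
  return {
    alpha: 1 + modelScore * strength,
    beta: 1 + (1 - modelScore) * strength,
  };
}

// Conjugate update: a positive outcome increments alpha, a negative one beta.
function updatePosterior(params: BetaParams, wasPositive: boolean): BetaParams {
  return wasPositive
    ? { alpha: params.alpha + 1, beta: params.beta }
    : { alpha: params.alpha, beta: params.beta + 1 };
}

function posteriorMean(p: BetaParams): number {
  return p.alpha / (p.alpha + p.beta);
}

function posteriorVariance(p: BetaParams): number {
  const total = p.alpha + p.beta;
  return (p.alpha * p.beta) / (total * total * (total + 1));
}

// Model score 0.3 gives Beta(4, 8), with mean 4 / 12
const betaPrior = priorFromModelScore(0.3);

// Three positive outcomes and one negative give Beta(7, 9), with mean 7 / 16
const observedOutcomes = [true, true, true, false];
let betaPosterior = betaPrior;
for (let i = 0; i < observedOutcomes.length; i++) {
  betaPosterior = updatePosterior(betaPosterior, observedOutcomes[i]);
}
```

The mean moves from about 0.33 toward the observed rate and the variance shrinks, so samples for this candidate cluster more tightly over time. Candidates with few observations keep a wide posterior and are occasionally sampled high, which is where the exploration comes from.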

The Cold Start Problem

New users have no connection graph, so graph-based features are useless. Cold start strategies include: leveraging contact import aggressively, using profile attributes (company, school) as primary signals, applying content-based filtering on skills/interests, and showing popular users in their company/school as initial suggestions.
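
One way to express these strategies is to weight candidate sources by how much graph the member has. The sketch below is an assumption-heavy illustration: `COLD_START_THRESHOLD`, `sourceWeights`, and every weight value are hypothetical, chosen only to show the shape of the idea.

```typescript
type SourceType =
  | 'second_degree'
  | 'same_company'
  | 'same_school'
  | 'contact_import'
  | 'similar_profile';

// Hypothetical cutoff: below this many connections, treat the member as cold-start.
const COLD_START_THRESHOLD = 10;

// Hypothetical per-source weights by network size. All values are made up.
function sourceWeights(connectionCount: number): Record<SourceType, number> {
  if (connectionCount === 0) {
    // No graph at all: the 2nd-degree source cannot return anything.
    return {
      second_degree: 0,
      contact_import: 1.0,
      same_company: 0.8,
      same_school: 0.6,
      similar_profile: 0.5,
    };
  }
  if (connectionCount < COLD_START_THRESHOLD) {
    // Sparse graph: 2nd-degree results exist but are noisy, so lean on profile signals.
    return {
      second_degree: 0.5,
      contact_import: 1.0,
      same_company: 0.8,
      same_school: 0.6,
      similar_profile: 0.5,
    };
  }
  // Established member: graph proximity becomes the primary signal.
  return {
    second_degree: 1.0,
    contact_import: 0.8,
    same_company: 0.6,
    same_school: 0.4,
    similar_profile: 0.4,
  };
}

const newMemberWeights = sourceWeights(0);
const establishedMemberWeights = sourceWeights(500);
```

In practice a learned ranking model can absorb this switch if connection count is one of its features, so explicit weights like these are mainly useful at the candidate generation stage.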

Evaluation and Metrics

Recommendation system success must be measured carefully. Wrong metrics lead to optimizing for the wrong outcomes.

PYMK Evaluation Metrics
Metric	Definition	Purpose	Caveats
CTR	Clicks / Impressions	Engagement measurement	Clickbait can inflate
Request Rate	Requests / Impressions	Intent measurement	Doesn't measure quality
Accept Rate	Accepts / Requests	Relevance measurement	Biased by recipient activity
Network Growth	New connections / User / Day	Business outcome	Can be gamed
Message Rate	Messages / New Connection	Connection quality	Long-term metric
Retention Impact	User retention vs control	Overall value	Hard to attribute
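
The ratio metrics in the table can be computed directly from event counts. The sketch below follows the table's definitions; `FunnelCounts`, `computeFunnelMetrics`, and the sample counts are illustrative assumptions.

```typescript
interface FunnelCounts {
  impressions: number;
  clicks: number;
  requests: number;
  accepts: number;
}

interface FunnelMetrics {
  ctr: number;         // Clicks / Impressions
  requestRate: number; // Requests / Impressions
  acceptRate: number;  // Accepts / Requests
}

// Return 0 rather than NaN or Infinity when the denominator is empty.
function safeRatio(numerator: number, denominator: number): number {
  return denominator > 0 ? numerator / denominator : 0;
}

function computeFunnelMetrics(counts: FunnelCounts): FunnelMetrics {
  return {
    ctr: safeRatio(counts.clicks, counts.impressions),
    requestRate: safeRatio(counts.requests, counts.impressions),
    acceptRate: safeRatio(counts.accepts, counts.requests),
  };
}

// Made-up counts for one experiment arm
const sampleCounts: FunnelCounts = {
  impressions: 10000,
  clicks: 800,
  requests: 400,
  accepts: 240,
};

// ctr = 0.08, requestRate = 0.04, acceptRate = 0.6
const sampleMetrics = computeFunnelMetrics(sampleCounts);
```

Note the different denominators: CTR and Request Rate are per impression, while Accept Rate is per request. A change that halves requests but keeps accepts constant would therefore raise Accept Rate while leaving network growth flat, which is why the table's caveats matter when reading any single metric.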

Recommendation Evaluation
TypeScript
class PYMKMetrics {
  // Offline evaluation metrics
  calculateOfflineMetrics(
    predictions: Array<{ predicted: number; actual: boolean }>,
    atK: number[] = [5, 10, 20]
  ): OfflineMetrics {
    // Sort a copy by predicted score (descending) so the caller's array is not mutated
    const sorted = [...predictions].sort((a, b) => b.predicted - a.predicted);
 
    return {
      auc: this.calculateAUC(sorted),
      precision: Object.fromEntries(
        atK.map(k => [k, this.precisionAtK(sorted, k)])
      ),
      recall: Object.fromEntries(
        atK.map(k => [k, this.recallAtK(sorted, k)])
      ),
      ndcg: Object.fromEntries(
        atK.map(k => [k, this.ndcgAtK(sorted, k)])
      ),
      mrr: this.meanReciprocalRank(sorted),
    };
  }
 
  // Normalized Discounted Cumulative Gain
  // (expects predictions already sorted by predicted score, descending)
  private ndcgAtK(
    predictions: Array<{ predicted: number; actual: boolean }>,
    k: number
  ): number {
    const topK = predictions.slice(0, k);
    
    // DCG = sum((2^rel - 1) / log2(position + 1))
    const dcg = topK.reduce((sum, item, index) => {
      const relevance = item.actual ? 1 : 0;
      const discount = Math.log2(index + 2);  // log2(2) = 1 for position 1
      return sum + (Math.pow(2, relevance) - 1) / discount;
    }, 0);
 
    // Ideal DCG (all relevant items first)
    const relevantCount = predictions.filter(p => p.actual).length;
    const idealK = Math.min(k, relevantCount);
    let idcg = 0;
    for (let i = 0; i < idealK; i++) {
      idcg += (Math.pow(2, 1) - 1) / Math.log2(i + 2);
    }
 
    return idcg > 0 ? dcg / idcg : 0;
  }
 
  // Online A/B test metrics
  calculateOnlineMetrics(
    control: ExperimentData,
    treatment: ExperimentData
  ): ABTestResult {
    const ctr = this.compareRates(
      treatment.clicks / treatment.impressions,
      control.clicks / control.impressions,
      treatment.impressions,
      control.impressions
    );
 
    const requestRate = this.compareRates(
      treatment.requests / treatment.impressions,
      control.requests / control.impressions,
      treatment.impressions,
      control.impressions
    );
 
    const acceptRate = this.compareRates(
      treatment.accepts / treatment.requests,
      control.accepts / control.requests,
      treatment.requests,
      control.requests
    );
 
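    // Caution: connections per user is a mean count, not a proportion, so the
    // two-proportion z-test in compareRates only applies if newConnections
    // counts users who gained at least one connection. For raw counts, a
    // t-test on per-user values would be the appropriate comparison.
    const connectionGrowth = this.compareRates(
      treatment.newConnections / treatment.users,
      control.newConnections / control.users,
      treatment.users,
      control.users
    );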
 
    return {
      ctr,
      requestRate,
      acceptRate,
      connectionGrowth,
      recommendation: this.makeRecommendation(
        ctr, requestRate, acceptRate, connectionGrowth
      ),
    };
  }
 
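  // Two-sided, two-proportion z-test with a pooled standard error
  private compareRates(
    treatmentRate: number,
    controlRate: number,
    treatmentN: number,
    controlN: number
  ): MetricComparison {
    const pooledRate = (treatmentRate * treatmentN + controlRate * controlN) 
                       / (treatmentN + controlN);
    const se = Math.sqrt(pooledRate * (1 - pooledRate) * 
               (1/treatmentN + 1/controlN));
    // se is 0 or NaN when a sample is empty or the pooled rate is 0 or 1;
    // treat that as "no evidence" (z = 0, p = 1) instead of dividing by zero
    const zScore = se > 0 ? (treatmentRate - controlRate) / se : 0;
    const pValue = 2 * (1 - this.normalCDF(Math.abs(zScore)));
 
    return {
      treatment: treatmentRate,
      control: controlRate,
      // Relative lift is undefined for a zero control rate; report 0 in that case
      lift: controlRate !== 0 ? (treatmentRate - controlRate) / controlRate : 0,
      pValue,
      significant: pValue < 0.05,
    };
  }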
}
 
interface OfflineMetrics {
  auc: number;
  precision: Record<number, number>;
  recall: Record<number, number>;
  ndcg: Record<number, number>;
  mrr: number;
}
 
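interface MetricComparison {
  treatment: number;
  control: number;
  lift: number;
  pValue: number;
  significant: boolean;
}
 
// Shapes inferred from how calculateOnlineMetrics uses them
interface ExperimentData {
  users: number;
  impressions: number;
  clicks: number;
  requests: number;
  accepts: number;
  newConnections: number;
}
 
interface ABTestResult {
  ctr: MetricComparison;
  requestRate: MetricComparison;
  acceptRate: MetricComparison;
  connectionGrowth: MetricComparison;
  recommendation: string;  // Assumed: the return type of makeRecommendation is not shown
}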
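
`PYMKMetrics` calls several helpers whose bodies are not shown (`calculateAUC`, `precisionAtK`, `recallAtK`, `meanReciprocalRank`, `normalCDF`). One possible way to write them, as standalone functions, is sketched below. These follow the usual textbook definitions and are not code from the original class. The `ScoredLabel` alias is introduced here for brevity, `reciprocalRank` handles a single ranked list (averaging it across viewers would give MRR), and `normalCDF` uses the Abramowitz-Stegun polynomial approximation of the error function, so it is accurate to roughly 1e-7 rather than exact.

```typescript
type ScoredLabel = { predicted: number; actual: boolean };

// Precision@K: relevant items in the top K, divided by K
// (input must be sorted by predicted score, descending)
function precisionAtK(sorted: ScoredLabel[], k: number): number {
  if (k <= 0) return 0;
  return sorted.slice(0, k).filter(p => p.actual).length / k;
}

// Recall@K: relevant items in the top K, divided by all relevant items
function recallAtK(sorted: ScoredLabel[], k: number): number {
  const totalRelevant = sorted.filter(p => p.actual).length;
  if (totalRelevant === 0) return 0;
  return sorted.slice(0, k).filter(p => p.actual).length / totalRelevant;
}

// Reciprocal rank of the first relevant item (0 if there is none)
function reciprocalRank(sorted: ScoredLabel[]): number {
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].actual) return 1 / (i + 1);
  }
  return 0;
}

// AUC as the probability that a random positive outscores a random negative.
// Pairwise O(P * N) version for clarity; ties count as half a win.
function calculateAUC(items: ScoredLabel[]): number {
  const positives = items.filter(p => p.actual);
  const negatives = items.filter(p => !p.actual);
  if (positives.length === 0 || negatives.length === 0) return 0.5; // undefined; 0.5 by convention here

  let wins = 0;
  for (const pos of positives) {
    for (const neg of negatives) {
      if (pos.predicted > neg.predicted) wins += 1;
      else if (pos.predicted === neg.predicted) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation (7.1.26)
function normalCDF(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
               t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}
```

The pairwise AUC is easy to verify by hand but scales poorly; for large evaluation sets a rank-based (Mann-Whitney) formulation computes the same quantity in O(n log n).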

Summary: Recommendation Algorithms

We've explored the complete recommendation pipeline for professional networks. Let's consolidate the key insights:

Key Takeaways

•Multi-stage pipeline is essential — Candidate generation (fast/approximate) → Filtering → Ranking (accurate/expensive) → Post-processing.
•Multiple candidate sources — 2nd-degree network, shared context, contacts, similar profiles, and collaborative filtering each capture different relevant connections.
•Feature engineering matters — Graph proximity, professional context, profile similarity, and behavioral signals all contribute unique predictive value.
•Model choice depends on scale — GBDT/XGBoost is the workhorse for most cases; deep learning adds value for embedding categorical features and learning complex patterns.
•Diversity prevents filter bubbles — MMR and Thompson Sampling ensure users see varied recommendations, not just the highest-ranked similar candidates.
•Cold start needs special handling — New users require different strategies: contact import, profile-based matching, and popular connections in their company/school.
•Composite metrics measure true value — Optimize for clicks × requests × accepts × engagement, not just any single metric.

Up Next: Scaling Social Graphs

You now understand how recommendation algorithms power professional network growth. In the final page, we'll explore scaling strategies for social graphs—how to maintain performance as the network grows to billions of edges.

// Exploration via Thompson Sampling
class ExplorationStrategy {
  // Each candidate has a Beta distribution for CTR
  private beta: Map<string, { alpha: number; beta: number }> = new Map();
 
  // Thompson Sampling: Sample from posterior, select highest
  selectWithExploration(
    candidates: RankedCandidate[],
    count: number
  ): RankedCandidate[] {
    const sampled = candidates.map(c => ({
      candidate: c,
      sample: this.thompsonSample(c.candidateId, c.modelScore),
    }));
 
    sampled.sort((a, b) => b.sample - a.sample);
 
    return sampled.slice(0, count).map(s => s.candidate);
  }
 
  private thompsonSample(candidateId: string, prior: number): number {
    // Get or initialize Beta distribution
    let params = this.beta.get(candidateId);
    if (!params) {
      // Initialize with weak prior based on model score
      const pseudoSuccess = prior * 10;
      const pseudoFailure = (1 - prior) * 10;
      params = { alpha: 1 + pseudoSuccess, beta: 1 + pseudoFailure };
      this.beta.set(candidateId, params);
    }
 
    // Sample from Beta(alpha, beta)
    return this.sampleBeta(params.alpha, params.beta);
  }
 
  // Record observation to update posterior
  recordOutcome(candidateId: string, wasPositive: boolean): void {
    const params = this.beta.get(candidateId);
    if (params) {
      if (wasPositive) {
        params.alpha += 1;
      } else {
        params.beta += 1;
      }
    }
  }
 
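  private sampleBeta(alpha: number, beta: number): number {
    // If X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1), then X / (X + Y) ~ Beta(alpha, beta)
    const x = this.sampleGamma(alpha, 1);
    const y = this.sampleGamma(beta, 1);
    return x / (x + y);
  }
 
  private sampleGamma(shape: number, scale: number): number {
    // Marsaglia-Tsang method. Requires shape >= 1, which holds here as long as
    // modelScore is in [0, 1] (alpha and beta both start at 1 and only grow).
    const d = shape - 1 / 3;
    const c = 1 / Math.sqrt(9 * d);
    while (true) {
      const z = this.sampleStandardNormal();
      const v = Math.pow(1 + c * z, 3);
      if (v <= 0) continue;
      const u = Math.random();
      if (Math.log(u) < 0.5 * z * z + d - d * v + d * Math.log(v)) {
        return d * v * scale;
      }
    }
  }
 
  private sampleStandardNormal(): number {
    // Box-Muller transform; 1 - Math.random() keeps the log argument in (0, 1]
    const u1 = 1 - Math.random();
    const u2 = Math.random();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  }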
}
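
To make the effect of the `lambda` trade-off concrete, here is a minimal, self-contained sketch of greedy MMR on three toy candidates. The `Item` type, `toySimilarity`, and `mmrSelect` are hypothetical names introduced only for this illustration; they are simplified stand-ins, not the `RankedCandidate` and `DiversityOptimizer` used above.

```typescript
// Hypothetical minimal item type for illustration only.
interface Item {
  id: string;
  score: number;     // relevance, assumed to be in [0, 1]
  companyId: string; // the only similarity dimension in this toy example
}

// Toy similarity: 1 when two items share a company, otherwise 0.
function toySimilarity(a: Item, b: Item): number {
  return a.companyId === b.companyId ? 1 : 0;
}

// Greedy MMR: repeatedly pick the item that maximizes
// lambda * relevance - (1 - lambda) * (max similarity to anything already selected).
function mmrSelect(items: Item[], count: number, lambda: number): Item[] {
  const selected: Item[] = [];
  const remaining = [...items];
  while (selected.length < count && remaining.length > 0) {
    let bestIndex = 0;
    let bestMmr = -Infinity;
    remaining.forEach((item, index) => {
      const maxSim = selected.reduce(
        (max, s) => Math.max(max, toySimilarity(item, s)), 0);
      const mmr = lambda * item.score - (1 - lambda) * maxSim;
      if (mmr > bestMmr) {
        bestMmr = mmr;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

const items: Item[] = [
  { id: "a", score: 0.90, companyId: "acme" },
  { id: "b", score: 0.85, companyId: "acme" },
  { id: "c", score: 0.60, companyId: "globex" },
];

// lambda = 0.7: after "a" is chosen, "b" scores 0.595 - 0.3 = 0.295 and "c" scores 0.42.
const diverse = mmrSelect(items, 2, 0.7).map(i => i.id);
// lambda = 1.0: the similarity penalty vanishes, so selection is by relevance alone.
const pureRelevance = mmrSelect(items, 2, 1.0).map(i => i.id);
console.log(diverse, pureRelevance);
```

With `lambda = 0.7` the second slot goes to the lower-scored candidate from a different company; with `lambda = 1.0` both slots go to the same company.
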

The Cold Start Problem
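
The takeaways at the end of this page name three strategies for new users: contact import, profile-based matching, and popular connections in their company or school. The sketch below shows one way a router could choose among them. Everything in it is an assumption made for illustration: the `NewUserProfile` fields, the `ColdStartSource` labels, and the `minConnections` threshold of 10 do not come from any real LinkedIn system.

```typescript
// Hypothetical labels for the three cold-start strategies named in the takeaways.
type ColdStartSource = "contact_import" | "profile_match" | "popular_in_org";

// Hypothetical view of a user; field names are assumptions for this sketch.
interface NewUserProfile {
  connectionCount: number;
  importedContactCount: number;
  hasProfileDetails: boolean; // e.g. title, skills, or industry filled in
  companyId?: string;
  schoolId?: string;
}

// Returns the cold-start sources worth querying, in priority order.
// An empty list means the user already has enough connections for the
// regular graph-based candidate generation to work.
function coldStartSources(
  user: NewUserProfile,
  minConnections: number = 10 // assumed threshold, would be tuned in practice
): ColdStartSource[] {
  if (user.connectionCount >= minConnections) return [];

  const sources: ColdStartSource[] = [];
  if (user.importedContactCount > 0) sources.push("contact_import");
  if (user.hasProfileDetails) sources.push("profile_match");
  if (user.companyId !== undefined || user.schoolId !== undefined) {
    sources.push("popular_in_org");
  }
  return sources;
}

const brandNewUser: NewUserProfile = {
  connectionCount: 0,
  importedContactCount: 25,
  hasProfileDetails: false,
  schoolId: "school-1",
};
const establishedUser: NewUserProfile = {
  connectionCount: 340,
  importedContactCount: 0,
  hasProfileDetails: true,
  companyId: "company-9",
};

const newUserSources = coldStartSources(brandNewUser);
const establishedSources = coldStartSources(establishedUser);
console.log(newUserSources, establishedSources);
```
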

Evaluation and Metrics

Measure recommendation quality carefully: optimizing a poorly chosen metric produces the wrong outcome. A model tuned only for CTR, for example, can raise clicks without producing more accepted connections.

PYMK Evaluation Metrics
Metric	Definition	Purpose	Caveats
CTR	Clicks / Impressions	Engagement measurement	Clickbait can inflate
Request Rate	Requests / Impressions	Intent measurement	Doesn't measure quality
Accept Rate	Accepts / Requests	Relevance measurement	Biased by recipient activity
Network Growth	New connections / User / Day	Business outcome	Can be gamed
Message Rate	Messages / New Connection	Connection quality	Long-term metric
Retention Impact	User retention vs control	Overall value	Hard to attribute
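
The first three rows of the table form a funnel, and their definitions can be computed directly from raw counts. The sketch below uses made-up numbers and hypothetical names (`FunnelCounts`, `funnelRates`) to show why no single rate tells the whole story. It relies on the identity (requests / impressions) × (accepts / requests) = accepts / impressions.

```typescript
// Hypothetical raw counts for one experiment arm.
interface FunnelCounts {
  impressions: number;
  clicks: number;
  requests: number;
  accepts: number;
}

interface FunnelRates {
  ctr: number;                  // clicks / impressions
  requestRate: number;          // requests / impressions
  acceptRate: number;           // accepts / requests
  acceptsPerImpression: number; // requestRate * acceptRate
}

// Returns 0 instead of NaN or Infinity when the denominator is empty.
function safeRatio(numerator: number, denominator: number): number {
  return denominator > 0 ? numerator / denominator : 0;
}

function funnelRates(c: FunnelCounts): FunnelRates {
  const requestRate = safeRatio(c.requests, c.impressions);
  const acceptRate = safeRatio(c.accepts, c.requests);
  return {
    ctr: safeRatio(c.clicks, c.impressions),
    requestRate,
    acceptRate,
    acceptsPerImpression: requestRate * acceptRate,
  };
}

// Illustrative numbers only: variant A attracts more clicks and requests,
// variant B produces more accepted connections per impression.
const variantA = funnelRates({ impressions: 10000, clicks: 800, requests: 300, accepts: 60 });
const variantB = funnelRates({ impressions: 10000, clicks: 500, requests: 250, accepts: 150 });
console.log(variantA, variantB);
```

Judged on CTR or request rate alone, variant A looks better; judged on accepted connections per impression, variant B does.
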

Recommendation Evaluation
TypeScript
class PYMKMetrics {
  // Offline evaluation metrics
  calculateOfflineMetrics(
    predictions: Array<{ predicted: number; actual: boolean }>,
    atK: number[] = [5, 10, 20]
  ): OfflineMetrics {
    // Sort a copy by predicted score so the caller's array is left untouched
    const ranked = [...predictions].sort((a, b) => b.predicted - a.predicted);
 
    return {
      auc: this.calculateAUC(ranked),
      precision: Object.fromEntries(
        atK.map(k => [k, this.precisionAtK(ranked, k)])
      ),
      recall: Object.fromEntries(
        atK.map(k => [k, this.recallAtK(ranked, k)])
      ),
      ndcg: Object.fromEntries(
        atK.map(k => [k, this.ndcgAtK(ranked, k)])
      ),
      mrr: this.meanReciprocalRank(ranked),
    };
  }
 
  // Normalized Discounted Cumulative Gain
  private ndcgAtK(
    predictions: Array<{ predicted: number; actual: boolean }>,
    k: number
  ): number {
    const topK = predictions.slice(0, k);
    
    // DCG = sum((2^rel - 1) / log2(position + 1))
    const dcg = topK.reduce((sum, item, index) => {
      const relevance = item.actual ? 1 : 0;
      const discount = Math.log2(index + 2);  // log2(2) = 1 for position 1
      return sum + (Math.pow(2, relevance) - 1) / discount;
    }, 0);
 
    // Ideal DCG (all relevant items first)
    const relevantCount = predictions.filter(p => p.actual).length;
    const idealK = Math.min(k, relevantCount);
    let idcg = 0;
    for (let i = 0; i < idealK; i++) {
      idcg += (Math.pow(2, 1) - 1) / Math.log2(i + 2);
    }
 
    return idcg > 0 ? dcg / idcg : 0;
  }
 
  // Online A/B test metrics
  calculateOnlineMetrics(
    control: ExperimentData,
    treatment: ExperimentData
  ): ABTestResult {
    const ctr = this.compareRates(
      treatment.clicks / treatment.impressions,
      control.clicks / control.impressions,
      treatment.impressions,
      control.impressions
    );
 
    const requestRate = this.compareRates(
      treatment.requests / treatment.impressions,
      control.requests / control.impressions,
      treatment.impressions,
      control.impressions
    );
 
    const acceptRate = this.compareRates(
      treatment.accepts / treatment.requests,
      control.accepts / control.requests,
      treatment.requests,
      control.requests
    );
 
    const connectionGrowth = this.compareRates(
      treatment.newConnections / treatment.users,
      control.newConnections / control.users,
      treatment.users,
      control.users
    );
 
    return {
      ctr,
      requestRate,
      acceptRate,
      connectionGrowth,
      recommendation: this.makeRecommendation(
        ctr, requestRate, acceptRate, connectionGrowth
      ),
    };
  }
 
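  // Two-proportion z-test. It is valid only for true proportions in [0, 1]
  // (CTR, request rate, accept rate). A per-user mean such as
  // newConnections / users can exceed 1, so treat its p-value as a rough
  // guide and prefer a test on means for that metric.
  private compareRates(
    treatmentRate: number,
    controlRate: number,
    treatmentN: number,
    controlN: number
  ): MetricComparison {
    const pooledRate = (treatmentRate * treatmentN + controlRate * controlN) 
                       / (treatmentN + controlN);
    const se = Math.sqrt(pooledRate * (1 - pooledRate) * 
               (1/treatmentN + 1/controlN));
    // Guard against a zero or undefined standard error (no variance, or an empty arm)
    const zScore = se > 0 ? (treatmentRate - controlRate) / se : 0;
    const pValue = 2 * (1 - this.normalCDF(Math.abs(zScore)));
 
    return {
      treatment: treatmentRate,
      control: controlRate,
      // Relative lift is undefined when the control rate is 0; report 0 in that case
      lift: controlRate !== 0 ? (treatmentRate - controlRate) / controlRate : 0,
      pValue,
      significant: pValue < 0.05,
    };
  }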
}
 
interface OfflineMetrics {
  auc: number;
  precision: Record<number, number>;
  recall: Record<number, number>;
  ndcg: Record<number, number>;
  mrr: number;
}
 
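interface MetricComparison {
  treatment: number;
  control: number;
  lift: number;
  pValue: number;
  significant: boolean;
}
 
// Raw counts for one experiment arm; fields inferred from their use in calculateOnlineMetrics
interface ExperimentData {
  users: number;
  impressions: number;
  clicks: number;
  requests: number;
  accepts: number;
  newConnections: number;
}
 
interface ABTestResult {
  ctr: MetricComparison;
  requestRate: MetricComparison;
  acceptRate: MetricComparison;
  connectionGrowth: MetricComparison;
  recommendation: string;  // Assumed type; makeRecommendation is not shown in this listing
}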
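
The `PYMKMetrics` class calls several helpers that the listing does not define (`calculateAUC`, `precisionAtK`, `recallAtK`, `meanReciprocalRank`, `normalCDF`). The standalone functions below are one possible sketch of them, not the original implementations. Two assumptions are worth flagging: `reciprocalRank` scores a single ranked list (a mean reciprocal rank would average it across users), and `normalCDF` uses the Abramowitz and Stegun 7.1.26 approximation of erf, which is accurate to roughly 1.5e-7.

```typescript
interface Prediction {
  predicted: number;
  actual: boolean;
}

// Fraction of the top k items that are relevant. Expects input sorted by score, descending.
function precisionAtK(ranked: Prediction[], k: number): number {
  if (k <= 0) return 0;
  return ranked.slice(0, k).filter(p => p.actual).length / k;
}

// Fraction of all relevant items that appear in the top k.
function recallAtK(ranked: Prediction[], k: number): number {
  const totalRelevant = ranked.filter(p => p.actual).length;
  if (totalRelevant === 0) return 0;
  return ranked.slice(0, k).filter(p => p.actual).length / totalRelevant;
}

// 1 / rank of the first relevant item, or 0 if there is none.
function reciprocalRank(ranked: Prediction[]): number {
  const index = ranked.findIndex(p => p.actual);
  return index === -1 ? 0 : 1 / (index + 1);
}

// AUC as the probability that a random positive outscores a random negative
// (ties count half). O(P * N), which is fine for a sketch but not for large sets.
function calculateAUC(predictions: Prediction[]): number {
  const positives = predictions.filter(p => p.actual);
  const negatives = predictions.filter(p => !p.actual);
  if (positives.length === 0 || negatives.length === 0) return 0.5;
  let wins = 0;
  for (const pos of positives) {
    for (const neg of negatives) {
      if (pos.predicted > neg.predicted) wins += 1;
      else if (pos.predicted === neg.predicted) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}

// Error function via Abramowitz and Stegun formula 7.1.26.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCDF(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Five predictions, already sorted by predicted score (descending).
const sample: Prediction[] = [
  { predicted: 0.9, actual: true },
  { predicted: 0.8, actual: false },
  { predicted: 0.7, actual: true },
  { predicted: 0.4, actual: false },
  { predicted: 0.2, actual: false },
];

const sampleMetrics = {
  precisionAt2: precisionAtK(sample, 2), // 1 relevant item in the top 2
  recallAt2: recallAtK(sample, 2),       // 1 of the 2 relevant items found
  reciprocalRank: reciprocalRank(sample),
  auc: calculateAUC(sample),             // 5 of 6 positive/negative pairs ordered correctly
};
console.log(sampleMetrics, normalCDF(1.96));
```
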

Summary: Recommendation Algorithms

We've explored the complete recommendation pipeline for professional networks. Let's consolidate the key insights:

Key Takeaways

• Multi-stage pipeline is essential: Candidate generation (fast/approximate) → Filtering → Ranking (accurate/expensive) → Post-processing.
• Multiple candidate sources: 2nd-degree network, shared context, contacts, similar profiles, and collaborative filtering each capture different relevant connections.
• Feature engineering matters: Graph proximity, professional context, profile similarity, and behavioral signals all contribute unique predictive value.
• Model choice depends on scale: GBDT/XGBoost is the workhorse for most cases; deep learning adds value for embedding categorical features and learning complex patterns.
• Diversity prevents filter bubbles: MMR keeps the selected set varied and Thompson Sampling adds exploration, so users see more than the highest-ranked similar candidates.
• Cold start needs special handling: New users require different strategies, such as contact import, profile-based matching, and popular connections in their company/school.
• Composite metrics measure true value: Optimize for clicks × requests × accepts × engagement, not any single metric.

Up Next: Scaling Social Graphs
