LinkedIn processes tens of billions of graph operations daily across a network of 900+ million members and 50+ billion connections. This isn't just about storing large amounts of data—it's about maintaining sub-second latency while the graph grows by millions of new connections per day, handling 65+ million daily active users, and ensuring zero data loss for professional relationships that matter to careers.
Scaling a social graph is uniquely challenging: every edge couples two users, so the data resists clean partitioning; supernodes concentrate load on individual shards; each new connection must fan out to caches, feeds, and recommendations in near real time; and professional relationships demand strong durability guarantees.
This page explores the architecture and techniques that enable professional networks to scale reliably.
By the end of this page, you will understand horizontal scaling strategies for graph data, sharding approaches and their trade-offs, consistency vs availability decisions, real-time update propagation, operational practices for reliability, and capacity planning for growth.
Horizontal scaling—adding more machines rather than bigger machines—is the only viable approach for social graphs at LinkedIn scale. The architecture must distribute data and computation across thousands of servers while presenting a unified view to applications.
| Component | Scaling Strategy | Typical Scale | Key Challenge |
|---|---|---|---|
| API Servers | Stateless, horizontal | 1000+ instances | Session affinity if needed |
| Connection Service | Stateless, horizontal | 500+ instances | Database connection pooling |
| Database Shards | Horizontal partitioning | 1000+ shards | Cross-shard queries |
| Cache Cluster | Consistent hashing | 1000+ nodes | Cache coherence |
| Event Queue | Partitioned topics | 100+ partitions | Ordering guarantees |
| Worker Pool | Horizontal, autoscaled | Variable | Backpressure handling |
```typescript
// Connection Service scaling configuration
interface ServiceConfig {
  // Horizontal scaling
  minInstances: number;
  maxInstances: number;
  targetCPU: number;  // Target CPU utilization %
  targetRPS: number;  // Target requests per second per instance
  // Connection pooling
  dbConnectionsPerInstance: number;
  cacheConnectionsPerInstance: number;
  // Circuit breakers
  circuitBreakerThreshold: number;
  circuitBreakerTimeout: number;
  // Rate limiting
  rateLimitPerUser: number;
  rateLimitBurst: number;
}

const connectionServiceConfig: ServiceConfig = {
  // Scale between 200-800 instances based on load
  minInstances: 200,
  maxInstances: 800,
  targetCPU: 60,    // Keep headroom for spikes
  targetRPS: 5000,  // 5K RPS per instance
  // Each instance maintains pools
  dbConnectionsPerInstance: 50,
  cacheConnectionsPerInstance: 100,
  // Fail fast on downstream issues
  circuitBreakerThreshold: 0.5,  // 50% error rate trips
  circuitBreakerTimeout: 30000,  // 30s before retry
  // Protect against abuse
  rateLimitPerUser: 100,  // 100 ops/minute
  rateLimitBurst: 50,     // Allow bursts
};

// Autoscaling logic
class AutoScaler {
  async evaluateScale(metrics: ServiceMetrics): Promise<ScaleDecision> {
    const currentInstances = metrics.instanceCount;

    // Scale up triggers
    if (metrics.avgCPU > 75 || metrics.avgLatencyP99 > 500) {
      const newCount = Math.min(
        currentInstances * 1.5,
        connectionServiceConfig.maxInstances
      );
      return { action: 'scale_up', targetInstances: Math.ceil(newCount) };
    }

    // Scale down triggers (conservative)
    if (metrics.avgCPU < 30 && metrics.avgLatencyP99 < 100) {
      const newCount = Math.max(
        currentInstances * 0.8,
        connectionServiceConfig.minInstances
      );
      return { action: 'scale_down', targetInstances: Math.floor(newCount) };
    }

    return { action: 'maintain', targetInstances: currentInstances };
  }
}
```

Sharding the connection graph across multiple databases is essential but challenging. Unlike independent records, connections create dependencies between users that complicate partitioning.
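The shard router below assumes a `ConsistentHashRing` with virtual nodes, so that adding or removing a shard remaps only a small slice of the keyspace. The page doesn't show that primitive, so here is a minimal sketch of one plausible shape for it, using an FNV-1a string hash as a stand-in for the murmur-style hash a production ring would use:

```typescript
// Minimal consistent hash ring with virtual nodes (a sketch; the real
// ConsistentHashRing used below may differ in detail).
class ConsistentHashRing {
  private ring: Array<{ point: number; node: number }> = [];

  constructor(nodeCount: number, virtualNodes: number = 150) {
    // Place each physical node at many points on the ring
    for (let node = 0; node < nodeCount; node++) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push({ point: this.hash(`${node}:${v}`), node });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Map a key to the first virtual node clockwise from its hash
  getNode(key: string): number {
    const h = this.hash(key);
    let lo = 0, hi = this.ring.length - 1;
    if (h > this.ring[hi].point) return this.ring[0].node; // wrap around
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo].node;
  }

  // Simple 32-bit FNV-1a string hash (stand-in for murmurhash)
  private hash(s: string): number {
    let h = 2166136261;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return h >>> 0; // force unsigned 32-bit
  }
}
```

The virtual nodes smooth out load imbalance, and with N shards on the ring, adding one reassigns roughly 1/(N+1) of the keys instead of rehashing everything.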
```typescript
// User-based sharding with edge replication
class ShardRouter {
  private shardCount: number;
  private shardConnections: Map<number, DatabasePool>;
  private consistentHash: ConsistentHashRing;

  constructor(shardCount: number) {
    this.shardCount = shardCount;
    this.consistentHash = new ConsistentHashRing(shardCount, 150); // 150 virtual nodes
  }

  // Determine shard for a user
  getShardForUser(userId: string): number {
    return this.consistentHash.getNode(userId);
  }

  // Get shards involved in a connection
  getShardsForConnection(userId1: string, userId2: string): number[] {
    const shard1 = this.getShardForUser(userId1);
    const shard2 = this.getShardForUser(userId2);
    if (shard1 === shard2) {
      return [shard1]; // Lucky - same shard
    }
    return [shard1, shard2]; // Cross-shard connection
  }

  // Query that must touch multiple shards
  async queryAcrossShards<T>(
    userIds: string[],
    query: (db: DatabasePool, userIds: string[]) => Promise<T[]>
  ): Promise<T[]> {
    // Group by shard
    const byShards = new Map<number, string[]>();
    for (const userId of userIds) {
      const shard = this.getShardForUser(userId);
      if (!byShards.has(shard)) byShards.set(shard, []);
      byShards.get(shard)!.push(userId);
    }

    // Parallel queries to each shard
    const results = await Promise.all(
      Array.from(byShards.entries()).map(([shard, ids]) =>
        query(this.shardConnections.get(shard)!, ids)
      )
    );
    return results.flat();
  }
}

// Connection write with edge replication
class ConnectionWriter {
  private router: ShardRouter;
  private eventPublisher: EventPublisher;

  async createConnection(
    userId1: string,
    userId2: string,
    metadata: ConnectionMetadata
  ): Promise<void> {
    const shards = this.router.getShardsForConnection(userId1, userId2);
    const connectionId = this.generateConnectionId(userId1, userId2);

    const connection = {
      id: connectionId,
      userId1: this.canonicalFirst(userId1, userId2),
      userId2: this.canonicalSecond(userId1, userId2),
      connectedAt: new Date(),
      ...metadata,
    };

    if (shards.length === 1) {
      // Same shard - single write
      await this.writeSingleShard(shards[0], connection);
    } else {
      // Cross-shard - need distributed transaction or eventual consistency
      await this.writeCrossShard(shards, connection);
    }

    // Publish event for async processing
    await this.eventPublisher.publish('connection.created', {
      connectionId,
      userId1,
      userId2,
      timestamp: Date.now(),
    });
  }

  // Two-phase commit for cross-shard writes
  private async writeCrossShard(
    shards: number[],
    connection: Connection
  ): Promise<void> {
    const txId = this.generateTxId();
    try {
      // Phase 1: Prepare on both shards
      await Promise.all(shards.map(shard =>
        this.prepare(shard, txId, connection)
      ));
      // Phase 2: Commit on both
      await Promise.all(shards.map(shard =>
        this.commit(shard, txId)
      ));
    } catch (error) {
      // Rollback on failure
      await Promise.all(shards.map(shard =>
        this.rollback(shard, txId).catch(() => {})
      ));
      throw error;
    }
  }

  // Alternative: Saga pattern with compensating actions
  private async writeCrossShardSaga(
    shards: number[],
    connection: Connection
  ): Promise<void> {
    const completedShards: number[] = [];
    try {
      for (const shard of shards) {
        await this.writeSingleShard(shard, connection);
        completedShards.push(shard);
      }
    } catch (error) {
      // Compensate: Delete from completed shards
      for (const shard of completedShards) {
        await this.deleteSingleShard(shard, connection.id).catch(() => {});
      }
      throw error;
    }
  }
}
```

Adding or removing shards requires moving large amounts of data while serving live traffic. Consistent hashing minimizes movement, but even moving 1% of 50 billion edges means relocating 500 million records.
LinkedIn uses gradual migration with dual-writes and careful cutover procedures.
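A minimal sketch of that dual-write flow, under assumed names (`DualWriteMigrator`, `oldPool`, `newPool`, the four phases, and the `insert`/`query` methods on `DatabasePool`) that are illustrative rather than LinkedIn's actual tooling:

```typescript
// Sketch of gradual shard migration: the old shard stays authoritative
// until backfill completes and shadow reads stop reporting drift.
type MigrationPhase = 'backfill' | 'dual_write' | 'shadow_read' | 'cutover';

class DualWriteMigrator {
  private phase: MigrationPhase = 'backfill';

  constructor(
    private oldPool: DatabasePool,
    private newPool: DatabasePool
  ) {}

  // Writes always hit the old shard; once dual-writing starts, they also
  // hit the new shard so it never falls behind the backfill.
  async write(conn: Connection): Promise<void> {
    await this.oldPool.insert('connections', conn);
    if (this.phase !== 'backfill') {
      // Tolerate new-shard failures before cutover; the backfill job repairs gaps
      await this.newPool.insert('connections', conn).catch(err =>
        console.warn('new-shard write lagged, will be repaired', err)
      );
    }
  }

  // Reads come from the old shard until cutover; in shadow_read we also
  // query the new shard and compare, without affecting the response.
  async read(userId: string): Promise<Connection[]> {
    if (this.phase === 'cutover') {
      return this.newPool.query(
        'SELECT * FROM connections WHERE user_id = ?', [userId]
      );
    }
    const primary = await this.oldPool.query(
      'SELECT * FROM connections WHERE user_id = ?', [userId]
    );
    if (this.phase === 'shadow_read') {
      this.newPool.query('SELECT * FROM connections WHERE user_id = ?', [userId])
        .then(shadow => this.recordMismatch(userId, primary.length, shadow.length))
        .catch(() => { /* shadow failures never affect the caller */ });
    }
    return primary;
  }

  // Operator-driven phase advance, typically one key range at a time
  advance(next: MigrationPhase): void {
    this.phase = next;
  }

  private recordMismatch(userId: string, oldCount: number, newCount: number): void {
    if (oldCount !== newCount) {
      console.warn(`migration drift for ${userId}: old=${oldCount} new=${newCount}`);
    }
  }
}
```

The key property is that a failed migration can be abandoned at any point before cutover without data loss, since the old shard never stops receiving writes.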
The CAP theorem forces distributed systems to choose between consistency and availability during network partitions. For social graphs, different operations have different requirements.
| Operation | Consistency Need | Availability Need | Strategy |
|---|---|---|---|
| Connection Create | Strong (no duplicates) | High | Synchronous replication, retry on failure |
| Connection Read (own) | Strong | Very High | Read from primary, cache heavily |
| Connection Read (others) | Eventual OK | Very High | Read from replica, stale OK |
| PYMK (People You May Know) | Eventual | High | Precomputed, hourly refresh OK |
| Mutual Connections | Near-real-time | High | Async update, 5s delay OK |
| Connection Count | Eventual | Very High | Cached, periodic refresh |
```typescript
// Different consistency levels for different operations
enum ConsistencyLevel {
  STRONG = 'strong',      // Read from primary, synchronous write
  SESSION = 'session',    // Consistent within user's session
  EVENTUAL = 'eventual',  // Read from any replica
  CAUSAL = 'causal',      // Respects causal ordering
}

class ConnectionStore {
  private primary: Database;
  private replicas: Database[];
  private cache: CacheCluster;

  // Strong consistency for mutations
  async createConnection(conn: ConnectionCreate): Promise<Connection> {
    // Write to primary with synchronous replication to at least one replica
    const result = await this.primary.insert('connections', conn, {
      replicationFactor: 2,
      waitForReplication: true,
    });

    // Invalidate caches immediately
    await this.invalidateCaches(conn.userId1, conn.userId2);
    return result;
  }

  // Read with session consistency for own data
  async getMyConnections(userId: string, sessionToken: string): Promise<Connection[]> {
    // Check if user recently made changes
    const lastWriteTime = await this.getLastWriteTime(userId, sessionToken);
    if (lastWriteTime && Date.now() - lastWriteTime < 5000) {
      // Recent write - read from primary to see own changes
      return this.primary.query(
        'SELECT * FROM connections WHERE user_id = ?',
        [userId]
      );
    }
    // No recent writes - replica is fine
    return this.readFromReplica(userId);
  }

  // Eventual consistency for others' data
  async getOtherUserConnections(viewerId: string, targetId: string): Promise<Connection[]> {
    // Try cache first
    const cached = await this.cache.get(`connections:${targetId}`);
    if (cached) return JSON.parse(cached);

    // Read from any replica
    const connections = await this.readFromReplica(targetId);

    // Cache with short TTL
    await this.cache.setex(
      `connections:${targetId}`,
      300, // 5 minutes
      JSON.stringify(connections)
    );
    return connections;
  }

  private async readFromReplica(userId: string): Promise<Connection[]> {
    // Round-robin or least-loaded replica
    const replica = this.selectReplica();
    return replica.query(
      'SELECT * FROM connections WHERE user_id = ?',
      [userId]
    );
  }

  // Handle split-brain scenarios
  async reconcileAfterPartition(
    shard1Data: Connection[],
    shard2Data: Connection[]
  ): Promise<void> {
    // Use timestamp-based last-write-wins for conflict resolution
    const byId = new Map<string, Connection>();
    for (const conn of [...shard1Data, ...shard2Data]) {
      const existing = byId.get(conn.id);
      if (!existing || conn.updatedAt > existing.updatedAt) {
        byId.set(conn.id, conn);
      }
    }

    // Write reconciled state to both shards
    const reconciled = Array.from(byId.values());
    await Promise.all([
      this.primary.bulkUpsert('connections', reconciled),
      this.replicas[0].bulkUpsert('connections', reconciled),
    ]);
  }
}

// Causal consistency for distributed updates
class CausalConsistencyManager {
  private vectorClocks: Map<string, VectorClock> = new Map();

  // Track causality for connection updates
  recordUpdate(userId: string, operation: Operation): VectorClock {
    let clock = this.vectorClocks.get(userId);
    if (!clock) {
      clock = new VectorClock();
      this.vectorClocks.set(userId, clock);
    }
    clock.increment(userId);
    return clock.copy();
  }

  // Ensure causal order on reads: block until the local clock has
  // caught up with the clock the writer observed
  async waitForCausality(userId: string, expectedClock: VectorClock): Promise<void> {
    const deadline = Date.now() + 5000; // Timeout after 5s
    while (true) {
      // Re-read the clock each iteration so external updates are observed
      const currentClock = this.vectorClocks.get(userId);
      // Caught up once expectedClock no longer strictly leads the local clock
      if (currentClock && !currentClock.happensBefore(expectedClock)) return;
      if (Date.now() > deadline) throw new Error('Causality wait timed out');
      await sleep(10);
    }
  }
}
```

When a connection is created, the update must propagate through multiple systems: caches, feeds, recommendations, search indexes, and more.
This fan-out must be reliable, fast, and scalable.
```typescript
// Event schema for connection changes
interface ConnectionEvent {
  eventId: string;
  eventType: 'created' | 'removed';
  connectionId: string;
  userId1: string;
  userId2: string;
  timestamp: number;
  metadata: {
    source: string;   // 'pymk', 'search', etc.
    traceId: string;  // For distributed tracing
  };
}

// Publisher ensures at-least-once delivery
class ConnectionEventPublisher {
  private kafka: KafkaProducer;
  private topic = 'connection-events';
  private partitionCount: number;

  async publish(event: ConnectionEvent): Promise<void> {
    // Partition by userId1 to maintain ordering per user
    const partition = this.getPartition(event.userId1);

    await this.kafka.send({
      topic: this.topic,
      partition,
      messages: [{
        key: event.userId1,
        value: JSON.stringify(event),
        headers: {
          eventType: event.eventType,
          timestamp: String(event.timestamp),
        },
      }],
    });
  }

  private getPartition(userId: string): number {
    return murmurhash(userId) % this.partitionCount;
  }
}

// Consumer workers process updates
class CacheInvalidationWorker {
  private redis: RedisCluster;
  private consumer: KafkaConsumer;

  async process(event: ConnectionEvent): Promise<void> {
    const { userId1, userId2 } = event;

    // Invalidate all affected caches
    const keysToInvalidate = [
      // Connection lists
      `connections:${userId1}`,
      `connections:${userId2}`,
      // Connection counts
      `connection_count:${userId1}`,
      `connection_count:${userId2}`,
      // Mutual connections for their mutual friends
      // (This is expensive - might do async/batch)
      ...await this.getMutualCacheKeys(userId1, userId2),
      // PYMK caches
      `pymk:${userId1}`,
      `pymk:${userId2}`,
    ];

    // Batch delete for efficiency
    await this.redis.del(...keysToInvalidate);

    // Optionally, pre-warm critical caches
    await this.prewarmCaches(userId1, userId2);
  }

  private async prewarmCaches(userId1: string, userId2: string): Promise<void> {
    // Pre-fetch and cache connection lists
    // Prioritize if users are active
    const [activity1, activity2] = await Promise.all([
      this.getUserActivity(userId1),
      this.getUserActivity(userId2),
    ]);

    if (activity1.isActive) {
      this.backgroundPrewarm(userId1);
    }
    if (activity2.isActive) {
      this.backgroundPrewarm(userId2);
    }
  }
}

class RecommendationUpdateWorker {
  private recStore: RecommendationStore;
  private graphStore: GraphStore;

  async process(event: ConnectionEvent): Promise<void> {
    const { userId1, userId2, eventType } = event;

    if (eventType === 'created') {
      // Remove each other from PYMK (already connected)
      await Promise.all([
        this.recStore.removePYMKCandidate(userId1, userId2),
        this.recStore.removePYMKCandidate(userId2, userId1),
      ]);

      // Boost mutual friends in PYMK (common connections = more likely to connect)
      await this.boostMutualFriends(userId1, userId2);
    } else {
      // Connection removed - they might want to reconnect
      // Don't immediately re-add to PYMK (feels stalky)
      await this.scheduleForRecomputeLater(userId1, userId2);
    }
  }

  private async boostMutualFriends(userId1: string, userId2: string): Promise<void> {
    // The new edge makes each user's connections 2nd-degree for the other
    const conn1 = await this.graphStore.getConnectionSet(userId1);
    const conn2 = await this.graphStore.getConnectionSet(userId2);

    // Connections of user1 not connected to user2: now 2nd-degree for user2
    const newSecondDegreeFor2 = [...conn1].filter(c =>
      !conn2.has(c) && c !== userId2
    );
    // Boost these in user2's PYMK
    for (const candidate of newSecondDegreeFor2.slice(0, 100)) {
      await this.recStore.boostPYMKScore(userId2, candidate, 0.1);
    }

    // Same for user1
    const newSecondDegreeFor1 = [...conn2].filter(c =>
      !conn1.has(c) && c !== userId1
    );
    for (const candidate of newSecondDegreeFor1.slice(0, 100)) {
      await this.recStore.boostPYMKScore(userId1, candidate, 0.1);
    }
  }
}

// Ensure exactly-once processing with idempotency
class IdempotentProcessor {
  private processedEvents: BloomFilter;
  private processedStore: KeyValueStore;

  async processOnce(
    event: ConnectionEvent,
    handler: (e: ConnectionEvent) => Promise<void>
  ): Promise<void> {
    // Quick check with Bloom filter (no false negatives)
    if (this.processedEvents.mightContain(event.eventId)) {
      // Might be duplicate - check definitive store
      if (await this.processedStore.exists(event.eventId)) {
        return; // Already processed, skip
      }
    }

    // Process the event
    await handler(event);

    // Mark as processed
    this.processedEvents.add(event.eventId);
    await this.processedStore.set(event.eventId, '1', { ttl: 86400 * 7 });
  }
}
```

When a supernode (a user with 30K connections) adds a new connection, naively updating all derived data would touch millions of records. LinkedIn limits eager updates to the most important derived data (caches, active-user feeds) and handles the rest lazily or in scheduled batch jobs, as sketched below.
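One way to express that eager/lazy split is a bounded fan-out: invalidate the two users' own caches immediately, spend a fixed budget on active mutual friends, and defer everything else to a batch queue. This sketch uses assumed helpers (`getMutualFriendIds`, `isActiveUser`) standing in for real graph-store and activity lookups:

```typescript
// Sketch: bound the eager fan-out for supernodes; the remainder is
// deferred to a batch queue. Helper names are illustrative.
class PrioritizedFanout {
  private static EAGER_LIMIT = 1000; // max derived records touched inline

  constructor(
    private cache: MultiTierCache,
    private batchQueue: EventPublisher
  ) {}

  async onConnectionCreated(event: ConnectionEvent): Promise<void> {
    // Tier 1 (always eager): the two users' own caches
    await this.cache.invalidate([
      `connections:${event.userId1}`,
      `connections:${event.userId2}`,
      `connection_count:${event.userId1}`,
      `connection_count:${event.userId2}`,
    ]);

    // Tier 2 (eager up to a budget): mutual-connection caches of active friends
    const mutuals = await this.getMutualFriendIds(event.userId1, event.userId2);
    const eager = new Set<string>();
    for (const id of mutuals) {
      if (eager.size >= PrioritizedFanout.EAGER_LIMIT) break;
      if (await this.isActiveUser(id)) eager.add(id);
    }
    await this.cache.invalidate([...eager].map(id => `mutual:${id}`));

    // Tier 3 (lazy): everyone else is recomputed by a scheduled batch job
    await this.batchQueue.publish('derived-data.recompute', {
      users: mutuals.filter(id => !eager.has(id)),
      reason: 'connection.created',
    });
  }

  private async getMutualFriendIds(u1: string, u2: string): Promise<string[]> {
    /* assumed graph-store lookup */ return [];
  }

  private async isActiveUser(id: string): Promise<boolean> {
    /* assumed activity-service check */ return true;
  }
}
```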
Caching is not optional at LinkedIn scale—it's the primary read path. The database is a persistence layer; the cache is the query layer.
| Tier | Technology | Latency | Hit Rate Target | Data |
|---|---|---|---|---|
| L1 (In-Process) | Local LRU | < 1ms | 30-40% | Hot data per instance |
| L2 (Distributed) | Redis Cluster | 1-5ms | 80-90% | All frequently accessed |
| L3 (CDN) | Akamai/CloudFront | 5-50ms | 95%+ for static | Aggregated, static |
```typescript
class MultiTierCache {
  private l1: LRUCache<string, any>;  // In-process
  private l2: RedisCluster;           // Distributed
  private metrics: CacheMetrics;

  constructor(l2: RedisCluster, metrics: CacheMetrics, l1Size: number = 10000) {
    this.l1 = new LRUCache({ max: l1Size, ttl: 60000 });
    this.l2 = l2;
    this.metrics = metrics;
  }

  async get<T>(key: string): Promise<T | null> {
    // L1: In-process cache
    const l1Value = this.l1.get(key) as T | undefined;
    if (l1Value !== undefined) {
      this.metrics.recordHit('l1');
      return l1Value;
    }

    // L2: Redis
    const l2Value = await this.l2.get(key);
    if (l2Value !== null) {
      this.metrics.recordHit('l2');
      const parsed = JSON.parse(l2Value) as T;
      // Populate L1
      this.l1.set(key, parsed);
      return parsed;
    }

    this.metrics.recordMiss();
    return null;
  }

  async set<T>(key: string, value: T, options: CacheOptions = {}): Promise<void> {
    const { ttl = 3600 } = options;
    // Write to both tiers
    this.l1.set(key, value, { ttl: Math.min(ttl, 60) * 1000 });
    await this.l2.setex(key, ttl, JSON.stringify(value));
  }

  async invalidate(keys: string[]): Promise<void> {
    // L1: Invalidate locally
    for (const key of keys) {
      this.l1.delete(key);
    }
    // L2: Invalidate in Redis
    if (keys.length > 0) {
      await this.l2.del(...keys);
    }
    // Broadcast invalidation to other instances
    await this.publishInvalidation(keys);
  }

  // Handle invalidation broadcasts
  private async publishInvalidation(keys: string[]): Promise<void> {
    await this.l2.publish('cache:invalidate', JSON.stringify(keys));
  }

  // Subscribe to invalidations from other instances
  subscribeToInvalidations(): void {
    this.l2.subscribe('cache:invalidate', (message) => {
      const keys = JSON.parse(message) as string[];
      for (const key of keys) {
        this.l1.delete(key);
      }
    });
  }
}

// Cache warming for predictable access patterns
class CacheWarmer {
  private cache: MultiTierCache;
  private graphStore: GraphStore;

  // Pre-warm caches during low-traffic periods
  async warmDailyActiveUsers(): Promise<void> {
    const activeUsers = await this.getActiveUserIds();

    // Batch processing with concurrency limit
    const concurrency = 100;
    for (let i = 0; i < activeUsers.length; i += concurrency) {
      const batch = activeUsers.slice(i, i + concurrency);
      await Promise.all(batch.map(userId => this.warmUserCache(userId)));
    }
  }

  private async warmUserCache(userId: string): Promise<void> {
    // Fetch and cache connection data
    const [connections, connectionCount] = await Promise.all([
      this.graphStore.getConnections(userId),
      this.graphStore.getConnectionCount(userId),
    ]);

    await Promise.all([
      this.cache.set(`connections:${userId}`, connections, { ttl: 3600 }),
      this.cache.set(`connection_count:${userId}`, connectionCount, { ttl: 3600 }),
    ]);
  }

  // Warm cache before anticipated traffic spikes
  async warmBeforeEvent(eventAttendees: string[]): Promise<void> {
    // People at events often view each other's profiles
    // Pre-warm their connection data
    for (const userId of eventAttendees) {
      await this.warmUserCache(userId);
    }
  }
}

// Cache stampede prevention
class StampedeProtection {
  private locks: Map<string, Promise<any>> = new Map();

  async getWithProtection<T>(
    key: string,
    fetcher: () => Promise<T>,
    cache: MultiTierCache
  ): Promise<T> {
    // Check cache
    const cached = await cache.get<T>(key);
    if (cached !== null) return cached;

    // Check if fetch is in progress
    if (this.locks.has(key)) {
      return this.locks.get(key) as Promise<T>;
    }

    // Start fetch with lock
    const fetchPromise = (async () => {
      try {
        const value = await fetcher();
        await cache.set(key, value);
        return value;
      } finally {
        this.locks.delete(key);
      }
    })();

    this.locks.set(key, fetchPromise);
    return fetchPromise;
  }
}
```

Running social graphs at scale requires rigorous operational practices. Failures are inevitable; the goal is to minimize impact and recover quickly.
```typescript
// Circuit breaker implementation
class CircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failures = 0;
  private lastFailure: number = 0;
  private successesInHalfOpen = 0;

  constructor(
    private threshold: number = 5,
    private timeout: number = 30000,
    private successThreshold: number = 3
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.canExecute()) {
      throw new CircuitOpenError('Circuit breaker is open');
    }
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private canExecute(): boolean {
    if (this.state === 'closed') return true;
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
        return true;
      }
      return false;
    }
    return true; // half-open allows limited traffic
  }

  private recordSuccess(): void {
    if (this.state === 'half-open') {
      this.successesInHalfOpen++;
      if (this.successesInHalfOpen >= this.successThreshold) {
        this.state = 'closed';
        this.failures = 0;
        this.successesInHalfOpen = 0;
      }
    } else {
      this.failures = 0;
    }
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.successesInHalfOpen = 0;
    }
  }
}

// Graceful degradation with fallbacks
class GracefulConnectionService {
  private db: ConnectionStore;
  private cache: MultiTierCache;
  private circuitBreaker: CircuitBreaker;

  async getConnections(userId: string): Promise<ConnectionResult> {
    try {
      return await this.circuitBreaker.execute(async () => {
        // Try cache first
        const cached = await this.cache.get<Connection[]>(`connections:${userId}`);
        if (cached) {
          return { connections: cached, source: 'cache', stale: false };
        }
        // Query database
        const connections = await this.db.getConnections(userId);
        await this.cache.set(`connections:${userId}`, connections);
        return { connections, source: 'database', stale: false };
      });
    } catch (error) {
      if (error instanceof CircuitOpenError) {
        // Database is down - try stale cache
        return this.getStaleOrEmpty(userId);
      }
      throw error;
    }
  }

  private async getStaleOrEmpty(userId: string): Promise<ConnectionResult> {
    // Try to get from cache even if expired
    const stale = await this.cache.getStale<Connection[]>(`connections:${userId}`);
    if (stale) {
      return { connections: stale, source: 'stale_cache', stale: true };
    }
    // Last resort: return empty with degradation notice
    return { connections: [], source: 'degraded', stale: true };
  }
}

// Load shedding under pressure
class LoadShedder {
  private requestCounter = 0;
  private successCounter = 0;
  private windowStart = Date.now();
  private windowSize = 1000; // 1 second

  shouldShed(priority: 'high' | 'medium' | 'low'): boolean {
    this.maybeRotateWindow();
    const errorRate = 1 - (this.successCounter / Math.max(this.requestCounter, 1));

    // Shed low-priority requests first
    if (priority === 'low' && errorRate > 0.1) return true;
    if (priority === 'medium' && errorRate > 0.3) return true;
    if (priority === 'high' && errorRate > 0.5) return true;
    return false;
  }

  recordRequest(success: boolean): void {
    this.requestCounter++;
    if (success) this.successCounter++;
  }

  private maybeRotateWindow(): void {
    if (Date.now() - this.windowStart > this.windowSize) {
      this.requestCounter = 0;
      this.successCounter = 0;
      this.windowStart = Date.now();
    }
  }
}
```

Social networks grow exponentially—both in users and in graph density (connections per user). Capacity planning must account for this growth while maintaining cost efficiency.
| Factor | Growth Model | Planning Horizon | Action Trigger |
|---|---|---|---|
| User Count | +10% annually | 18 months | 70% threshold |
| Connections/User | +5% annually | 12 months | Average > 600 |
| Daily Active Users | Seasonal + growth | 6 months | Peak vs provisioned |
| Query Complexity | Feature-driven | Per release | Latency regression |
| Storage | Linear with edges | 24 months | 80% capacity |
| Cache Size | Grows with active users | 12 months | Hit rate < target |
```typescript
interface CapacityModel {
  currentState: SystemState;
  growthAssumptions: GrowthAssumptions;
  planningHorizonMonths: number;
}

interface SystemState {
  users: number;
  avgConnectionsPerUser: number;
  dailyActiveUsers: number;
  peakQPS: number;
  storageGB: number;
  shardCount: number;
  cacheNodeCount: number;
}

interface GrowthAssumptions {
  userGrowthRate: number;        // Annual fraction, e.g. 0.10 for +10%
  connectionGrowthRate: number;  // Annual increase in avg connections
  dauGrowthRate: number;         // Annual fraction
  queryComplexityGrowth: number; // Annual fraction (new features add cost)
}

class CapacityPlanner {
  projectCapacityNeeds(model: CapacityModel): CapacityProjection[] {
    const { currentState, growthAssumptions, planningHorizonMonths } = model;
    const projections: CapacityProjection[] = [];

    for (let month = 1; month <= planningHorizonMonths; month++) {
      const yearFraction = month / 12;

      // Project growth
      const users = currentState.users * Math.pow(
        1 + growthAssumptions.userGrowthRate, yearFraction
      );
      const avgConnections = currentState.avgConnectionsPerUser * Math.pow(
        1 + growthAssumptions.connectionGrowthRate, yearFraction
      );
      const totalConnections = users * avgConnections;
      const dau = currentState.dailyActiveUsers * Math.pow(
        1 + growthAssumptions.dauGrowthRate, yearFraction
      );
      const qps = this.estimateQPS(dau, avgConnections);

      // Storage: ~100 bytes per connection edge
      const storageGB = (totalConnections * 100) / (1024 * 1024 * 1024);

      // Shards: target 100GB per shard
      const recommendedShards = Math.ceil(storageGB / 100);

      // Cache: 10% of data should be cacheable, 50 bytes per entry avg
      const cacheEntriesNeeded = users * 0.1;
      const cacheMemoryGB = (cacheEntriesNeeded * 50) / (1024 * 1024 * 1024);
      const cacheNodes = Math.ceil(cacheMemoryGB / 64); // 64GB per node

      projections.push({
        month,
        users: Math.round(users),
        connections: Math.round(totalConnections),
        dau: Math.round(dau),
        peakQPS: Math.round(qps),
        storageGB: Math.round(storageGB),
        recommendedShards,
        recommendedCacheNodes: cacheNodes,
        estimatedCost: this.estimateCost({
          shards: recommendedShards,
          cacheNodes,
          computeInstances: Math.ceil(qps / 5000),
        }),
      });
    }
    return projections;
  }

  private estimateQPS(dau: number, avgConnections: number): number {
    // DAU generates ~20 connection-related queries/day on average
    const dailyQueries = dau * 20;
    // Peak is 5x average, concentrated in 4 hours
    const avgQPS = dailyQueries / 86400;
    const peakQPS = avgQPS * 5;
    // Complexity factor: more connections = more complex queries
    const complexityFactor = 1 + (avgConnections / 1000);
    return peakQPS * complexityFactor;
  }

  generateAlerts(model: CapacityModel, projections: CapacityProjection[]): CapacityAlert[] {
    const alerts: CapacityAlert[] = [];
    // Provisioned capacity: current shard count at the 100GB-per-shard target
    const provisionedGB = model.currentState.shardCount * 100;

    for (const projection of projections) {
      // Storage alert: flag the month projected usage crosses 80% of capacity
      if (projection.storageGB > provisionedGB * 0.8) {
        alerts.push({
          type: 'storage',
          severity: projection.month < 6 ? 'critical' : 'warning',
          message: `Storage will reach 80% capacity in ${projection.month} months`,
          recommendation: `Add ${projection.recommendedShards - model.currentState.shardCount} shards`,
        });
        break;
      }
    }
    // Similar alerts for compute, cache, etc.
    return alerts;
  }
}

interface CapacityProjection {
  month: number;
  users: number;
  connections: number;
  dau: number;
  peakQPS: number;
  storageGB: number;
  recommendedShards: number;
  recommendedCacheNodes: number;
  estimatedCost: CostEstimate;
}
```

We've explored the complete scaling architecture for professional social networks; the wrap-up below consolidates the key learnings.
Congratulations! You've completed the LinkedIn Connections module. You now understand how to design a professional social network from requirements through scaling. Key skills covered: graph storage and traversal, degree of separation queries, recommendation algorithms, sharding strategies, event-driven architecture, and operational excellence. These patterns apply broadly to any social graph system at scale.