LinkedIn processes tens of billions of graph operations daily across a network of 900+ million members and 50+ billion connections. This isn't just about storing large amounts of data—it's about maintaining sub-second latency while the graph grows by millions of new connections per day, handling 65+ million daily active users, and ensuring zero data loss for professional relationships that matter to careers.
Scaling a social graph is uniquely challenging: every edge couples two users, so the data resists clean partitioning; supernodes concentrate load on individual shards; each new connection must fan out to caches, feeds, and recommendations in near real time; and professional relationships demand strong durability guarantees.
This page explores the architecture and techniques that enable professional networks to scale reliably.
By the end of this page, you will understand horizontal scaling strategies for graph data, sharding approaches and their trade-offs, consistency vs availability decisions, real-time update propagation, operational practices for reliability, and capacity planning for growth.
Horizontal scaling—adding more machines rather than bigger machines—is the only viable approach for social graphs at LinkedIn scale. The architecture must distribute data and computation across thousands of servers while presenting a unified view to applications.
| Component | Scaling Strategy | Typical Scale | Key Challenge |
|---|---|---|---|
| API Servers | Stateless, horizontal | 1000+ instances | Session affinity if needed |
| Connection Service | Stateless, horizontal | 500+ instances | Database connection pooling |
| Database Shards | Horizontal partitioning | 1000+ shards | Cross-shard queries |
| Cache Cluster | Consistent hashing | 1000+ nodes | Cache coherence |
| Event Queue | Partitioned topics | 100+ partitions | Ordering guarantees |
| Worker Pool | Horizontal, autoscaled | Variable | Backpressure handling |
```typescript
// Connection Service scaling configuration
interface ServiceConfig {
  // Horizontal scaling
  minInstances: number;
  maxInstances: number;
  targetCPU: number;  // Target CPU utilization %
  targetRPS: number;  // Target requests per second per instance
  // Connection pooling
  dbConnectionsPerInstance: number;
  cacheConnectionsPerInstance: number;
  // Circuit breakers
  circuitBreakerThreshold: number;
  circuitBreakerTimeout: number;
  // Rate limiting
  rateLimitPerUser: number;
  rateLimitBurst: number;
}

const connectionServiceConfig: ServiceConfig = {
  // Scale between 200-800 instances based on load
  minInstances: 200,
  maxInstances: 800,
  targetCPU: 60,    // Keep headroom for spikes
  targetRPS: 5000,  // 5K RPS per instance
  // Each instance maintains pools
  dbConnectionsPerInstance: 50,
  cacheConnectionsPerInstance: 100,
  // Fail fast on downstream issues
  circuitBreakerThreshold: 0.5,  // 50% error rate trips
  circuitBreakerTimeout: 30000,  // 30s before retry
  // Protect against abuse
  rateLimitPerUser: 100,  // 100 ops/minute
  rateLimitBurst: 50,     // Allow bursts
};

// Autoscaling logic
class AutoScaler {
  async evaluateScale(metrics: ServiceMetrics): Promise<ScaleDecision> {
    const currentInstances = metrics.instanceCount;

    // Scale up triggers
    if (metrics.avgCPU > 75 || metrics.avgLatencyP99 > 500) {
      const newCount = Math.min(
        currentInstances * 1.5,
        connectionServiceConfig.maxInstances
      );
      return { action: 'scale_up', targetInstances: Math.ceil(newCount) };
    }

    // Scale down triggers (conservative)
    if (metrics.avgCPU < 30 && metrics.avgLatencyP99 < 100) {
      const newCount = Math.max(
        currentInstances * 0.8,
        connectionServiceConfig.minInstances
      );
      return { action: 'scale_down', targetInstances: Math.floor(newCount) };
    }

    return { action: 'maintain', targetInstances: currentInstances };
  }
}
```

Sharding the connection graph across multiple databases is essential but challenging. Unlike independent records, connections create dependencies between users that complicate partitioning.
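The shard router below assumes a `ConsistentHashRing` with virtual nodes, so that adding or removing a shard remaps only a small slice of the keyspace. The page doesn't show that primitive, so here is a minimal sketch of one plausible shape for it, using an FNV-1a string hash as a stand-in for the murmur-style hash a production ring would use:

```typescript
// Minimal consistent hash ring with virtual nodes (a sketch; the real
// ConsistentHashRing used below may differ in detail).
class ConsistentHashRing {
  private ring: Array<{ point: number; node: number }> = [];

  constructor(nodeCount: number, virtualNodes: number = 150) {
    // Place each physical node at many points on the ring
    for (let node = 0; node < nodeCount; node++) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push({ point: this.hash(`${node}:${v}`), node });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Map a key to the first virtual node clockwise from its hash
  getNode(key: string): number {
    const h = this.hash(key);
    let lo = 0, hi = this.ring.length - 1;
    if (h > this.ring[hi].point) return this.ring[0].node; // wrap around
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo].node;
  }

  // Simple 32-bit FNV-1a string hash (stand-in for murmurhash)
  private hash(s: string): number {
    let h = 2166136261;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return h >>> 0; // force unsigned 32-bit
  }
}
```

The virtual nodes smooth out load imbalance, and with N shards on the ring, adding one reassigns roughly 1/(N+1) of the keys instead of rehashing everything.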
```typescript
// User-based sharding with edge replication
class ShardRouter {
  private shardCount: number;
  private shardConnections: Map<number, DatabasePool>;
  private consistentHash: ConsistentHashRing;

  constructor(shardCount: number) {
    this.shardCount = shardCount;
    this.consistentHash = new ConsistentHashRing(shardCount, 150); // 150 virtual nodes
  }

  // Determine shard for a user
  getShardForUser(userId: string): number {
    return this.consistentHash.getNode(userId);
  }

  // Get shards involved in a connection
  getShardsForConnection(userId1: string, userId2: string): number[] {
    const shard1 = this.getShardForUser(userId1);
    const shard2 = this.getShardForUser(userId2);
    if (shard1 === shard2) {
      return [shard1]; // Lucky - same shard
    }
    return [shard1, shard2]; // Cross-shard connection
  }

  // Query that must touch multiple shards
  async queryAcrossShards<T>(
    userIds: string[],
    query: (db: DatabasePool, userIds: string[]) => Promise<T[]>
  ): Promise<T[]> {
    // Group by shard
    const byShards = new Map<number, string[]>();
    for (const userId of userIds) {
      const shard = this.getShardForUser(userId);
      if (!byShards.has(shard)) byShards.set(shard, []);
      byShards.get(shard)!.push(userId);
    }

    // Parallel queries to each shard
    const results = await Promise.all(
      Array.from(byShards.entries()).map(([shard, ids]) =>
        query(this.shardConnections.get(shard)!, ids)
      )
    );
    return results.flat();
  }
}

// Connection write with edge replication
class ConnectionWriter {
  private router: ShardRouter;
  private eventPublisher: EventPublisher;

  async createConnection(
    userId1: string,
    userId2: string,
    metadata: ConnectionMetadata
  ): Promise<void> {
    const shards = this.router.getShardsForConnection(userId1, userId2);
    const connectionId = this.generateConnectionId(userId1, userId2);

    const connection = {
      id: connectionId,
      userId1: this.canonicalFirst(userId1, userId2),
      userId2: this.canonicalSecond(userId1, userId2),
      connectedAt: new Date(),
      ...metadata,
    };

    if (shards.length === 1) {
      // Same shard - single write
      await this.writeSingleShard(shards[0], connection);
    } else {
      // Cross-shard - need distributed transaction or eventual consistency
      await this.writeCrossShard(shards, connection);
    }

    // Publish event for async processing
    await this.eventPublisher.publish('connection.created', {
      connectionId,
      userId1,
      userId2,
      timestamp: Date.now(),
    });
  }

  // Two-phase commit for cross-shard writes
  private async writeCrossShard(
    shards: number[],
    connection: Connection
  ): Promise<void> {
    const txId = this.generateTxId();
    try {
      // Phase 1: Prepare on both shards
      await Promise.all(shards.map(shard =>
        this.prepare(shard, txId, connection)
      ));
      // Phase 2: Commit on both
      await Promise.all(shards.map(shard =>
        this.commit(shard, txId)
      ));
    } catch (error) {
      // Rollback on failure
      await Promise.all(shards.map(shard =>
        this.rollback(shard, txId).catch(() => {})
      ));
      throw error;
    }
  }

  // Alternative: Saga pattern with compensating actions
  private async writeCrossShardSaga(
    shards: number[],
    connection: Connection
  ): Promise<void> {
    const completedShards: number[] = [];
    try {
      for (const shard of shards) {
        await this.writeSingleShard(shard, connection);
        completedShards.push(shard);
      }
    } catch (error) {
      // Compensate: Delete from completed shards
      for (const shard of completedShards) {
        await this.deleteSingleShard(shard, connection.id).catch(() => {});
      }
      throw error;
    }
  }
}
```

Adding or removing shards requires moving large amounts of data while serving live traffic. Consistent hashing minimizes movement, but even moving 1% of 50 billion edges means relocating 500 million records.
LinkedIn uses gradual migration with dual-writes and careful cutover procedures.
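A minimal sketch of that dual-write flow, under assumed names (`DualWriteMigrator`, `oldPool`, `newPool`, the four phases, and the `insert`/`query` methods on `DatabasePool`) that are illustrative rather than LinkedIn's actual tooling:

```typescript
// Sketch of gradual shard migration: the old shard stays authoritative
// until backfill completes and shadow reads stop reporting drift.
type MigrationPhase = 'backfill' | 'dual_write' | 'shadow_read' | 'cutover';

class DualWriteMigrator {
  private phase: MigrationPhase = 'backfill';

  constructor(
    private oldPool: DatabasePool,
    private newPool: DatabasePool
  ) {}

  // Writes always hit the old shard; once dual-writing starts, they also
  // hit the new shard so it never falls behind the backfill.
  async write(conn: Connection): Promise<void> {
    await this.oldPool.insert('connections', conn);
    if (this.phase !== 'backfill') {
      // Tolerate new-shard failures before cutover; the backfill job repairs gaps
      await this.newPool.insert('connections', conn).catch(err =>
        console.warn('new-shard write lagged, will be repaired', err)
      );
    }
  }

  // Reads come from the old shard until cutover; in shadow_read we also
  // query the new shard and compare, without affecting the response.
  async read(userId: string): Promise<Connection[]> {
    if (this.phase === 'cutover') {
      return this.newPool.query(
        'SELECT * FROM connections WHERE user_id = ?', [userId]
      );
    }
    const primary = await this.oldPool.query(
      'SELECT * FROM connections WHERE user_id = ?', [userId]
    );
    if (this.phase === 'shadow_read') {
      this.newPool.query('SELECT * FROM connections WHERE user_id = ?', [userId])
        .then(shadow => this.recordMismatch(userId, primary.length, shadow.length))
        .catch(() => { /* shadow failures never affect the caller */ });
    }
    return primary;
  }

  // Operator-driven phase advance, typically one key range at a time
  advance(next: MigrationPhase): void {
    this.phase = next;
  }

  private recordMismatch(userId: string, oldCount: number, newCount: number): void {
    if (oldCount !== newCount) {
      console.warn(`migration drift for ${userId}: old=${oldCount} new=${newCount}`);
    }
  }
}
```

The key property is that a failed migration can be abandoned at any point before cutover without data loss, since the old shard never stops receiving writes.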
The CAP theorem forces distributed systems to choose between consistency and availability during network partitions. For social graphs, different operations have different requirements.
| Operation | Consistency Need | Availability Need | Strategy |
|---|---|---|---|
| Connection Create | Strong (no duplicates) | High | Synchronous replication, retry on failure |
| Connection Read (own) | Strong | Very High | Read from primary, cache heavily |
| Connection Read (others) | Eventual OK | Very High | Read from replica, stale OK |
| PYMK (People You May Know) | Eventual | High | Precomputed, hourly refresh OK |
| Mutual Connections | Near-real-time | High | Async update, 5s delay OK |
| Connection Count | Eventual | Very High | Cached, periodic refresh |
```typescript
// Different consistency levels for different operations
enum ConsistencyLevel {
  STRONG = 'strong',      // Read from primary, synchronous write
  SESSION = 'session',    // Consistent within user's session
  EVENTUAL = 'eventual',  // Read from any replica
  CAUSAL = 'causal',      // Respects causal ordering
}

class ConnectionStore {
  private primary: Database;
  private replicas: Database[];
  private cache: CacheCluster;

  // Strong consistency for mutations
  async createConnection(conn: ConnectionCreate): Promise<Connection> {
    // Write to primary with synchronous replication to at least one replica
    const result = await this.primary.insert('connections', conn, {
      replicationFactor: 2,
      waitForReplication: true,
    });

    // Invalidate caches immediately
    await this.invalidateCaches(conn.userId1, conn.userId2);
    return result;
  }

  // Read with session consistency for own data
  async getMyConnections(userId: string, sessionToken: string): Promise<Connection[]> {
    // Check if user recently made changes
    const lastWriteTime = await this.getLastWriteTime(userId, sessionToken);
    if (lastWriteTime && Date.now() - lastWriteTime < 5000) {
      // Recent write - read from primary to see own changes
      return this.primary.query(
        'SELECT * FROM connections WHERE user_id = ?',
        [userId]
      );
    }
    // No recent writes - replica is fine
    return this.readFromReplica(userId);
  }

  // Eventual consistency for others' data
  async getOtherUserConnections(viewerId: string, targetId: string): Promise<Connection[]> {
    // Try cache first
    const cached = await this.cache.get(`connections:${targetId}`);
    if (cached) return JSON.parse(cached);

    // Read from any replica
    const connections = await this.readFromReplica(targetId);

    // Cache with short TTL
    await this.cache.setex(
      `connections:${targetId}`,
      300, // 5 minutes
      JSON.stringify(connections)
    );
    return connections;
  }

  private async readFromReplica(userId: string): Promise<Connection[]> {
    // Round-robin or least-loaded replica
    const replica = this.selectReplica();
    return replica.query(
      'SELECT * FROM connections WHERE user_id = ?',
      [userId]
    );
  }

  // Handle split-brain scenarios
  async reconcileAfterPartition(
    shard1Data: Connection[],
    shard2Data: Connection[]
  ): Promise<void> {
    // Use timestamp-based last-write-wins for conflict resolution
    const byId = new Map<string, Connection>();
    for (const conn of [...shard1Data, ...shard2Data]) {
      const existing = byId.get(conn.id);
      if (!existing || conn.updatedAt > existing.updatedAt) {
        byId.set(conn.id, conn);
      }
    }

    // Write reconciled state to both shards
    const reconciled = Array.from(byId.values());
    await Promise.all([
      this.primary.bulkUpsert('connections', reconciled),
      this.replicas[0].bulkUpsert('connections', reconciled),
    ]);
  }
}

// Causal consistency for distributed updates
class CausalConsistencyManager {
  private vectorClocks: Map<string, VectorClock> = new Map();

  // Track causality for connection updates
  recordUpdate(userId: string, operation: Operation): VectorClock {
    let clock = this.vectorClocks.get(userId);
    if (!clock) {
      clock = new VectorClock();
      this.vectorClocks.set(userId, clock);
    }
    clock.increment(userId);
    return clock.copy();
  }

  // Ensure causal order on reads: block until the local clock has
  // caught up with the clock the writer observed
  async waitForCausality(userId: string, expectedClock: VectorClock): Promise<void> {
    const deadline = Date.now() + 5000; // Timeout after 5s
    while (true) {
      // Re-read the clock each iteration so external updates are observed
      const currentClock = this.vectorClocks.get(userId);
      // Caught up once expectedClock no longer strictly leads the local clock
      if (currentClock && !currentClock.happensBefore(expectedClock)) return;
      if (Date.now() > deadline) throw new Error('Causality wait timed out');
      await sleep(10);
    }
  }
}
```

When a connection is created, the update must propagate through multiple systems: caches, feeds, recommendations, search indexes, and more.
This fan-out must be reliable, fast, and scalable.
```typescript
// Event schema for connection changes
interface ConnectionEvent {
  eventId: string;
  eventType: 'created' | 'removed';
  connectionId: string;
  userId1: string;
  userId2: string;
  timestamp: number;
  metadata: {
    source: string;   // 'pymk', 'search', etc.
    traceId: string;  // For distributed tracing
  };
}

// Publisher ensures at-least-once delivery
class ConnectionEventPublisher {
  private kafka: KafkaProducer;
  private topic = 'connection-events';
  private partitionCount: number;

  async publish(event: ConnectionEvent): Promise<void> {
    // Partition by userId1 to maintain ordering per user
    const partition = this.getPartition(event.userId1);

    await this.kafka.send({
      topic: this.topic,
      partition,
      messages: [{
        key: event.userId1,
        value: JSON.stringify(event),
        headers: {
          eventType: event.eventType,
          timestamp: String(event.timestamp),
        },
      }],
    });
  }

  private getPartition(userId: string): number {
    return murmurhash(userId) % this.partitionCount;
  }
}

// Consumer workers process updates
class CacheInvalidationWorker {
  private redis: RedisCluster;
  private consumer: KafkaConsumer;

  async process(event: ConnectionEvent): Promise<void> {
    const { userId1, userId2 } = event;

    // Invalidate all affected caches
    const keysToInvalidate = [
      // Connection lists
      `connections:${userId1}`,
      `connections:${userId2}`,
      // Connection counts
      `connection_count:${userId1}`,
      `connection_count:${userId2}`,
      // Mutual connections for their mutual friends
      // (This is expensive - might do async/batch)
      ...await this.getMutualCacheKeys(userId1, userId2),
      // PYMK caches
      `pymk:${userId1}`,
      `pymk:${userId2}`,
    ];

    // Batch delete for efficiency
    await this.redis.del(...keysToInvalidate);

    // Optionally, pre-warm critical caches
    await this.prewarmCaches(userId1, userId2);
  }

  private async prewarmCaches(userId1: string, userId2: string): Promise<void> {
    // Pre-fetch and cache connection lists
    // Prioritize if users are active
    const [activity1, activity2] = await Promise.all([
      this.getUserActivity(userId1),
      this.getUserActivity(userId2),
    ]);

    if (activity1.isActive) {
      this.backgroundPrewarm(userId1);
    }
    if (activity2.isActive) {
      this.backgroundPrewarm(userId2);
    }
  }
}

class RecommendationUpdateWorker {
  private recStore: RecommendationStore;
  private graphStore: GraphStore;

  async process(event: ConnectionEvent): Promise<void> {
    const { userId1, userId2, eventType } = event;

    if (eventType === 'created') {
      // Remove each other from PYMK (already connected)
      await Promise.all([
        this.recStore.removePYMKCandidate(userId1, userId2),
        this.recStore.removePYMKCandidate(userId2, userId1),
      ]);

      // Boost mutual friends in PYMK (common connections = more likely to connect)
      await this.boostMutualFriends(userId1, userId2);
    } else {
      // Connection removed - they might want to reconnect
      // Don't immediately re-add to PYMK (feels stalky)
      await this.scheduleForRecomputeLater(userId1, userId2);
    }
  }

  private async boostMutualFriends(userId1: string, userId2: string): Promise<void> {
    // The new edge makes each user's connections 2nd-degree for the other
    const conn1 = await this.graphStore.getConnectionSet(userId1);
    const conn2 = await this.graphStore.getConnectionSet(userId2);

    // Connections of user1 not connected to user2: now 2nd-degree for user2
    const newSecondDegreeFor2 = [...conn1].filter(c =>
      !conn2.has(c) && c !== userId2
    );
    // Boost these in user2's PYMK
    for (const candidate of newSecondDegreeFor2.slice(0, 100)) {
      await this.recStore.boostPYMKScore(userId2, candidate, 0.1);
    }

    // Same for user1
    const newSecondDegreeFor1 = [...conn2].filter(c =>
      !conn1.has(c) && c !== userId1
    );
    for (const candidate of newSecondDegreeFor1.slice(0, 100)) {
      await this.recStore.boostPYMKScore(userId1, candidate, 0.1);
    }
  }
}

// Ensure exactly-once processing with idempotency
class IdempotentProcessor {
  private processedEvents: BloomFilter;
  private processedStore: KeyValueStore;

  async processOnce(
    event: ConnectionEvent,
    handler: (e: ConnectionEvent) => Promise<void>
  ): Promise<void> {
    // Quick check with Bloom filter (no false negatives)
    if (this.processedEvents.mightContain(event.eventId)) {
      // Might be duplicate - check definitive store
      if (await this.processedStore.exists(event.eventId)) {
        return; // Already processed, skip
      }
    }

    // Process the event
    await handler(event);

    // Mark as processed
    this.processedEvents.add(event.eventId);
    await this.processedStore.set(event.eventId, '1', { ttl: 86400 * 7 });
  }
}
```

When a supernode (a user with 30K connections) adds a new connection, naively updating all derived data would touch millions of records. LinkedIn limits eager updates to the most important derived data (caches, active-user feeds) and handles the rest lazily or in scheduled batch jobs, as sketched below.
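One way to express that eager/lazy split is a bounded fan-out: invalidate the two users' own caches immediately, spend a fixed budget on active mutual friends, and defer everything else to a batch queue. This sketch uses assumed helpers (`getMutualFriendIds`, `isActiveUser`) standing in for real graph-store and activity lookups:

```typescript
// Sketch: bound the eager fan-out for supernodes; the remainder is
// deferred to a batch queue. Helper names are illustrative.
class PrioritizedFanout {
  private static EAGER_LIMIT = 1000; // max derived records touched inline

  constructor(
    private cache: MultiTierCache,
    private batchQueue: EventPublisher
  ) {}

  async onConnectionCreated(event: ConnectionEvent): Promise<void> {
    // Tier 1 (always eager): the two users' own caches
    await this.cache.invalidate([
      `connections:${event.userId1}`,
      `connections:${event.userId2}`,
      `connection_count:${event.userId1}`,
      `connection_count:${event.userId2}`,
    ]);

    // Tier 2 (eager up to a budget): mutual-connection caches of active friends
    const mutuals = await this.getMutualFriendIds(event.userId1, event.userId2);
    const eager = new Set<string>();
    for (const id of mutuals) {
      if (eager.size >= PrioritizedFanout.EAGER_LIMIT) break;
      if (await this.isActiveUser(id)) eager.add(id);
    }
    await this.cache.invalidate([...eager].map(id => `mutual:${id}`));

    // Tier 3 (lazy): everyone else is recomputed by a scheduled batch job
    await this.batchQueue.publish('derived-data.recompute', {
      users: mutuals.filter(id => !eager.has(id)),
      reason: 'connection.created',
    });
  }

  private async getMutualFriendIds(u1: string, u2: string): Promise<string[]> {
    /* assumed graph-store lookup */ return [];
  }

  private async isActiveUser(id: string): Promise<boolean> {
    /* assumed activity-service check */ return true;
  }
}
```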
Caching is not optional at LinkedIn scale—it's the primary read path. The database is a persistence layer; the cache is the query layer.
| Tier | Technology | Latency | Hit Rate Target | Data |
|---|---|---|---|---|
| L1 (In-Process) | Local LRU | < 1ms | 30-40% | Hot data per instance |
| L2 (Distributed) | Redis Cluster | 1-5ms | 80-90% | All frequently accessed |
| L3 (CDN) | Akamai/CloudFront | 5-50ms | 95%+ for static | Aggregated, static |
```typescript
class MultiTierCache {
  private l1: LRUCache<string, any>;  // In-process
  private l2: RedisCluster;           // Distributed
  private metrics: CacheMetrics;

  constructor(l2: RedisCluster, metrics: CacheMetrics, l1Size: number = 10000) {
    this.l1 = new LRUCache({ max: l1Size, ttl: 60000 });
    this.l2 = l2;
    this.metrics = metrics;
  }

  async get<T>(key: string): Promise<T | null> {
    // L1: In-process cache
    const l1Value = this.l1.get(key) as T | undefined;
    if (l1Value !== undefined) {
      this.metrics.recordHit('l1');
      return l1Value;
    }

    // L2: Redis
    const l2Value = await this.l2.get(key);
    if (l2Value !== null) {
      this.metrics.recordHit('l2');
      const parsed = JSON.parse(l2Value) as T;
      // Populate L1
      this.l1.set(key, parsed);
      return parsed;
    }

    this.metrics.recordMiss();
    return null;
  }

  async set<T>(key: string, value: T, options: CacheOptions = {}): Promise<void> {
    const { ttl = 3600 } = options;
    // Write to both tiers
    this.l1.set(key, value, { ttl: Math.min(ttl, 60) * 1000 });
    await this.l2.setex(key, ttl, JSON.stringify(value));
  }

  async invalidate(keys: string[]): Promise<void> {
    // L1: Invalidate locally
    for (const key of keys) {
      this.l1.delete(key);
    }
    // L2: Invalidate in Redis
    if (keys.length > 0) {
      await this.l2.del(...keys);
    }
    // Broadcast invalidation to other instances
    await this.publishInvalidation(keys);
  }

  // Handle invalidation broadcasts
  private async publishInvalidation(keys: string[]): Promise<void> {
    await this.l2.publish('cache:invalidate', JSON.stringify(keys));
  }

  // Subscribe to invalidations from other instances
  subscribeToInvalidations(): void {
    this.l2.subscribe('cache:invalidate', (message) => {
      const keys = JSON.parse(message) as string[];
      for (const key of keys) {
        this.l1.delete(key);
      }
    });
  }
}

// Cache warming for predictable access patterns
class CacheWarmer {
  private cache: MultiTierCache;
  private graphStore: GraphStore;

  // Pre-warm caches during low-traffic periods
  async warmDailyActiveUsers(): Promise<void> {
    const activeUsers = await this.getActiveUserIds();

    // Batch processing with concurrency limit
    const concurrency = 100;
    for (let i = 0; i < activeUsers.length; i += concurrency) {
      const batch = activeUsers.slice(i, i + concurrency);
      await Promise.all(batch.map(userId => this.warmUserCache(userId)));
    }
  }

  private async warmUserCache(userId: string): Promise<void> {
    // Fetch and cache connection data
    const [connections, connectionCount] = await Promise.all([
      this.graphStore.getConnections(userId),
      this.graphStore.getConnectionCount(userId),
    ]);

    await Promise.all([
      this.cache.set(`connections:${userId}`, connections, { ttl: 3600 }),
      this.cache.set(`connection_count:${userId}`, connectionCount, { ttl: 3600 }),
    ]);
  }

  // Warm cache before anticipated traffic spikes
  async warmBeforeEvent(eventAttendees: string[]): Promise<void> {
    // People at events often view each other's profiles
    // Pre-warm their connection data
    for (const userId of eventAttendees) {
      await this.warmUserCache(userId);
    }
  }
}

// Cache stampede prevention
class StampedeProtection {
  private locks: Map<string, Promise<any>> = new Map();

  async getWithProtection<T>(
    key: string,
    fetcher: () => Promise<T>,
    cache: MultiTierCache
  ): Promise<T> {
    // Check cache
    const cached = await cache.get<T>(key);
    if (cached !== null) return cached;

    // Check if fetch is in progress
    if (this.locks.has(key)) {
      return this.locks.get(key) as Promise<T>;
    }

    // Start fetch with lock
    const fetchPromise = (async () => {
      try {
        const value = await fetcher();
        await cache.set(key, value);
        return value;
      } finally {
        this.locks.delete(key);
      }
    })();

    this.locks.set(key, fetchPromise);
    return fetchPromise;
  }
}
```

Running social graphs at scale requires rigorous operational practices. Failures are inevitable; the goal is to minimize impact and recover quickly.
```typescript
// Circuit breaker implementation
class CircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failures = 0;
  private lastFailure: number = 0;
  private successesInHalfOpen = 0;

  constructor(
    private threshold: number = 5,
    private timeout: number = 30000,
    private successThreshold: number = 3
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.canExecute()) {
      throw new CircuitOpenError('Circuit breaker is open');
    }
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private canExecute(): boolean {
    if (this.state === 'closed') return true;
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
        return true;
      }
      return false;
    }
    return true; // half-open allows limited traffic
  }

  private recordSuccess(): void {
    if (this.state === 'half-open') {
      this.successesInHalfOpen++;
      if (this.successesInHalfOpen >= this.successThreshold) {
        this.state = 'closed';
        this.failures = 0;
        this.successesInHalfOpen = 0;
      }
    } else {
      this.failures = 0;
    }
  }

  private recordFailure(): void {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.successesInHalfOpen = 0;
    }
  }
}

// Graceful degradation with fallbacks
class GracefulConnectionService {
  private db: ConnectionStore;
  private cache: MultiTierCache;
  private circuitBreaker: CircuitBreaker;

  async getConnections(userId: string): Promise<ConnectionResult> {
    try {
      return await this.circuitBreaker.execute(async () => {
        // Try cache first
        const cached = await this.cache.get<Connection[]>(`connections:${userId}`);
        if (cached) {
          return { connections: cached, source: 'cache', stale: false };
        }
        // Query database
        const connections = await this.db.getConnections(userId);
        await this.cache.set(`connections:${userId}`, connections);
        return { connections, source: 'database', stale: false };
      });
    } catch (error) {
      if (error instanceof CircuitOpenError) {
        // Database is down - try stale cache
        return this.getStaleOrEmpty(userId);
      }
      throw error;
    }
  }

  private async getStaleOrEmpty(userId: string): Promise<ConnectionResult> {
    // Try to get from cache even if expired
    const stale = await this.cache.getStale<Connection[]>(`connections:${userId}`);
    if (stale) {
      return { connections: stale, source: 'stale_cache', stale: true };
    }
    // Last resort: return empty with degradation notice
    return { connections: [], source: 'degraded', stale: true };
  }
}

// Load shedding under pressure
class LoadShedder {
  private requestCounter = 0;
  private successCounter = 0;
  private windowStart = Date.now();
  private windowSize = 1000; // 1 second

  shouldShed(priority: 'high' | 'medium' | 'low'): boolean {
    this.maybeRotateWindow();
    const errorRate = 1 - (this.successCounter / Math.max(this.requestCounter, 1));

    // Shed low-priority requests first
    if (priority === 'low' && errorRate > 0.1) return true;
    if (priority === 'medium' && errorRate > 0.3) return true;
    if (priority === 'high' && errorRate > 0.5) return true;
    return false;
  }

  recordRequest(success: boolean): void {
    this.requestCounter++;
    if (success) this.successCounter++;
  }

  private maybeRotateWindow(): void {
    if (Date.now() - this.windowStart > this.windowSize) {
      this.requestCounter = 0;
      this.successCounter = 0;
      this.windowStart = Date.now();
    }
  }
}
```

Social networks grow exponentially—both in users and in graph density (connections per user). Capacity planning must account for this growth while maintaining cost efficiency.
| Factor | Growth Model | Planning Horizon | Action Trigger |
|---|---|---|---|
| User Count | +10% annually | 18 months | 70% threshold |
| Connections/User | +5% annually | 12 months | Average > 600 |
| Daily Active Users | Seasonal + growth | 6 months | Peak vs provisioned |
| Query Complexity | Feature-driven | Per release | Latency regression |
| Storage | Linear with edges | 24 months | 80% capacity |
| Cache Size | Grows with active users | 12 months | Hit rate < target |
```typescript
interface CapacityModel {
  currentState: SystemState;
  growthAssumptions: GrowthAssumptions;
  planningHorizonMonths: number;
}

interface SystemState {
  users: number;
  avgConnectionsPerUser: number;
  dailyActiveUsers: number;
  peakQPS: number;
  storageGB: number;
  shardCount: number;
  cacheNodeCount: number;
}

interface GrowthAssumptions {
  userGrowthRate: number;        // Annual fraction, e.g. 0.10 for +10%
  connectionGrowthRate: number;  // Annual increase in avg connections
  dauGrowthRate: number;         // Annual fraction
  queryComplexityGrowth: number; // Annual fraction (new features add cost)
}

class CapacityPlanner {
  projectCapacityNeeds(model: CapacityModel): CapacityProjection[] {
    const { currentState, growthAssumptions, planningHorizonMonths } = model;
    const projections: CapacityProjection[] = [];

    for (let month = 1; month <= planningHorizonMonths; month++) {
      const yearFraction = month / 12;

      // Project growth
      const users = currentState.users * Math.pow(
        1 + growthAssumptions.userGrowthRate, yearFraction
      );
      const avgConnections = currentState.avgConnectionsPerUser * Math.pow(
        1 + growthAssumptions.connectionGrowthRate, yearFraction
      );
      const totalConnections = users * avgConnections;
      const dau = currentState.dailyActiveUsers * Math.pow(
        1 + growthAssumptions.dauGrowthRate, yearFraction
      );
      const qps = this.estimateQPS(dau, avgConnections);

      // Storage: ~100 bytes per connection edge
      const storageGB = (totalConnections * 100) / (1024 * 1024 * 1024);

      // Shards: target 100GB per shard
      const recommendedShards = Math.ceil(storageGB / 100);

      // Cache: 10% of data should be cacheable, 50 bytes per entry avg
      const cacheEntriesNeeded = users * 0.1;
      const cacheMemoryGB = (cacheEntriesNeeded * 50) / (1024 * 1024 * 1024);
      const cacheNodes = Math.ceil(cacheMemoryGB / 64); // 64GB per node

      projections.push({
        month,
        users: Math.round(users),
        connections: Math.round(totalConnections),
        dau: Math.round(dau),
        peakQPS: Math.round(qps),
        storageGB: Math.round(storageGB),
        recommendedShards,
        recommendedCacheNodes: cacheNodes,
        estimatedCost: this.estimateCost({
          shards: recommendedShards,
          cacheNodes,
          computeInstances: Math.ceil(qps / 5000),
        }),
      });
    }
    return projections;
  }

  private estimateQPS(dau: number, avgConnections: number): number {
    // DAU generates ~20 connection-related queries/day on average
    const dailyQueries = dau * 20;
    // Peak is 5x average, concentrated in 4 hours
    const avgQPS = dailyQueries / 86400;
    const peakQPS = avgQPS * 5;
    // Complexity factor: more connections = more complex queries
    const complexityFactor = 1 + (avgConnections / 1000);
    return peakQPS * complexityFactor;
  }

  generateAlerts(model: CapacityModel, projections: CapacityProjection[]): CapacityAlert[] {
    const alerts: CapacityAlert[] = [];
    // Provisioned capacity: current shard count at the 100GB-per-shard target
    const provisionedGB = model.currentState.shardCount * 100;

    for (const projection of projections) {
      // Storage alert: flag the month projected usage crosses 80% of capacity
      if (projection.storageGB > provisionedGB * 0.8) {
        alerts.push({
          type: 'storage',
          severity: projection.month < 6 ? 'critical' : 'warning',
          message: `Storage will reach 80% capacity in ${projection.month} months`,
          recommendation: `Add ${projection.recommendedShards - model.currentState.shardCount} shards`,
        });
        break;
      }
    }
    // Similar alerts for compute, cache, etc.
    return alerts;
  }
}

interface CapacityProjection {
  month: number;
  users: number;
  connections: number;
  dau: number;
  peakQPS: number;
  storageGB: number;
  recommendedShards: number;
  recommendedCacheNodes: number;
  estimatedCost: CostEstimate;
}
```

We've explored the complete scaling architecture for professional social networks; the wrap-up below consolidates the key learnings.
Congratulations! You've completed the LinkedIn Connections module. You now understand how to design a professional social network from requirements through scaling. Key skills covered: graph storage and traversal, degree of separation queries, recommendation algorithms, sharding strategies, event-driven architecture, and operational excellence. These patterns apply broadly to any social graph system at scale.