In 2018, the Cambridge Analytica scandal revealed that Facebook had allowed third parties to harvest data from millions of users without proper consent. The fallout was catastrophic: billions in fines, congressional hearings, and a fundamental erosion of user trust.
This wasn't just a policy failure—it was an architectural failure. The system was designed for engagement and growth, with privacy as an afterthought. The lesson for system designers is clear:
Privacy cannot be bolted on. It must be architected in from the ground up.
For feed systems specifically, privacy is uniquely challenging because personalization inherently requires understanding users deeply. The ranking model needs to know your interests, your relationships, your behaviors. But users simultaneously want personalization AND privacy—a tension that requires careful architectural solutions.
By the end of this page, you will understand privacy-by-design principles for feed systems, master content visibility enforcement at scale, explore privacy-preserving ML techniques, and learn about regulatory compliance (GDPR, CCPA) from an engineering perspective.
Before designing privacy protections, we must understand what we're protecting against. A feed system faces multiple privacy threats from different actors.
| Threat Actor | Attack Vector | Impact | Mitigation |
|---|---|---|---|
| Other Users | Viewing private content | Privacy violation | Access control enforcement |
| Scrapers | Mass harvesting public data | Data aggregation | Rate limiting, bot detection |
| Third-party Apps | API data extraction | Data misuse | Scoped permissions, audit logging |
| Advertisers | Over-targeting | Perceived surveillance | Targeting restrictions, transparency |
| Insider Threats | Employee data access | Unauthorized viewing | Access logging, need-to-know |
| Attackers | Unauthorized access | Data breach | Encryption, auth, monitoring |
| The Platform Itself | Excessive data collection | Trust erosion | Data minimization, purpose limits |
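As one concrete example, the rate-limiting mitigation for scrapers in the table above is often implemented as a per-client token bucket. A minimal sketch (all names illustrative, not a production limiter):

```typescript
// Token-bucket rate limiter: each client gets `capacity` requests,
// refilled continuously at `refillPerSec`. Illustrative sketch only.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if rate-limited.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In practice the bucket state lives in a shared cache keyed by client fingerprint, and scrapers that exhaust their budget are escalated to bot-detection flows rather than simply throttled.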
```typescript
// Data classification system

enum DataTier {
  PUBLIC = 1,    // Visible to all
  FRIENDS = 2,   // Visible to connections
  PERSONAL = 3,  // Visible to owner only
  SENSITIVE = 4, // Regulated data
  SYSTEM = 5,    // Internal only
}

interface DataClassification {
  field: string;
  tier: DataTier;
  piiType?: PiiType;
  retentionPolicy: RetentionPolicy;
  encryptionRequired: boolean;
  auditRequired: boolean;
}

const classificationSchema: DataClassification[] = [
  // Profile data
  { field: 'profile.name', tier: DataTier.PUBLIC, piiType: 'name', retentionPolicy: 'until_deletion', encryptionRequired: false, auditRequired: false },
  { field: 'profile.email', tier: DataTier.PERSONAL, piiType: 'email', retentionPolicy: 'until_deletion', encryptionRequired: true, auditRequired: true },
  { field: 'profile.phone', tier: DataTier.PERSONAL, piiType: 'phone', retentionPolicy: 'until_deletion', encryptionRequired: true, auditRequired: true },

  // Post data
  { field: 'post.content', tier: DataTier.FRIENDS, retentionPolicy: 'until_deletion', encryptionRequired: false, auditRequired: false },
  { field: 'post.location', tier: DataTier.SENSITIVE, piiType: 'location', retentionPolicy: '90_days', encryptionRequired: true, auditRequired: true },

  // Behavioral data
  { field: 'activity.pages_viewed', tier: DataTier.PERSONAL, retentionPolicy: '30_days', encryptionRequired: false, auditRequired: false },
  { field: 'activity.search_history', tier: DataTier.PERSONAL, piiType: 'behavior', retentionPolicy: '90_days', encryptionRequired: true, auditRequired: true },

  // System data
  { field: 'auth.password_hash', tier: DataTier.SYSTEM, retentionPolicy: 'until_deletion', encryptionRequired: true, auditRequired: true },
  { field: 'auth.session_token', tier: DataTier.SYSTEM, retentionPolicy: '30_days', encryptionRequired: true, auditRequired: false },
];
```

Individual data points may be innocuous, but aggregating them creates privacy risks. Knowing someone's city, age, employer, and interests can uniquely identify them. Feed systems must consider aggregation attacks when designing data access controls.
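The aggregation risk described above can be quantified with k-anonymity: the size of the smallest group of users sharing a given combination of quasi-identifiers. A small sketch, assuming records are simple field-to-string maps:

```typescript
// k-anonymity of a record set for a given set of quasi-identifier
// fields. A low k means that combination (e.g. city + age) nearly
// identifies an individual, even if each field alone is harmless.
type Row = { [field: string]: string };

function kAnonymity(records: Row[], quasiIdentifiers: string[]): number {
  const groupSizes = new Map<string, number>();
  for (const record of records) {
    // Group records by their quasi-identifier combination
    const key = quasiIdentifiers.map(f => record[f]).join('|');
    groupSizes.set(key, (groupSizes.get(key) ?? 0) + 1);
  }
  // k is the smallest group size (Infinity for an empty record set)
  return Math.min(...groupSizes.values());
}
```

Note how adding fields to the combination can only shrink groups: a dataset that is 2-anonymous on city alone may become 1-anonymous (fully identifying) on city plus age, which is exactly why access controls must reason about combinations, not individual fields.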
The most fundamental privacy protection is ensuring users can only see content they're authorized to see. In a feed system, this means enforcing visibility rules on every post, for every viewer, in every context.
```typescript
// Post visibility model

enum VisibilityLevel {
  PUBLIC = 'public',                 // Anyone on/off Facebook
  FRIENDS = 'friends',               // Approved connections only
  FRIENDS_OF_FRIENDS = 'fof',        // 2-degree connections
  FRIENDS_EXCEPT = 'friends_except', // Friends minus exceptions
  SPECIFIC_FRIENDS = 'specific',     // Whitelist only
  ONLY_ME = 'only_me',               // Private to author
}

interface PostVisibility {
  level: VisibilityLevel;

  // For complex visibility rules
  includedLists?: string[];  // Custom friend lists to include
  excludedUsers?: string[];  // Specific users to block

  // Derived visibility (cached for performance)
  allowedViewers?: string[]; // Pre-computed for small whitelists
}

// Visibility check is called on EVERY post in feed
async function canViewPost(viewerId: string, post: Post): Promise<boolean> {
  // Fast path: public posts
  if (post.visibility.level === VisibilityLevel.PUBLIC) {
    return !(await isBlockedBy(viewerId, post.authorId));
  }

  // Author can always view their own posts
  if (viewerId === post.authorId) {
    return true;
  }

  // Only Me = only author
  if (post.visibility.level === VisibilityLevel.ONLY_ME) {
    return false;
  }

  // Check if blocked (mutual blocks hide all content)
  if (
    (await isBlockedBy(viewerId, post.authorId)) ||
    (await isBlockedBy(post.authorId, viewerId))
  ) {
    return false;
  }

  // Friends visibility
  if (post.visibility.level === VisibilityLevel.FRIENDS) {
    return await areFriends(viewerId, post.authorId);
  }

  // Friends of friends
  if (post.visibility.level === VisibilityLevel.FRIENDS_OF_FRIENDS) {
    return await areFriendsOrFof(viewerId, post.authorId);
  }

  // Friends except specific users
  if (post.visibility.level === VisibilityLevel.FRIENDS_EXCEPT) {
    if (post.visibility.excludedUsers?.includes(viewerId)) {
      return false;
    }
    return await areFriends(viewerId, post.authorId);
  }

  // Specific friends only (whitelist)
  if (post.visibility.level === VisibilityLevel.SPECIFIC_FRIENDS) {
    return (
      post.visibility.allowedViewers?.includes(viewerId) ||
      (await isInFriendList(viewerId, post.visibility.includedLists))
    );
  }

  return false; // Default deny
}
```

A single bug that shows private content to unauthorized users is catastrophic. Defense in depth means that even if one check fails, others catch it. Privacy bugs often arise in edge cases: cached data after an unfriend, race conditions during privacy changes, or new features that bypass checks.
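One way to apply that defense-in-depth principle is a final visibility re-check immediately before the response is serialized, independent of whatever filtering the ranking pipeline already performed. A sketch, with a minimal `Post` shape and the visibility check injected as a parameter:

```typescript
// Last-line visibility filter: re-check every post right before it is
// returned to the client, even though upstream stages should already
// have filtered. A stale cache or a pipeline bug upstream must not
// leak a private post into the response.
interface Post {
  id: string;
  authorId: string;
}

async function renderFeed(
  viewerId: string,
  rankedPosts: Post[],
  canView: (viewerId: string, post: Post) => Promise<boolean>
): Promise<Post[]> {
  const visible: Post[] = [];
  for (const post of rankedPosts) {
    if (await canView(viewerId, post)) {
      visible.push(post);
    }
    // Silently drop anything that fails; never surface why.
  }
  return visible;
}
```

The extra checks cost latency, which is why the derived `allowedViewers` cache exists upstream, but the render-time check is the one layer that must never be skipped.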
When a user blocks someone or changes their privacy settings, that change must propagate instantly across the distributed system. This is one of the few areas where strong consistency is non-negotiable.
```typescript
// Block action implementation

class BlockService {
  private socialGraph: SocialGraphService;
  private feedCache: FeedCacheService;
  private eventBus: EventBus;

  async blockUser(blockerId: string, blockedId: string): Promise<void> {
    // 1. Create block edge in social graph (strongly consistent)
    await this.socialGraph.createEdge({
      from: blockerId,
      to: blockedId,
      type: 'block',
      createdAt: Date.now(),
    });

    // 2. Remove any existing friendship
    await this.socialGraph.removeEdge(blockerId, blockedId, 'friend');
    await this.socialGraph.removeEdge(blockedId, blockerId, 'friend');

    // 3. Invalidate feed caches (immediate)
    await Promise.all([
      this.feedCache.invalidate(blockerId),
      this.feedCache.invalidate(blockedId),
    ]);

    // 4. Publish event for async cleanup
    await this.eventBus.publish('user:blocked', {
      blockerId,
      blockedId,
      timestamp: Date.now(),
    });

    // 5. Kill any active sessions that might serve stale data
    await this.invalidateActiveSessions(blockerId, blockedId);
  }

  // Async cleanup jobs triggered by block event
  async handleBlockEvent(event: BlockEvent) {
    const { blockerId, blockedId } = event;

    // Remove blocked user's posts from blocker's feed cache
    await this.scrubFeedCache(blockerId, blockedId);

    // Remove blocker's posts from blocked user's feed cache
    await this.scrubFeedCache(blockedId, blockerId);

    // Remove from any shared group activity
    await this.scrubGroupActivity(blockerId, blockedId);

    // Clear any pending notifications between users
    await this.clearPendingNotifications(blockerId, blockedId);

    // Remove from 'People You May Know' suggestions
    await this.removeSuggestions(blockerId, blockedId);
  }

  // Real-time session invalidation
  async invalidateActiveSessions(user1: string, user2: string) {
    // Push invalidation message to connected clients
    await this.pushService.sendToUser(user1, {
      type: 'privacy_state_change',
      action: 'refresh_feed',
    });
    await this.pushService.sendToUser(user2, {
      type: 'privacy_state_change',
      action: 'refresh_feed',
    });
  }
}
```
```typescript
// Privacy settings change propagation

class PrivacySettingsService {
  async changeDefaultPostVisibility(
    userId: string,
    newDefault: VisibilityLevel
  ): Promise<void> {
    // 1. Update user settings
    await this.userSettings.update(userId, {
      defaultPostVisibility: newDefault,
    });

    // No retroactive changes needed - affects future posts only
  }

  async changeExistingPostVisibility(
    userId: string,
    postId: string,
    newVisibility: PostVisibility
  ): Promise<void> {
    const post = await this.postStore.get(postId);

    // Verify ownership
    if (post.authorId !== userId) {
      throw new UnauthorizedError("Cannot modify others' posts");
    }

    // 1. Update post visibility (sync)
    await this.postStore.update(postId, {
      visibility: newVisibility,
    });

    // 2. Determine affected users
    const previouslyVisible = await this.getVisibleUsers(post.visibility);
    const nowVisible = await this.getVisibleUsers(newVisibility);

    // Users who can no longer see the post
    const newlyHidden = previouslyVisible.filter(u => !nowVisible.includes(u));

    // Users who can now see the post
    const newlyVisible = nowVisible.filter(u => !previouslyVisible.includes(u));

    // 3. Invalidate affected feed caches
    for (const affectedUser of newlyHidden) {
      await this.feedCache.removePost(affectedUser, postId);
    }

    // 4. Potentially add to newly visible users' feeds
    // (handled by next aggregation cycle - eventual consistency OK for additions)
  }

  async makePastPostsMorePrivate(
    userId: string,
    newLevel: VisibilityLevel
  ): Promise<void> {
    // "Limit past posts" feature - make all historical posts more private

    // 1. Queue batch job (too many posts to do synchronously)
    await this.jobQueue.enqueue('privacy:batch_update', {
      userId,
      newVisibility: newLevel,
      createdBefore: Date.now(),
    });

    // 2. Immediately invalidate all feed caches that might have old posts
    //    This is expensive but necessary for privacy guarantees
    const allFriends = await this.socialGraph.getFriends(userId);
    await this.feedCache.invalidateBatch(allFriends);

    // 3. Return immediately - user gets confirmation
    //    Batch job processes posts over minutes/hours
  }
}
```

Privacy changes like blocking or visibility updates trigger expensive cache invalidations. A user with 1,000 friends changing post visibility might invalidate 1,000 feed caches. This cost is accepted as necessary: privacy correctness trumps performance.
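That fan-out invalidation cost is usually bounded by batching: chunk the affected user list so the cache tier sees a controlled request rate instead of one burst per friend. A sketch, with the cache client abstracted as a callback (names illustrative):

```typescript
// Batched cache invalidation: split a large affected-user list into
// fixed-size chunks so a single privacy change cannot overwhelm the
// cache tier. `invalidate` stands in for the real cache client call.
async function invalidateInBatches(
  userIds: string[],
  invalidate: (ids: string[]) => Promise<void>,
  batchSize: number = 100
): Promise<number> {
  let batches = 0;
  for (let i = 0; i < userIds.length; i += batchSize) {
    // Each batch completes before the next starts, pacing the load.
    await invalidate(userIds.slice(i, i + batchSize));
    batches++;
  }
  return batches; // Number of cache calls issued
}
```

A real implementation would add retries for failed batches, since a skipped invalidation is a privacy bug, not just a performance one.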
Feed ranking requires machine learning, and ML requires data. But ML models can inadvertently memorize sensitive data, infer protected attributes, or create filter bubbles. Privacy-preserving ML techniques address these concerns.
```typescript
// Privacy-aware feature engineering

// ❌ Bad: direct PII usage
const badFeatures = {
  user_email: user.email,            // Direct PII
  user_location: user.city,          // Precise location
  user_age: user.birthday,           // Age (protected attribute)
  user_political_views: user.views,  // Sensitive category
};

// ✅ Good: privacy-preserving alternatives
const goodFeatures = {
  // Hashed, opaque ID
  user_id_hash: hash(user.id + salt),

  // Bucketed, aggregated location (country-level only)
  user_region_bucket: getRegionBucket(user.country),

  // Age removed entirely (or very coarse buckets if needed)
  user_generation: getGenerationBucket(user.age), // "GenZ", "Millennial", "GenX"

  // Behavioral signals instead of declared attributes
  user_engagement_rate: user.avgLikesPerDay,
  user_content_type_affinity: computeContentAffinity(user.interactions),

  // Differential privacy noise added
  user_session_count: addLaplacianNoise(user.sessionCount, { epsilon: 1.0 }),
};

// Feature audit checks
function auditFeature(feature: Feature): AuditResult {
  const issues: string[] = [];

  // Check 1: PII detection
  if (containsPII(feature.values)) {
    issues.push('Feature contains PII');
  }

  // Check 2: Protected attribute correlation
  const correlation = computeCorrelation(feature, protectedAttributes);
  if (correlation > 0.7) {
    issues.push(`High correlation (${correlation}) with protected attribute`);
  }

  // Check 3: k-anonymity
  const kValue = computeKAnonymity(feature);
  if (kValue < 100) {
    issues.push(`Low k-anonymity (k=${kValue})`);
  }

  return {
    approved: issues.length === 0,
    issues,
  };
}
```
```typescript
// Federated learning for ranking personalization

// On-device model: learns user preferences without uploading behavior
class OnDeviceRankingModel {
  private localModel: TFLiteModel;

  async personalizeLocally(
    impressions: Impression[],
    interactions: Interaction[]
  ): Promise<void> {
    // Train on local data only
    const examples = this.createTrainingExamples(impressions, interactions);
    await this.localModel.fit(examples, {
      epochs: 5,
      batchSize: 32,
    });

    // Model weights updated locally
    // Raw impression/interaction data never leaves device
  }

  async contributeToCentralModel(): Promise<void> {
    // Compute gradient update (not raw data)
    const gradients = this.localModel.computeGradients();

    // Clip gradients first, to bound each user's contribution (sensitivity)
    const clippedGradients = clipGradients(gradients, { maxNorm: 1.0 });

    // Then add differential privacy noise calibrated to that sensitivity
    const noisyGradients = addGaussianNoise(clippedGradients, {
      epsilon: 2.0,
      delta: 1e-5,
      sensitivity: 1.0,
    });

    // Send to server using secure aggregation
    await this.secureAggregation.contribute(noisyGradients);
  }
}

// Server: aggregates without seeing individual contributions
class FederatedServer {
  async aggregateRound(contributors: string[]): Promise<ModelUpdate> {
    // Secure aggregation ensures server only sees the sum
    const aggregatedGradients = await this.secureAggregation.aggregate(
      contributors,
      { minContributors: 1000 } // Privacy guarantee
    );

    // Average gradients
    const avgGradients = divideByScalar(aggregatedGradients, contributors.length);

    // Update global model
    await this.globalModel.applyGradients(avgGradients);

    return this.globalModel.getUpdate();
  }
}
```

Federated learning provides strong privacy, but at a cost: slower model convergence, higher device battery usage, and limits on model complexity. It works best for personalization layers on top of a centrally trained base model.
Modern privacy regulations impose strict requirements on data handling. Non-compliance results in massive fines (up to 4% of global revenue for GDPR). Feed systems must be designed to support these requirements.
| Requirement | GDPR | CCPA | Technical Implication |
|---|---|---|---|
| Right to Access | Required | Required | Export all user data within 30 days |
| Right to Deletion | Required | Required | Delete from all systems within 30 days |
| Right to Portability | Required | Not required | Machine-readable export format |
| Right to Rectification | Required | Limited | Allow user to correct their data |
| Consent Tracking | Explicit needed | Opt-out model | Track consent per data use |
| Data Minimization | Required | Implicit | Only collect necessary data |
| Purpose Limitation | Required | Required | Use data only for stated purposes |
| Breach Notification | 72 hours | Timely | Incident detection and response |
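The consent-tracking row above implies a per-user, per-purpose consent ledger whose default differs by regime: GDPR requires explicit opt-in, while CCPA is an opt-out model. A minimal sketch (purpose names and the `Regime` type are illustrative):

```typescript
// Per-purpose consent ledger. GDPR: no recorded consent means no
// processing (opt-in). CCPA: no recorded objection means processing
// is allowed (opt-out). Real systems also store timestamps and the
// consent text version for auditability.
type Purpose = 'personalization' | 'ads_targeting' | 'analytics';
type Regime = 'gdpr' | 'ccpa';

class ConsentLedger {
  private grants = new Map<string, boolean>(); // key: "userId|purpose"

  record(userId: string, purpose: Purpose, granted: boolean): void {
    this.grants.set(`${userId}|${purpose}`, granted);
  }

  isAllowed(userId: string, purpose: Purpose, regime: Regime): boolean {
    const grant = this.grants.get(`${userId}|${purpose}`);
    if (grant !== undefined) {
      return grant; // Explicit choice always wins
    }
    return regime === 'ccpa'; // Regime-specific default
  }
}
```

Keying consent by purpose, not just by user, is what makes purpose limitation enforceable: the ranking pipeline and the ads pipeline each ask the ledger about their own purpose before touching the data.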
```typescript
// GDPR Article 17 - Right to Erasure implementation

class DataDeletionService {
  // Track all data stores that hold user data
  private dataStores: DataStore[] = [
    profileStore,
    postStore,
    engagementStore,
    commentStore,
    messageStore,
    activityLogStore,
    mlFeatureStore,
    analyticsStore,
    backupStore,
    cdnCache,
    searchIndex,
    // ... hundreds more in practice
  ];

  async processDeleteRequest(userId: string): Promise<DeletionReport> {
    const report = new DeletionReport();
    report.requestId = generateId();
    report.userId = userId;
    report.requestedAt = Date.now();

    // 1. Validate request (verify identity, not duplicate)
    await this.validateRequest(userId);

    // 2. Immediate actions
    await this.disableAccount(userId);   // Prevent new data creation
    await this.invalidateTokens(userId); // Force logout

    // 3. Queue deletion jobs for each data store
    for (const store of this.dataStores) {
      const job = await this.jobQueue.enqueue('data:delete', {
        userId,
        storeName: store.name,
        requestId: report.requestId,
      });
      report.jobs.push(job.id);
    }

    // 4. Schedule verification (GDPR requires 30-day deadline)
    await this.scheduler.schedule('deletion:verify', {
      requestId: report.requestId,
      deadline: Date.now() + 25 * 24 * 60 * 60 * 1000, // Day 25 check
    });

    return report;
  }

  // Individual store deletion
  async deleteFromStore(
    userId: string,
    store: DataStore
  ): Promise<StoreDeleteResult> {
    const result = new StoreDeleteResult();

    try {
      // Delete primary data
      const primaryCount = await store.deleteByUser(userId);
      result.primaryDeleted = primaryCount;

      // Delete from caches
      const cacheCount = await store.invalidateUserCaches(userId);
      result.cachesInvalidated = cacheCount;

      // Delete from indexes
      const indexCount = await store.removeFromIndexes(userId);
      result.indexEntriesRemoved = indexCount;

      // Trigger backup tombstone (for retention-protected backups)
      await this.backupService.markForDeletion(userId, store.name);

      result.success = true;
    } catch (e) {
      result.success = false;
      result.error = e.message;

      // Alert for manual remediation
      await this.alerting.trigger('deletion_failure', {
        userId,
        store: store.name,
        error: e.message,
      });
    }

    return result;
  }

  // Verify deletion completed
  async verifyDeletion(requestId: string): Promise<VerificationResult> {
    const report = await this.getReport(requestId);
    const failures = [];

    for (const store of this.dataStores) {
      const remaining = await store.checkForRemainingData(report.userId);
      if (remaining > 0) {
        failures.push({
          store: store.name,
          remainingRecords: remaining,
        });
      }
    }

    if (failures.length > 0) {
      // Escalate for manual cleanup
      await this.escalate('incomplete_deletion', {
        requestId,
        failures,
        daysRemaining: this.getDaysUntilDeadline(report),
      });
    }

    return {
      complete: failures.length === 0,
      failures,
    };
  }
}
```
```typescript
// GDPR Article 15 - Right of Access implementation

class DataExportService {
  async createExport(userId: string): Promise<ExportJob> {
    // This is expensive - queue as background job
    const job = await this.jobQueue.enqueue('data:export', {
      userId,
      format: 'json', // Machine-readable (GDPR portability)
    });

    return {
      jobId: job.id,
      estimatedCompletion: Date.now() + 24 * 60 * 60 * 1000, // 24 hours
      status: 'pending',
    };
  }

  async processExport(userId: string): Promise<string> {
    const exportData = {
      exportedAt: new Date().toISOString(),
      userId: userId,
      categories: {} as Record<string, unknown>,
    };

    // Profile data
    exportData.categories.profile = await this.exportProfile(userId);

    // Posts and content
    exportData.categories.posts = await this.exportPosts(userId);

    // Comments
    exportData.categories.comments = await this.exportComments(userId);

    // Messages (only user's side)
    exportData.categories.messages = await this.exportMessages(userId);

    // Activity log
    exportData.categories.activity = await this.exportActivity(userId);

    // Ad interactions
    exportData.categories.adInteractions = await this.exportAdData(userId);

    // Inferred data (interests, predictions)
    exportData.categories.inferredData = await this.exportInferences(userId);

    // Write to secure storage
    const exportFile = await this.storage.write(
      `exports/${userId}/${Date.now()}.json`,
      JSON.stringify(exportData, null, 2)
    );

    // Generate secure download link (time-limited)
    const downloadUrl = await this.storage.createSignedUrl(exportFile, {
      expiresIn: '7d',
    });

    // Notify user
    await this.notify(userId, 'data_export_ready', { downloadUrl });

    return downloadUrl;
  }

  // Include inferred data (GDPR requires this)
  async exportInferences(userId: string): Promise<InferredData> {
    return {
      interests: await this.mlService.getInferredInterests(userId),
      ageRange: await this.mlService.getInferredAgeRange(userId),
      adCategories: await this.adService.getTargetingCategories(userId),
      relationshipStrengths: await this.socialService.getRelationshipScores(userId),
    };
  }
}
```

Data spreads across databases, caches, logs, backups, ML training sets, and third-party systems. True deletion requires tracking every data flow and maintaining deletion capability at each store. Many organizations underestimate this complexity until faced with a regulator.
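One way to keep "every data flow" tracked is a deletion registry: any service that persists user data must register a deleter, and verification flags any required store with no registered deleter before a request ever arrives. A sketch under those assumptions (all names illustrative):

```typescript
// Deletion-coverage registry. New data flows register a deleter at
// startup; a coverage check catches stores that were added to the
// required list but never wired up, i.e. untracked data flows.
type Deleter = (userId: string) => Promise<number>; // rows deleted

class DeletionRegistry {
  private deleters = new Map<string, Deleter>();

  register(storeName: string, deleter: Deleter): void {
    this.deleters.set(storeName, deleter);
  }

  // Stores the compliance team requires but no service has wired up.
  missingStores(requiredStores: string[]): string[] {
    return requiredStores.filter(s => !this.deleters.has(s));
  }

  // Run every registered deleter; the per-store counts feed the
  // deletion report used for 30-day verification.
  async deleteEverywhere(userId: string): Promise<Map<string, number>> {
    const results = new Map<string, number>();
    for (const [name, deleter] of this.deleters) {
      results.set(name, await deleter(userId));
    }
    return results;
  }
}
```

Making coverage checkable in CI (fail the build if `missingStores` is non-empty) turns "we forgot that log pipeline" from a regulator finding into a failed test.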
Beyond technical compliance, building user trust requires transparency about data usage and meaningful controls over the experience.
```typescript
// Generate explanation for why post appeared in feed

interface RankingExplanation {
  primaryReason: string;
  contributingFactors: Factor[];
  actions: Action[];
}

async function generateExplanation(
  viewerId: string,
  post: Post,
  rankingContext: RankingContext
): Promise<RankingExplanation> {
  const factors: Factor[] = [];
  const actions: Action[] = [];

  // Determine primary reason
  let primaryReason: string;

  if ((await getFriends(viewerId)).includes(post.authorId)) {
    primaryReason = `${post.author.name} is your friend`;
    actions.push({
      label: 'Unfollow',
      action: `unfollow:${post.authorId}`,
    });

    // Check relationship strength
    const strength = await getRelationshipStrength(viewerId, post.authorId);
    if (strength > 0.8) {
      factors.push({
        type: 'relationship',
        description: 'You interact frequently with this person',
        weight: 'high',
      });
    }
  } else if (post.sourcePage) {
    primaryReason = `You follow ${post.sourcePage.name}`;
    actions.push({
      label: 'Unlike Page',
      action: `unlike_page:${post.sourcePage.id}`,
    });
  }

  // Content factors
  if (rankingContext.topicMatch > 0.7) {
    const matchedTopics = await getMatchedTopics(viewerId, post);
    factors.push({
      type: 'topic_interest',
      description: `You've shown interest in ${matchedTopics.join(', ')}`,
      weight: 'medium',
    });
    actions.push({
      label: 'See less of this topic',
      action: `reduce_topic:${matchedTopics[0]}`,
    });
  }

  // Engagement velocity
  if (rankingContext.engagementVelocity > 100) {
    factors.push({
      type: 'engagement',
      description: 'Many people are engaging with this post',
      weight: 'low',
    });
  }

  // Always offer hide action
  actions.push({
    label: 'Hide this post',
    action: `hide:${post.id}`,
  });

  return {
    primaryReason,
    contributingFactors: factors,
    actions,
  };
}
```

| Control | User Action | System Response |
|---|---|---|
| Hide Post | Click hide on specific post | Negative signal for similar content |
| Unfollow | Stop seeing friend/page | Remove from candidate sources |
| Snooze | Hide for 30 days | Temporary removal from feed |
| See First | Prioritize this friend | Boost in ranking |
| Topic Preference | More/less of topic | Adjust topic affinity weights |
| Chronological View | Switch to Recent | Disable ranking, show by time |
| Reduce Frequency | See less from source | Decrease source affinity score |
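The controls in the table above ultimately translate into ranking-signal updates. A sketch of that dispatch, with an in-memory signal store standing in for the real feature services (the action shapes and the 0.5 affinity multiplier are illustrative):

```typescript
// Dispatch user feed controls to ranking-signal updates. Each control
// from the table maps to a concrete change the ranker can consume.
type ControlAction =
  | { kind: 'hide'; postId: string }
  | { kind: 'snooze'; sourceId: string; untilMs: number }
  | { kind: 'reduce_topic'; topic: string };

class RankingSignals {
  snoozedUntil = new Map<string, number>();  // sourceId -> expiry
  topicAffinity = new Map<string, number>(); // topic -> weight (default 1.0)
  hiddenPosts = new Set<string>();

  apply(action: ControlAction): void {
    switch (action.kind) {
      case 'hide':
        // Negative signal: never show this post again
        this.hiddenPosts.add(action.postId);
        break;
      case 'snooze':
        // Temporary removal: source suppressed until the expiry time
        this.snoozedUntil.set(action.sourceId, action.untilMs);
        break;
      case 'reduce_topic': {
        // Halve the topic affinity weight (assumed decay factor)
        const current = this.topicAffinity.get(action.topic) ?? 1.0;
        this.topicAffinity.set(action.topic, current * 0.5);
        break;
      }
    }
  }

  isSnoozed(sourceId: string, nowMs: number): boolean {
    return (this.snoozedUntil.get(sourceId) ?? 0) > nowMs;
  }
}
```

The discriminated union makes the control surface auditable: every user-facing control corresponds to exactly one variant, so a new control that lacks a signal update fails to type-check.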
Users feel empowered when they understand why content appears and can control their experience. Even if few users use advanced controls, their availability signals that the platform respects user agency. Transparency is a competitive advantage.
We've explored the privacy architecture required for responsible feed systems: threat modeling and data classification, visibility enforcement with defense in depth, strongly consistent propagation of blocks and privacy changes, privacy-preserving ML, and GDPR/CCPA compliance engineering.
Module Complete:
Congratulations! You've completed the comprehensive Facebook Newsfeed design module.
You now have a comprehensive understanding of how to design Facebook's Newsfeed-scale personalized content delivery systems. The principles you've learned—ranking funnels, hybrid aggregation, privacy-by-design—apply to any large-scale feed system you'll encounter or build.