"What's happening?" is Twitter's core value proposition. The trending topics feature surfaces the most-discussed subjects in real-time, helping users discover breaking news, viral moments, and cultural events as they unfold.
The Trending Challenge:
Detecting trends sounds simple: count hashtags, show the most popular ones. But at Twitter's scale, this naive approach fails spectacularly: perennially popular hashtags crowd out genuinely new topics, coordinated campaigns and bots can manufacture volume, and exactly counting hundreds of millions of unique terms in real time doesn't fit in memory.
This page explores how to build a trending system that handles these challenges at scale.
By the end of this page, you'll understand streaming data pipelines for trend detection, approximate counting algorithms (Count-Min Sketch, HyperLogLog), velocity-based trend scoring, personalization strategies, and anti-gaming measures.
Before building the system, we must precisely define what makes something "trending." This definition drives our algorithm design.
A common mistake is equating "trending" with "popular." They're fundamentally different:
A topic is "trending" when its current velocity significantly exceeds its baseline:
```
Trending Score Formula
======================

Core insight: Trending = Current Volume / Expected Volume

    trending_score = (current_count - expected_count) / sqrt(expected_count)

Where:
- current_count: tweets in the last N minutes (e.g., 15 min)
- expected_count: typical tweets in this time window (historical baseline)
- sqrt(expected_count): normalization factor (variance adjustment)

Example calculations:

Topic: #SuperBowl
- Current (last 15 min): 50,000 tweets
- Expected (typical Sunday 6pm, 15 min): 200 tweets
- Score: (50000 - 200) / sqrt(200) ≈ 3521

Topic: #love
- Current (last 15 min): 10,000 tweets
- Expected (typical 15 min): 9,800 tweets
- Score: (10000 - 9800) / sqrt(9800) ≈ 2.02

Result: #SuperBowl (score 3521) >> #love (score 2.02)
#SuperBowl is trending because its volume far exceeds its baseline; #love,
despite its consistently large volume, shows no unusual spike.
```

Trends also have lifecycles: a topic that was trending 2 hours ago might no longer be newsworthy, which is why the scorer later in this page includes a freshness decay.
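To make the formula concrete, here is a minimal sketch of the same computation in Python (the function name and the guard against a zero baseline are choices made for illustration):

```python
import math

def trending_score(current_count: int, expected_count: float) -> float:
    """Velocity score: how far current volume sits above its historical baseline."""
    expected = max(expected_count, 1.0)  # guard against a zero baseline
    return (current_count - expected) / math.sqrt(expected)

print(trending_score(50_000, 200))    # ≈ 3521 -> #SuperBowl is trending
print(trending_score(10_000, 9_800))  # ≈ 2.02 -> #love is merely popular
```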
When asked to design trending topics, immediately clarify: 'Are we looking for absolute popularity or unusual velocity?' This demonstrates understanding of the core challenge. Most interviewers want velocity-based trending.
Processing 6,000+ tweets per second in real time requires a streaming architecture. Batch processing is too slow: by the time batch results are ready, the trend might have passed.
Each tweet must be parsed to extract trendable entities (hashtags, keywords, named entities):
"""Entity Extractor for Trend Detection Extracts trending-relevant entities from tweets. Goes beyondsimple hashtag extraction to capture keyword-based trends.""" import refrom typing import List, Set, Dictfrom dataclasses import dataclassimport spacy @dataclassclass TrendEntity: text: str entity_type: str # hashtag, keyword, person, location, event normalized: str # lowercase, stemmed version for counting weight: float # importance weight class TweetEntityExtractor: def __init__(self): self.nlp = spacy.load("en_core_web_sm") self.stopwords = self._load_stopwords() self.common_hashtags = self._load_common_hashtags() def extract_entities(self, tweet: str) -> List[TrendEntity]: """ Extract all trendable entities from a tweet. Entity types: - Hashtags: #SuperBowl - Mentions (for person trends): @elonmusk - Keywords: 2-3 word phrases that spike - Named entities: detected by NLP (people, orgs, events) """ entities = [] # 1. Extract hashtags hashtags = self._extract_hashtags(tweet) entities.extend(hashtags) # 2. Extract mentioned usernames (for person trends) mentions = self._extract_mentions(tweet) entities.extend(mentions) # 3. Extract named entities via NLP named_entities = self._extract_named_entities(tweet) entities.extend(named_entities) # 4. Extract key phrases phrases = self._extract_key_phrases(tweet) entities.extend(phrases) return entities def _extract_hashtags(self, tweet: str) -> List[TrendEntity]: """Extract hashtags with filtering.""" hashtag_pattern = r'#(w+)' matches = re.findall(hashtag_pattern, tweet) entities = [] for tag in matches: normalized = tag.lower() # Skip common/spammy hashtags if normalized in self.common_hashtags: continue entities.append(TrendEntity( text=f"#{tag}", entity_type="hashtag", normalized=normalized, weight=1.0 )) return entities def _extract_named_entities(self, tweet: str) -> List[TrendEntity]: """ Use NLP to extract named entities. Captures trends that don't use hashtags. Example: "RIP Steve Jobs" wouldn't have a hashtag but NLP detects "Steve Jobs" as a PERSON entity. """ doc = self.nlp(tweet) entities = [] for ent in doc.ents: if ent.label_ in {"PERSON", "ORG", "EVENT", "GPE", "LOC"}: entities.append(TrendEntity( text=ent.text, entity_type=ent.label_.lower(), normalized=ent.text.lower().strip(), weight=0.8 # Slightly lower weight than hashtags )) return entities def _extract_key_phrases(self, tweet: str) -> List[TrendEntity]: """ Extract significant 2-3 word phrases. Captures trends like "World Cup Final" without hashtags. """ doc = self.nlp(tweet) # Extract noun chunks entities = [] for chunk in doc.noun_chunks: words = [token.text for token in chunk if not token.is_stop and token.is_alpha] if 2 <= len(words) <= 3: phrase = " ".join(words) entities.append(TrendEntity( text=phrase, entity_type="phrase", normalized=phrase.lower(), weight=0.6 )) return entities123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
"""Trend Stream Processor Uses Apache Flink-style windowed aggregation to count entitiesin real-time sliding windows. Built for exactly-once processingsemantics and horizontal scaling.""" from typing import Dict, Listfrom collections import defaultdictimport time class TrendStreamProcessor: def __init__( self, window_size_seconds: int = 900, # 15-minute window slide_interval_seconds: int = 60 # Update every minute ): self.window_size = window_size_seconds self.slide_interval = slide_interval_seconds # In-memory counts for current window # In production: distributed state store (RocksDB, Redis) self.entity_counts: Dict[str, Dict[str, int]] = defaultdict( lambda: defaultdict(int) ) # Location-based counts for local trends self.location_counts: Dict[str, Dict[str, int]] = defaultdict( lambda: defaultdict(int) ) def process_tweet(self, tweet: Dict): """ Process a single tweet event. Called for each tweet as it arrives in the stream. """ timestamp = tweet["created_at"] window_key = self._get_window_key(timestamp) location = tweet.get("location", "global") # Extract entities entities = self.entity_extractor.extract_entities(tweet["text"]) # Update counts for entity in entities: # Global count self.entity_counts[window_key][entity.normalized] += 1 # Location-specific count loc_key = f"{window_key}:{location}" self.location_counts[loc_key][entity.normalized] += 1 def get_sliding_window_counts( self, location: str = "global" ) -> Dict[str, int]: """ Get aggregate counts across all windows in the sliding window. Example: For 15-min window, sum counts from last 15 one-minute buckets. """ current_time = time.time() counts = defaultdict(int) # Sum counts from all windows in the sliding range for bucket in range(0, self.window_size, 60): window_time = current_time - bucket window_key = self._get_window_key(window_time) if location == "global": for entity, count in self.entity_counts[window_key].items(): counts[entity] += count else: loc_key = f"{window_key}:{location}" for entity, count in self.location_counts[loc_key].items(): counts[entity] += count return dict(counts) def _get_window_key(self, timestamp: float) -> str: """Get bucket key for a timestamp (1-minute buckets).""" bucket = int(timestamp // 60) return f"w:{bucket}" def cleanup_old_windows(self): """ Remove windows older than retention period. Called periodically to prevent memory growth. """ current_bucket = int(time.time() // 60) retention_buckets = self.window_size // 60 + 10 # Keep some buffer old_keys = [ key for key in self.entity_counts.keys() if int(key.split(":")[1]) < current_bucket - retention_buckets ] for key in old_keys: del self.entity_counts[key]Real implementations use Apache Flink or Kafka Streams for stateful stream processing. These frameworks handle state checkpointing, exactly-once semantics, and fault recovery automatically. The concepts above illustrate the core logic.
With millions of unique hashtags and billions of tweets, exact counting becomes impractical. Approximate counting algorithms provide memory-efficient solutions with bounded error.
Count-Min Sketch provides frequency estimation using sublinear space:
"""Count-Min Sketch Implementation A probabilistic data structure for frequency estimation.Uses O(width × depth) space regardless of number of elements. Properties:- Never underestimates: actual_count <= estimate- Overestimates by small factor with high probability- Additive error: estimate <= actual_count + ε × N""" import mmh3 # MurmurHash3 for fast hashingimport numpy as np class CountMinSketch: def __init__( self, width: int = 10000, # Number of counters per row depth: int = 5 # Number of hash functions ): """ Initialize Count-Min Sketch. Error guarantees (with high probability): - Relative error: ε = e / width ≈ 0.00027 for width=10000 - Probability of exceeding error: δ = e^(-depth) ≈ 0.0067 for depth=5 """ self.width = width self.depth = depth self.table = np.zeros((depth, width), dtype=np.int64) self.total_count = 0 def add(self, item: str, count: int = 1): """Add an item to the sketch.""" self.total_count += count for i in range(self.depth): # Each row uses a different hash function (seed) hash_val = mmh3.hash(item, seed=i) % self.width self.table[i][hash_val] += count def estimate(self, item: str) -> int: """ Estimate the count of an item. Returns the minimum across all hash functions. """ min_count = float('inf') for i in range(self.depth): hash_val = mmh3.hash(item, seed=i) % self.width min_count = min(min_count, self.table[i][hash_val]) return int(min_count) def merge(self, other: 'CountMinSketch'): """ Merge another sketch into this one. Enables parallel processing with distributed sketches. """ if self.width != other.width or self.depth != other.depth: raise ValueError("Sketches must have same dimensions") self.table += other.table self.total_count += other.total_count class TrendingCountMinSketch: """ Application of Count-Min Sketch for trending topics. Uses separate sketches for current and baseline periods. """ def __init__(self): # Current 15-minute window self.current_sketch = CountMinSketch(width=50000, depth=7) # Baseline (rolling 7-day average for this time window) self.baseline_sketch = CountMinSketch(width=50000, depth=7) # Unique user tracking (prevent spam) self.user_sketches: Dict[str, 'HyperLogLog'] = {} def record_entity(self, entity: str, user_id: str): """Record an entity mention from a user.""" # Add to current window self.current_sketch.add(entity) # Track unique users mentioning this entity if entity not in self.user_sketches: self.user_sketches[entity] = HyperLogLog() self.user_sketches[entity].add(user_id) def get_trending_score(self, entity: str) -> float: """Calculate trending score using sketch estimates.""" current = self.current_sketch.estimate(entity) baseline = self.baseline_sketch.estimate(entity) / (7 * 96) # 7 days × 96 windows/day if baseline < 1: baseline = 1 # Avoid division by zero # Z-score style calculation return (current - baseline) / np.sqrt(baseline) def get_unique_user_estimate(self, entity: str) -> int: """Estimate unique users mentioning entity.""" if entity in self.user_sketches: return self.user_sketches[entity].cardinality() return 0We also need to count unique users per topic (for anti-gaming). HyperLogLog provides cardinality estimation:
"""HyperLogLog Implementation Estimates cardinality (count of unique elements) using O(log log N) space.Can estimate billions of unique items using only ~12KB of memory. Properties:- Space: O(m) where m = 2^p (typically 2^14 = 16K registers)- Error: Standard error ≈ 1.04 / sqrt(m) ≈ 0.81% for m=16384- Mergeable: Can union multiple HLLs without re-processing elements""" import mmh3import math class HyperLogLog: def __init__(self, precision: int = 14): """ Initialize HyperLogLog. precision = 14 means 2^14 = 16384 registers Standard error ≈ 0.81% """ self.precision = precision self.num_registers = 1 << precision # 2^p self.registers = [0] * self.num_registers # Alpha correction factor if self.num_registers >= 128: self.alpha = 0.7213 / (1 + 1.079 / self.num_registers) elif self.num_registers >= 64: self.alpha = 0.709 elif self.num_registers >= 32: self.alpha = 0.697 else: self.alpha = 0.673 def add(self, item: str): """Add an item to the HLL.""" hash_val = mmh3.hash64(item, signed=False)[0] # Use first p bits to select register register_idx = hash_val >> (64 - self.precision) # Use remaining bits to count leading zeros remaining = hash_val & ((1 << (64 - self.precision)) - 1) leading_zeros = self._count_leading_zeros(remaining, 64 - self.precision) # Store maximum observed leading zeros + 1 self.registers[register_idx] = max( self.registers[register_idx], leading_zeros + 1 ) def cardinality(self) -> int: """Estimate the number of unique items.""" # Harmonic mean of 2^(-register) sum_inv = sum(2 ** (-reg) for reg in self.registers) raw_estimate = (self.alpha * self.num_registers ** 2) / sum_inv # Bias correction for small and large cardinalities if raw_estimate <= 2.5 * self.num_registers: # Small range correction zeros = self.registers.count(0) if zeros > 0: return int( self.num_registers * math.log(self.num_registers / zeros) ) elif raw_estimate > (1 << 32) / 30: # Large range correction return int(-((1 << 32) * math.log(1 - raw_estimate / (1 << 32)))) return int(raw_estimate) def merge(self, other: 'HyperLogLog'): """Merge another HLL into this one.""" if self.precision != other.precision: raise ValueError("HLLs must have same precision") for i in range(self.num_registers): self.registers[i] = max(self.registers[i], other.registers[i]) def _count_leading_zeros(self, value: int, max_bits: int) -> int: if value == 0: return max_bits count = 0 for i in range(max_bits - 1, -1, -1): if value & (1 << i): break count += 1 return count| Algorithm | Purpose | Space | Error | Use Case |
|---|---|---|---|---|
| Count-Min Sketch | Frequency estimation | O(w × d) | ε × N | How many times #topic mentioned? |
| HyperLogLog | Cardinality estimation | O(2^p) | 1.04/√m | How many unique users? |
| Bloom Filter | Membership test | O(n) | False positive ε | Has this user already been counted? |
| Top-K (Heavy Hitters) | Find most frequent | O(k) | Depends on skew | Top 10 trending topics |
Exact counting of 100 million unique hashtags would require ~800MB+ just for the hash map structure. A Count-Min Sketch with width=50000, depth=7 uses only ~2.6MB. HyperLogLog uses ~12KB per entity. These savings enable in-memory processing at scale.
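As a rough sanity check on those figures, here is the back-of-envelope arithmetic (assuming 8-byte counters for the sketch and a packed 6-bit-register HyperLogLog; the simple list-based HLL above uses more):

```python
cms_bytes = 50_000 * 7 * 8      # width × depth counters, int64 each
print(cms_bytes / 2**20)        # ≈ 2.7 MiB for one Count-Min Sketch

hll_bytes = (1 << 14) * 6 / 8   # 2^14 registers, ~6 bits each when bit-packed
print(hll_bytes / 1024)         # ≈ 12 KiB per HyperLogLog
```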
With entity counts computed, we need to score and rank topics to determine what's truly trending.
"""Multi-Factor Trend Scorer Combines multiple signals to produce a final trending score.Each factor handles a different aspect of what makes somethingworth surfacing to users.""" from dataclasses import dataclassfrom typing import Dictimport math @dataclassclass TrendCandidate: entity: str entity_type: str current_count: int baseline_count: float unique_users: int unique_sources: int # Different apps/clients geographic_spread: float # 0-1, how globally distributed account_variety: float # Mix of verified/regular accounts first_seen: float # Timestamp when trend started peak_velocity: float # Maximum tweets per minute observed class TrendScorer: def __init__(self): # Weight for each factor (tuned via experimentation) self.weights = { "velocity": 0.35, "acceleration": 0.20, "diversity": 0.20, "quality": 0.15, "freshness": 0.10 } def score(self, candidate: TrendCandidate) -> float: """ Calculate overall trending score. Higher score = more likely to be shown as trending. """ scores = { "velocity": self._velocity_score(candidate), "acceleration": self._acceleration_score(candidate), "diversity": self._diversity_score(candidate), "quality": self._quality_score(candidate), "freshness": self._freshness_score(candidate), } # Weighted sum total = sum( scores[factor] * self.weights[factor] for factor in scores ) # Apply anti-gaming penalty penalty = self._gaming_penalty(candidate) return total * penalty def _velocity_score(self, c: TrendCandidate) -> float: """ Score based on current vs baseline volume. Higher velocity = more trending. """ if c.baseline_count < 1: c.baseline_count = 1 # Z-score calculation z = (c.current_count - c.baseline_count) / math.sqrt(c.baseline_count) # Normalize to 0-100 scale return min(100, max(0, z * 10)) def _acceleration_score(self, c: TrendCandidate) -> float: """ Score based on how quickly velocity is increasing. Rapid acceleration = breaking news. """ # Compare current velocity to velocity 5 minutes ago # Higher acceleration = more newsworthy if c.peak_velocity > 0: current_velocity = c.current_count / 15 # per minute acceleration = current_velocity / c.peak_velocity return min(100, acceleration * 50) return 0 def _diversity_score(self, c: TrendCandidate) -> float: """ Score based on diversity of users discussing topic. Real trends have diverse participants; bot campaigns don't. """ # Unique users / total mentions ratio if c.current_count == 0: return 0 user_ratio = c.unique_users / c.current_count # Geographic diversity geo_score = c.geographic_spread * 50 # Source diversity (different apps/clients) source_score = min(50, c.unique_sources * 5) return (user_ratio * 50) + (geo_score * 0.3) + (source_score * 0.2) def _quality_score(self, c: TrendCandidate) -> float: """ Score based on quality signals. Verified accounts, engagement, etc. """ # Mix of verified and regular accounts variety_score = c.account_variety * 100 return variety_score def _freshness_score(self, c: TrendCandidate) -> float: """ Score based on how recently trend started. Newer trends are more interesting. """ age_hours = (time.time() - c.first_seen) / 3600 # Decay over 24 hours if age_hours > 24: return 0 return 100 * (1 - age_hours / 24) def _gaming_penalty(self, c: TrendCandidate) -> float: """ Detect potential gaming and apply penalty. Returns multiplier between 0 (blocked) and 1 (no penalty). 
""" # Low user diversity = likely bot campaign if c.unique_users < c.current_count * 0.3: return 0.1 # Single geographic region = potential coordinated campaign if c.geographic_spread < 0.1: return 0.5 # Sudden spike from new accounts = suspicious # (would need additional signals in practice) return 1.0Trends vary dramatically by location. A local news event might trend in one city but be irrelevant globally:
"""Geographic Trending Computes trends at multiple geographic granularities:- Global (worldwide)- Country- City/Metro area Uses user-specified location and tweet geolocation when available.""" from typing import List, Dict, Optionalfrom dataclasses import dataclass @dataclassclass GeoTrend: entity: str score: float volume: int rank: int class GeoTrendingService: # Geographic hierarchy for trend levels TREND_LEVELS = ["global", "country", "city"] def __init__( self, trend_scorer: TrendScorer, geo_counter: Dict[str, TrendingCountMinSketch] # Per-location counters ): self.scorer = trend_scorer self.counters = geo_counter def get_trends_for_location( self, location: str, limit: int = 10 ) -> List[GeoTrend]: """ Get trending topics for a specific location. Falls back to broader geography if insufficient local trends. """ trends = [] # Try to get trends at this location level local_trends = self._get_local_trends(location, limit) trends.extend(local_trends) # If not enough local trends, supplement with broader if len(trends) < limit: parent_location = self._get_parent_location(location) remaining = limit - len(trends) if parent_location: parent_trends = self._get_local_trends( parent_location, remaining ) # Avoid duplicates existing = {t.entity for t in trends} for t in parent_trends: if t.entity not in existing: trends.append(t) # Sort by score and assign ranks trends.sort(key=lambda t: t.score, reverse=True) for i, trend in enumerate(trends[:limit]): trend.rank = i + 1 return trends[:limit] def _get_local_trends( self, location: str, limit: int ) -> List[GeoTrend]: """Get trends specific to one location.""" if location not in self.counters: return [] counter = self.counters[location] # Get candidates above minimum threshold candidates = self._get_candidates_above_threshold(counter, location) # Score each candidate scored = [] for candidate in candidates: score = self.scorer.score(candidate) if score > 0: scored.append(GeoTrend( entity=candidate.entity, score=score, volume=candidate.current_count, rank=0 # Assigned later )) return scored def assign_user_location(self, user: Dict) -> str: """ Determine which location's trends to show a user. Uses profile location, IP geolocation, or activity patterns. """ if user.get("profile_location"): # Geocode profile location to normalized form return self._geocode(user["profile_location"]) if user.get("ip_location"): return user["ip_location"] # Default to global return "global"Geographic trending uses aggregate data only—individual user locations are never exposed. The system answers 'What's trending in New York?' without revealing which specific users are in New York discussing the topic.
Trending topics are valuable real estate. Advertisers, political campaigns, and bad actors will try to game the system. Robust anti-gaming measures are essential.
"""Anti-Gaming Detection System Multi-layer defense against trend manipulation.Each layer catches different types of gaming attempts.""" from dataclasses import dataclassfrom typing import List, Setimport numpy as np @dataclassclass GamingSignal: signal_type: str confidence: float # 0-1 details: str class AntiGamingDetector: def __init__( self, bot_detector, spam_classifier, account_scorer ): self.bot_detector = bot_detector self.spam_classifier = spam_classifier self.account_scorer = account_scorer def analyze_trend( self, entity: str, tweets: List[Dict], users: List[Dict] ) -> List[GamingSignal]: """ Analyze a potential trend for gaming signals. Returns list of detected signals with confidence scores. """ signals = [] # 1. Account age analysis signals.extend(self._analyze_account_ages(users)) # 2. Temporal pattern analysis signals.extend(self._analyze_temporal_patterns(tweets)) # 3. Bot probability analysis signals.extend(self._analyze_bot_probability(users)) # 4. Content similarity analysis signals.extend(self._analyze_content_similarity(tweets)) # 5. Engagement authenticity signals.extend(self._analyze_engagement_patterns(tweets)) return signals def _analyze_account_ages(self, users: List[Dict]) -> List[GamingSignal]: """ Detect suspicious concentration of new accounts. Bot campaigns often use freshly created accounts. """ signals = [] age_days = [u.get("account_age_days", 0) for u in users] new_account_ratio = sum(1 for a in age_days if a < 30) / len(age_days) if new_account_ratio > 0.5: signals.append(GamingSignal( signal_type="new_account_concentration", confidence=min(1.0, new_account_ratio), details=f"{new_account_ratio:.0%} of participants are <30 days old" )) return signals def _analyze_temporal_patterns( self, tweets: List[Dict] ) -> List[GamingSignal]: """ Detect unnatural timing patterns. Real trends have organic growth; gaming has suspicious spikes. """ signals = [] # Calculate inter-arrival times timestamps = sorted([t["created_at"] for t in tweets]) intervals = np.diff(timestamps) if len(intervals) > 10: # Check for suspiciously regular intervals cv = np.std(intervals) / np.mean(intervals) # Coefficient of variation if cv < 0.1: # Too regular = likely automated signals.append(GamingSignal( signal_type="regular_timing", confidence=1 - cv * 10, details=f"Suspiciously regular tweet intervals (CV={cv:.3f})" )) # Check for burst patterns burst_threshold = np.percentile(intervals, 5) if burst_threshold < 0.1: # Sub-second bursts signals.append(GamingSignal( signal_type="burst_pattern", confidence=0.9, details="Abnormal burst of activity detected" )) return signals def _analyze_content_similarity( self, tweets: List[Dict] ) -> List[GamingSignal]: """ Detect copy-paste or template-based tweets. Real discussion has variety; gaming often uses templates. """ signals = [] contents = [t["text"] for t in tweets] # Calculate text similarity matrix # (In production: use MinHash or SimHash for efficiency) unique_ratio = len(set(contents)) / len(contents) if unique_ratio < 0.5: signals.append(GamingSignal( signal_type="content_duplication", confidence=1 - unique_ratio, details=f"Only {unique_ratio:.0%} of tweets are unique" )) return signals def calculate_gaming_penalty( self, signals: List[GamingSignal] ) -> float: """ Calculate overall penalty multiplier based on detected signals. Returns 0-1 where 0 = definitely gaming, 1 = no signals. 
""" if not signals: return 1.0 # Weight signals by type severity severity_weights = { "new_account_concentration": 0.7, "regular_timing": 0.9, "burst_pattern": 0.8, "content_duplication": 0.85, "bot_probability": 0.95, } penalty = 1.0 for signal in signals: weight = severity_weights.get(signal.signal_type, 0.5) penalty *= (1 - signal.confidence * weight) return max(0, penalty)Beyond gaming detection, we filter trends for content quality and safety:
Anti-gaming is an ongoing arms race. Attackers adapt to defenses. Continuous monitoring, regular model updates, and human review of edge cases are essential. No automated system is perfect—but layered defenses raise the bar significantly.
You've now covered the core concepts of real-time trend detection: streaming entity extraction and windowed counting, approximate counting with Count-Min Sketch and HyperLogLog, velocity-based multi-factor trend scoring, geographic trends, and layered anti-gaming defenses.
What's Next:
The final page in this module covers Scaling Considerations—how to take everything we've designed so far and make it work at Twitter's full scale. We'll discuss global distribution, caching at the edge, capacity planning, and graceful degradation strategies.
You now understand how to detect and surface trending topics in real-time. This knowledge applies to any system needing real-time signal detection—not just social media trends, but also anomaly detection, breaking news alerts, and viral content identification.