Music recommendation is perhaps the most impactful differentiator between streaming platforms. With over 100 million tracks in the catalog, users cannot possibly find all the music they'd love on their own. The recommendation system becomes a personal DJ, curator, and explorer—surfacing music that matches individual tastes, moods, and contexts.
Discover Weekly, Spotify's flagship personalized playlist, reaches 40 million users with custom-generated 30-track playlists every Monday. Daily Mixes adapt throughout the day based on listening patterns. Release Radar surfaces new releases from artists users follow. Behind these features lies a sophisticated machine learning infrastructure processing billions of events daily.
This page explores how to architect a recommendation system at this scale.
You will understand the core recommendation approaches (collaborative filtering, content-based, hybrid), the ML pipeline architecture, feature engineering at scale, real-time personalization, and how to balance exploration vs. exploitation in recommendations.
Before diving into architecture, let's understand the core approaches to recommendation. Each has distinct strengths and limitations.
The Three Pillars of Recommendation:
| Approach | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Collaborative Filtering | Users who liked X also liked Y | Discovers unexpected connections | Cold start problem for new users/items |
| Content-Based | This song sounds like that song | Works for new items immediately | Limited to similar content, no serendipity |
| Knowledge-Based | Rules: 'If user likes rock, suggest classic rock' | Explainable, controllable | Doesn't scale, misses nuance |
Spotify's Hybrid Approach:
Modern recommendation systems combine all three approaches. Spotify famously uses a hybrid model that leverages:
```python
class HybridRecommendationEngine:
    """
    Combines multiple recommendation approaches for optimal results.
    """

    def __init__(self):
        self.collaborative_model = CollaborativeFilteringModel()
        self.content_model = ContentBasedModel()
        self.audio_model = AudioFeatureModel()
        self.contextual_model = ContextualModel()

    def generate_recommendations(
        self,
        user_id: str,
        context: RecommendationContext,
        num_tracks: int = 30
    ) -> List[RecommendedTrack]:
        """
        Generate recommendations using ensemble of models.

        Each model produces candidate tracks with scores.
        Final ranking blends all scores based on learned weights.
        """
        # Get user's listening profile
        user_profile = self.user_service.get_taste_profile(user_id)

        # Generate candidates from each model
        candidates = {}

        # Collaborative: "Users like you also liked..."
        collab_recs = self.collaborative_model.recommend(
            user_id, num_candidates=100
        )
        for track_id, score in collab_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['collaborative'] = score

        # Content-based: "Similar to what you listen to..."
        content_recs = self.content_model.recommend(
            user_profile.top_tracks, num_candidates=100
        )
        for track_id, score in content_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['content'] = score

        # Audio features: "Sounds like your taste..."
        audio_recs = self.audio_model.recommend(
            user_profile.audio_preferences, num_candidates=100
        )
        for track_id, score in audio_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['audio'] = score

        # Context-aware: "Right for this moment..."
        context_recs = self.contextual_model.recommend(
            user_id, context, num_candidates=50
        )
        for track_id, score in context_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['context'] = score

        # Blend scores using learned weights
        final_scores = self._blend_scores(candidates, user_profile, context)

        # Apply diversity filter to avoid too-similar tracks
        diverse_tracks = self._ensure_diversity(final_scores, num_tracks)

        return diverse_tracks

    def _blend_scores(
        self,
        candidates: Dict[str, Dict[str, float]],
        user_profile: UserProfile,
        context: RecommendationContext
    ) -> List[Tuple[str, float]]:
        """
        Combine scores from different models.

        Weights can be:
        - Static (manually tuned)
        - User-specific (learned from feedback)
        - Context-specific (e.g., more audio-based for new user)
        """
        weights = self._get_blending_weights(user_profile, context)

        final_scores = []
        for track_id, scores in candidates.items():
            combined = (
                scores.get('collaborative', 0) * weights['collaborative'] +
                scores.get('content', 0) * weights['content'] +
                scores.get('audio', 0) * weights['audio'] +
                scores.get('context', 0) * weights['context']
            )
            final_scores.append((track_id, combined))

        return sorted(final_scores, key=lambda x: x[1], reverse=True)
```

New users have no listening history. New tracks have no plays. This 'cold start' problem is addressed by onboarding flows asking genre preferences, leveraging registration data (age, location), defaulting to popularity, and quickly learning from initial interactions.
Collaborative filtering is the workhorse of recommendation systems. The core idea: if User A and User B have similar taste (they like many of the same tracks), then tracks that A likes but B hasn't heard are good recommendations for B.
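The "similar taste" intuition can be sketched in a few lines with cosine similarity over a toy interaction matrix. This is illustrative only (a tiny dense matrix with invented likes); production systems use the matrix factorization approach described next:

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = tracks (1 = liked).
R = np.array([
    [1, 1, 0, 1, 0],  # user A
    [1, 1, 1, 0, 0],  # user B (target)
    [0, 0, 1, 0, 1],  # user C
])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 1
others = [u for u in range(len(R)) if u != target]
sims = [cosine(R[target], R[u]) for u in others]
neighbour = others[int(np.argmax(sims))]  # user A: shares two liked tracks with B

# Recommend what the neighbour liked but the target hasn't heard: track 3.
recs = [t for t in range(R.shape[1]) if R[neighbour, t] and not R[target, t]]
```

User B's nearest neighbour by taste is A, so A's unshared like (track 3) becomes B's recommendation, exactly the "A likes it, B hasn't heard it" rule stated above.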
Matrix Factorization Approach:
We model the user-track interaction as a sparse matrix and factorize it into user and track embeddings:
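Concretely, the factorization R ≈ U × V^T means every predicted affinity is a dot product between two k-dimensional embeddings; a quick NumPy check with toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2                          # embedding dimension (toy; real systems use 100-300)
U = rng.normal(size=(4, k))    # 4 user embeddings
V = rng.normal(size=(6, k))    # 6 track embeddings

R_hat = U @ V.T                # predicted affinity for every (user, track) pair
score = U[0] @ V[3]            # one prediction is just a dot product
assert np.isclose(R_hat[0, 3], score)
```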
```python
class MatrixFactorizationModel:
    """
    Collaborative filtering using matrix factorization.

    User-item interaction matrix R ≈ U × V^T

    Where:
    - R is the (users × tracks) interaction matrix
    - U is (users × k) user embedding matrix
    - V is (tracks × k) track embedding matrix
    - k is the embedding dimension (typically 100-300)

    For any (user, track) pair, predicted affinity is:
        score = dot_product(user_embedding, track_embedding)
    """

    def __init__(self, embedding_dim=256):
        self.embedding_dim = embedding_dim
        self.user_embeddings = {}   # user_id -> vector
        self.track_embeddings = {}  # track_id -> vector

    def train(self, interactions: List[Interaction]):
        """
        Train model on user-track interactions.

        Interactions include:
        - Plays (weighted by duration / track length)
        - Saves (strong positive signal)
        - Skips (negative signal)
        - Repeats (very strong positive)

        Uses implicit feedback (not explicit ratings).
        """
        # Build interaction matrix
        user_ids = list(set(i.user_id for i in interactions))
        track_ids = list(set(i.track_id for i in interactions))
        user_to_idx = {uid: i for i, uid in enumerate(user_ids)}
        track_to_idx = {tid: i for i, tid in enumerate(track_ids)}

        # Confidence-weighted matrix for implicit feedback
        # Higher play counts = higher confidence, not higher rating
        R = scipy.sparse.lil_matrix((len(user_ids), len(track_ids)))
        for interaction in interactions:
            u = user_to_idx[interaction.user_id]
            t = track_to_idx[interaction.track_id]
            R[u, t] = self._compute_interaction_weight(interaction)

        # Alternating Least Squares (ALS) for training
        # Scales well for distributed computation (Spark MLlib)
        model = implicit.als.AlternatingLeastSquares(
            factors=self.embedding_dim,
            regularization=0.01,
            iterations=15
        )
        model.fit(R.tocsr())

        # Store embeddings
        for user_id, idx in user_to_idx.items():
            self.user_embeddings[user_id] = model.user_factors[idx]
        for track_id, idx in track_to_idx.items():
            self.track_embeddings[track_id] = model.item_factors[idx]

    def recommend(self, user_id: str, num_candidates: int) -> List[Tuple[str, float]]:
        """
        Recommend tracks for user based on embedding similarity.

        For a user, score all tracks by dot product with user embedding.
        Return top N that user hasn't already interacted with.
        """
        if user_id not in self.user_embeddings:
            return []  # Cold start - defer to other models

        user_vec = self.user_embeddings[user_id]

        # Score all tracks
        scores = []
        for track_id, track_vec in self.track_embeddings.items():
            score = np.dot(user_vec, track_vec)
            scores.append((track_id, score))

        # Filter already-heard tracks and sort
        heard_tracks = self._get_user_history(user_id)
        candidates = [
            (tid, score) for tid, score in scores
            if tid not in heard_tracks
        ]
        candidates.sort(key=lambda x: x[1], reverse=True)

        return candidates[:num_candidates]

    def _compute_interaction_weight(self, interaction: Interaction) -> float:
        """
        Convert interaction to confidence weight.

        Not all plays are equal:
        - Full listen > partial listen
        - Saved tracks get bonus
        - Skipped tracks get penalty
        - Recent interactions weighted higher
        """
        base_weight = 1.0

        # Duration factor (what % of track was played)
        duration_factor = min(1.0, interaction.listened_ms / interaction.track_duration_ms)

        # Action bonuses
        if interaction.saved:
            base_weight += 2.0
        if interaction.repeated:
            base_weight += 1.5
        if interaction.skipped and interaction.listened_ms < 30000:  # Skip in first 30s
            base_weight = -0.5

        # Time decay (recent interactions matter more)
        days_ago = (datetime.now() - interaction.timestamp).days
        time_decay = math.exp(-days_ago / 90)  # ~3 month half-life

        return base_weight * duration_factor * time_decay
```

Scaling to Handle 615 Million Users:
Matrix factorization at this scale requires distributed computing. With 615M users and 100M tracks, even storing embeddings requires careful design:
| Component | Size Calculation | Storage Required |
|---|---|---|
| User embeddings | 615M users × 256 dims × 4 bytes | ~630 GB |
| Track embeddings | 100M tracks × 256 dims × 4 bytes | ~100 GB |
| Index for ANN search | ~2x raw embeddings | ~1.4 TB |
| Total | | ~2.2 TB |
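These sizes are plain arithmetic and worth sanity-checking; the 2x index overhead is a rough rule of thumb, not a measured figure:

```python
GB = 1e9

user_emb = 615e6 * 256 * 4               # users × dims × 4 bytes (float32)
track_emb = 100e6 * 256 * 4              # tracks × dims × 4 bytes
ann_index = 2 * (user_emb + track_emb)   # assumed ~2x raw-embedding overhead
total = user_emb + track_emb + ann_index

print(round(user_emb / GB))   # 630  (GB)
print(round(total / GB))      # 2196 (GB), i.e. ~2.2 TB
```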
Finding similar tracks or users via brute-force dot product is O(n) per query in the catalog size. For real-time recommendations, we use Approximate Nearest Neighbor (ANN) indices like FAISS, ScaNN, or Annoy. These provide roughly O(log n) lookup with 95%+ recall on the true nearest neighbors.
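For contrast, the O(n) brute-force scan that ANN indices replace looks like this in NumPy: one dot product per track per query, plus a partial sort for the top k (toy catalog size):

```python
import numpy as np

rng = np.random.default_rng(0)
track_vecs = rng.normal(size=(100_000, 256)).astype(np.float32)  # toy catalog
user_vec = rng.normal(size=256).astype(np.float32)

scores = track_vecs @ user_vec                  # O(n): one dot product per track
top_k = np.argpartition(scores, -10)[-10:]      # indices of the 10 best, unordered
top_k = top_k[np.argsort(scores[top_k])[::-1]]  # order the winners by score
```

At 100M tracks this scan is far too slow per request, which is why FAISS/ScaNN/Annoy trade a little recall for sub-linear lookups.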
Content-based recommendation uses the actual audio content to find similar tracks. This solves the cold start problem—new tracks can be recommended immediately based on how they sound.
Audio Feature Extraction:
Spotify extracts a rich set of audio features from every track:
| Feature | Range | Description | Use in Recommendations |
|---|---|---|---|
| Tempo (BPM) | 50-200 | Beats per minute | Workout playlists, energy matching |
| Key | 0-11 (C to B) | Musical key | Harmonic mixing, DJ features |
| Mode | 0-1 | Major (1) or Minor (0) | Mood classification |
| Energy | 0-1 | Intensity and activity | Activity-based recommendations |
| Danceability | 0-1 | Suitability for dancing | Party/dance playlists |
| Valence | 0-1 | Musical positiveness/happiness | Mood matching |
| Acousticness | 0-1 | Likelihood of being acoustic | Genre classification |
| Instrumentalness | 0-1 | Predicts lack of vocals | Focus/study playlists |
| Speechiness | 0-1 | Presence of spoken words | Distinguish podcast/rap/instrumental |
| Loudness | -60 to 0 dB | Overall loudness | Normalization, energy |
```python
class AudioAnalysisPipeline:
    """
    Extract audio features from tracks for content-based recommendations.

    Uses deep learning models trained on:
    - Mel spectrograms for genre/mood classification
    - Beat tracking for tempo/rhythm
    - Harmonic analysis for key/mode
    """

    def __init__(self):
        # Pre-trained models
        self.genre_model = load_model("genre_classifier_v3")
        self.mood_model = load_model("mood_classifier_v2")
        self.audio_embedding_model = load_model("audio_embedder_v4")

    async def analyze_track(self, track_path: str) -> AudioFeatures:
        """
        Full audio analysis of a track.

        Pipeline:
        1. Load audio as waveform
        2. Extract low-level features (spectrogram, MFCC)
        3. Run feature extractors (tempo, key, rhythm)
        4. Run deep learning classifiers (genre, mood)
        5. Generate audio embedding for similarity search
        """
        # Load audio
        waveform, sample_rate = librosa.load(track_path, sr=22050)

        # Low-level feature extraction
        spectrogram = librosa.feature.melspectrogram(y=waveform, sr=sample_rate)
        mfcc = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)

        # Rhythm analysis
        tempo, beat_frames = librosa.beat.beat_track(y=waveform, sr=sample_rate)

        # Harmonic analysis
        chroma = librosa.feature.chroma_cqt(y=waveform, sr=sample_rate)
        key, mode = self._estimate_key_mode(chroma)

        # Deep learning features
        genre_probs = self.genre_model.predict(spectrogram)
        mood_values = self.mood_model.predict(spectrogram)

        # Audio embedding for similarity search (128-dim vector)
        audio_embedding = self.audio_embedding_model.encode(spectrogram)

        return AudioFeatures(
            tempo=tempo,
            key=key,
            mode=mode,
            energy=self._compute_energy(waveform),
            danceability=self._compute_danceability(beat_frames, tempo),
            valence=mood_values['valence'],
            genres=genre_probs,
            embedding=audio_embedding
        )

    def find_similar_tracks(
        self,
        seed_track_ids: List[str],
        num_results: int,
        exclude_ids: Set[str] = None
    ) -> List[Tuple[str, float]]:
        """
        Find tracks similar to seeds based on audio features.

        Combines:
        1. Audio embedding similarity (primary)
        2. Feature matching (secondary filter)
        """
        # Get seed embeddings
        seed_embeddings = [
            self.embedding_store.get(tid) for tid in seed_track_ids
        ]
        seed_centroid = np.mean(seed_embeddings, axis=0)

        # ANN search for similar embeddings
        similar_ids, distances = self.ann_index.search(
            seed_centroid,
            num_results * 3  # Over-fetch for filtering
        )

        # Filter by feature compatibility
        seed_features = self._aggregate_features(seed_track_ids)
        filtered = []
        for track_id, distance in zip(similar_ids, distances):
            if exclude_ids and track_id in exclude_ids:
                continue
            if track_id in seed_track_ids:
                continue

            track_features = self.feature_store.get(track_id)
            if self._features_compatible(seed_features, track_features):
                similarity = 1 / (1 + distance)  # Convert distance to similarity
                filtered.append((track_id, similarity))

        return filtered[:num_results]

    def _features_compatible(
        self,
        seed: AggregateFeatures,
        candidate: AudioFeatures
    ) -> bool:
        """
        Check if candidate track's features are compatible with seed.

        Allows some variance but filters extreme mismatches.
        """
        # Tempo: ±20 BPM or half/double time
        if not self._tempo_compatible(seed.avg_tempo, candidate.tempo):
            return False

        # Energy: within 0.3 range
        if abs(seed.avg_energy - candidate.energy) > 0.3:
            return False

        return True
```

Audio fingerprinting (Shazam-style) identifies exact tracks from short samples. Feature extraction analyzes overall characteristics. Both use spectrograms but serve different purposes. Fingerprinting answers 'what is this?' while features answer 'what is this like?'
Training recommendation models at Spotify's scale requires a sophisticated ML infrastructure. The pipeline processes billions of events, trains models on petabytes of data, and serves predictions with millisecond latency.
Pipeline Overview:
```
                 ML PIPELINE ARCHITECTURE

┌────────────────────────────────────────────────────┐
│                  DATA COLLECTION                   │
│          Event Streaming (Kafka/Pub-Sub)           │
│  • Play events (10B+/day)                          │
│  • Skip/save/repeat events                         │
│  • Session context (device, time, playlist)        │
│  • Search queries                                  │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                     DATA LAKE                      │
│  • Raw event storage (S3/GCS)                      │
│  • Parquet format for efficient columnar access    │
│  • Partitioned by date for time-series queries     │
│  • Retention: years of historical data             │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                FEATURE ENGINEERING                 │
│           Batch Processing (Spark/Beam)            │
│  • User taste profiles                             │
│  • Track popularity metrics                        │
│  • Interaction matrices                            │
│  • Temporal patterns                               │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                   FEATURE STORE                    │
│  • Online: Redis/DynamoDB (real-time serving)      │
│  • Offline: Hive/BigQuery (training)               │
│  • Unified feature definitions                     │
│  • Point-in-time correctness for training          │
└────────────────────────────────────────────────────┘
                          │
             ┌────────────┴────────────────┐
             ▼                             ▼
┌─────────────────────────┐   ┌─────────────────────────┐
│     MODEL TRAINING      │   │   REAL-TIME INFERENCE   │
│        (Batch)          │   │        (Online)         │
│ • Spark ML/TF/PyTorch   │   │ • Feature lookup        │
│ • GPU clusters          │   │ • Model serving         │
│ • Weekly retraining     │   │   (TF Serving)          │
│ • A/B test variants     │   │ • <50ms latency         │
│                         │   │ • Scale: 1M+ QPS        │
└────────────┬────────────┘   └────────────┬────────────┘
             │                             │
             └────────────┬────────────────┘
                          ▼
┌────────────────────────────────────────────────────┐
│                   MODEL REGISTRY                   │
│  • Versioned model storage                         │
│  • Deployment orchestration                        │
│  • Rollback capability                             │
│  • A/B experiment configuration                    │
└────────────────────────────────────────────────────┘
```

Feature Engineering:
Feature engineering transforms raw events into model inputs. Quality features often matter more than model architecture.
```python
class UserFeatureEngine:
    """
    Compute user features for recommendation models.
    """

    def compute_taste_profile(self, user_id: str) -> UserTasteProfile:
        """
        Build a comprehensive taste profile from user's listening history.
        """
        # Get listening history (last 90 days)
        history = self.event_store.get_user_plays(
            user_id, days=90, limit=10000
        )

        # Genre affinity - weighted by play count and recency
        genre_scores = defaultdict(float)
        for event in history:
            track_genres = self.track_metadata.get_genres(event.track_id)
            recency_weight = self._recency_weight(event.timestamp)
            for genre, confidence in track_genres:
                genre_scores[genre] += confidence * recency_weight

        # Normalize genre scores to distribution
        total = sum(genre_scores.values())
        genre_distribution = {g: s / total for g, s in genre_scores.items()}

        # Artist affinity
        artist_scores = defaultdict(float)
        for event in history:
            artists = self.track_metadata.get_artists(event.track_id)
            recency_weight = self._recency_weight(event.timestamp)
            for artist_id in artists:
                artist_scores[artist_id] += recency_weight

        # Audio preference aggregates
        audio_preferences = self._compute_audio_preferences(history)

        # Listening patterns
        patterns = self._compute_temporal_patterns(history)

        return UserTasteProfile(
            user_id=user_id,
            genre_distribution=genre_distribution,
            top_artists=self._top_n(artist_scores, 100),
            audio_preferences=audio_preferences,
            listening_patterns=patterns,
            profile_age_days=self._profile_age(history),
            diversity_score=self._compute_diversity(genre_distribution),
            updated_at=datetime.utcnow()
        )

    def _compute_audio_preferences(self, history: List[PlayEvent]) -> AudioPreferences:
        """
        Aggregate audio features across listening history.

        Returns preferred ranges/distributions for each feature.
        """
        features = []
        weights = []
        for event in history:
            track_features = self.audio_feature_store.get(event.track_id)
            if track_features:
                features.append(track_features)
                # Weight by engagement (full listens > skips)
                weights.append(self._engagement_weight(event))

        if not features:
            return AudioPreferences.default()

        # Weighted aggregation
        avg_tempo = np.average([f.tempo for f in features], weights=weights)
        avg_energy = np.average([f.energy for f in features], weights=weights)
        return AudioPreferences(
            tempo_range=(avg_tempo - 20, avg_tempo + 20),
            energy_range=(max(0, avg_energy - 0.2), min(1, avg_energy + 0.2)),
            valence_mean=np.average([f.valence for f in features], weights=weights),
            danceability_mean=np.average([f.danceability for f in features], weights=weights),
            # Distribution captures preference for acoustic vs electronic, etc.
            acousticness_dist=self._compute_distribution(
                [f.acousticness for f in features], weights
            )
        )

    def _compute_temporal_patterns(self, history: List[PlayEvent]) -> ListeningPatterns:
        """
        Extract temporal listening patterns.

        When does user listen? What do they listen to at different times?
        """
        hour_counts = np.zeros(24)
        weekday_counts = np.zeros(7)
        morning_genres = defaultdict(float)
        evening_genres = defaultdict(float)

        for event in history:
            hour = event.timestamp.hour
            weekday = event.timestamp.weekday()
            hour_counts[hour] += 1
            weekday_counts[weekday] += 1

            # Genre by time of day
            genres = self.track_metadata.get_genres(event.track_id)
            if 5 <= hour < 12:
                for genre, conf in genres:
                    morning_genres[genre] += conf
            elif 18 <= hour < 24:
                for genre, conf in genres:
                    evening_genres[genre] += conf

        return ListeningPatterns(
            hourly_distribution=hour_counts / hour_counts.sum(),
            weekday_distribution=weekday_counts / weekday_counts.sum(),
            peak_hours=np.argsort(hour_counts)[-3:].tolist(),
            morning_genre_preference=self._normalize_dict(morning_genres),
            evening_genre_preference=self._normalize_dict(evening_genres)
        )
```

A feature store (Feast, Tecton, or custom) ensures training and serving use identical features. Without it, 'training-serving skew' causes models to perform worse in production than in offline evaluation. The feature store also enables feature reuse across models.
Batch-trained models capture long-term preferences, but user mood and context change throughout the day. Real-time personalization adapts recommendations to the current listening session.
Session-Based Adaptation:
```python
class RealTimePersonalizer:
    """
    Adapts recommendations based on current session context.

    Key insight: What a user wants right now may differ from
    their long-term average preferences.
    """

    def __init__(self):
        self.session_cache = SessionCache()      # Recent session state
        self.context_model = ContextualBandit()  # Real-time learning

    async def personalize_recommendations(
        self,
        user_id: str,
        base_recommendations: List[str],  # From batch models
        context: StreamingContext
    ) -> List[str]:
        """
        Re-rank base recommendations based on real-time signals.
        """
        # Get current session state
        session = await self.session_cache.get_or_create(user_id)

        # Recent session signals
        recent_plays = session.last_n_tracks(10)
        recent_skips = session.last_n_skips(5)
        session_mood = self._infer_session_mood(recent_plays)

        # Re-score each recommendation
        scored_recs = []
        for track_id in base_recommendations:
            track_features = await self.feature_store.get_track(track_id)

            # Session compatibility score
            session_score = self._compute_session_score(
                track_features, session_mood, recent_plays, recent_skips
            )

            # Context score (time of day, device, activity)
            context_score = self._compute_context_score(
                track_features, context
            )

            # Exploration bonus for diversity
            exploration_bonus = self._exploration_score(
                track_features, session
            )

            final_score = (
                0.5 * session_score +
                0.3 * context_score +
                0.2 * exploration_bonus
            )
            scored_recs.append((track_id, final_score))

        # Sort by final score
        scored_recs.sort(key=lambda x: x[1], reverse=True)
        return [track_id for track_id, _ in scored_recs]

    def _infer_session_mood(self, recent_plays: List[TrackPlay]) -> SessionMood:
        """
        Infer current mood from recent listening.

        If user is listening to high-energy tracks, recommend more energy.
        If mellow, stay mellow.
        """
        if not recent_plays:
            return SessionMood.neutral()

        features = [
            self.feature_store.get_track_features(play.track_id)
            for play in recent_plays
        ]

        avg_energy = np.mean([f.energy for f in features])
        avg_valence = np.mean([f.valence for f in features])
        avg_danceability = np.mean([f.danceability for f in features])

        return SessionMood(
            energy_level=avg_energy,
            happiness_level=avg_valence,
            activity_level=avg_danceability
        )

    def _compute_session_score(
        self,
        track_features: TrackFeatures,
        session_mood: SessionMood,
        recent_plays: List[TrackPlay],
        recent_skips: List[str]
    ) -> float:
        """
        Score track compatibility with current session.
        """
        score = 1.0

        # Match current energy level
        energy_diff = abs(track_features.energy - session_mood.energy_level)
        score -= energy_diff * 0.5

        # Penalize if similar track was just skipped
        skip_track_ids = set(recent_skips)
        if self._is_similar_to_any(track_features.track_id, skip_track_ids):
            score -= 0.3

        # Boost if artist was recently played (but not too recently)
        recent_artists = {play.artist_id for play in recent_plays[3:]}
        if track_features.artist_id in recent_artists:
            score += 0.1

        return max(0, score)

    async def on_track_feedback(
        self,
        user_id: str,
        track_id: str,
        feedback: TrackFeedback
    ):
        """
        Update session state and models based on feedback.

        This enables learning within a session.
        """
        session = await self.session_cache.get(user_id)

        if feedback.action == 'play_complete':
            session.add_positive_signal(track_id)
        elif feedback.action == 'skip':
            session.add_negative_signal(track_id)
        elif feedback.action == 'save':
            session.add_strong_positive_signal(track_id)

        # Update contextual bandit for this (context, arm) pair
        await self.context_model.update(
            context=session.current_context,
            arm=track_id,
            reward=self._feedback_to_reward(feedback)
        )
```

Contextual bandits balance exploration (trying new recommendations) with exploitation (using known preferences). They learn online from each user interaction, quickly adapting to changing preferences—essential for real-time personalization.
The explore-exploit dilemma is fundamental to recommendation systems. Should we recommend tracks we're confident the user will like (exploit) or introduce new music to expand their taste (explore)?
Balancing Strategy:
Spotify's approach varies by recommendation surface:
| Surface | Exploration Level | Rationale |
|---|---|---|
| Radio | Very Low (5-10%) | User expects familiar-sounding music |
| Your Top Mixes | Low (10-15%) | Based on established favorites |
| Daily Mix | Medium (20-30%) | Familiar + some discovery |
| Discover Weekly | High (40-50%) | Purpose is discovery |
| Release Radar | Low (10%) | New from followed artists |
| Made for You Home | Medium (25%) | Personalized variety |
```python
class ExplorationManager:
    """
    Manages exploration-exploitation trade-off across recommendations.
    """

    def __init__(self):
        # Base exploration rates by surface
        self.surface_exploration_rates = {
            'radio': 0.08,
            'daily_mix': 0.25,
            'discover_weekly': 0.45,
            'made_for_you': 0.25,
        }

    def inject_exploration(
        self,
        recommendations: List[str],
        user_profile: UserTasteProfile,
        surface: str
    ) -> List[str]:
        """
        Inject exploration tracks into recommendations.

        Strategy:
        1. Determine exploration rate for user and surface
        2. Select exploration tracks (not in typical taste)
        3. Inject at strategic positions
        """
        base_rate = self.surface_exploration_rates.get(surface, 0.2)

        # Adjust rate based on user
        adjusted_rate = self._adjust_rate_for_user(base_rate, user_profile)
        num_explore = int(len(recommendations) * adjusted_rate)

        # Select exploration tracks
        explore_tracks = self._select_exploration_tracks(
            user_profile, num_explore
        )

        # Inject at good positions (not first few, spread out)
        return self._inject_at_positions(
            recommendations,
            explore_tracks,
            start_position=3,  # Never first 3 tracks
            spacing=4          # At least 4 tracks between explorations
        )

    def _adjust_rate_for_user(
        self,
        base_rate: float,
        user_profile: UserTasteProfile
    ) -> float:
        """
        Personalize exploration rate.

        New users: More exploration to learn preferences
        Diverse listeners: More exploration (they like variety)
        Narrow listeners: Less exploration (stick to favorites)
        Recently churned from exploration surface: Less exploration
        """
        rate = base_rate

        # New users need more exploration
        if user_profile.profile_age_days < 30:
            rate *= 1.3

        # High diversity score = user likes variety
        if user_profile.diversity_score > 0.7:
            rate *= 1.2
        elif user_profile.diversity_score < 0.3:
            rate *= 0.7

        # Check recent exploration success
        explore_success = self._get_exploration_success_rate(user_profile.user_id)
        if explore_success > 0.3:    # User engages with explorations
            rate *= 1.1
        elif explore_success < 0.1:  # User skips most explorations
            rate *= 0.8

        return min(0.5, max(0.05, rate))  # Bound between 5% and 50%

    def _select_exploration_tracks(
        self,
        user_profile: UserTasteProfile,
        num_tracks: int
    ) -> List[str]:
        """
        Select tracks outside user's typical taste that might appeal.

        Strategy:
        - Adjacent genres (rock listener → try indie)
        - Breaking artists in preferred genres
        - Similar audio features, different genre
        - Social signals (friends listening)
        """
        candidates = []

        # Adjacent genre exploration
        adjacent_genres = self._get_adjacent_genres(user_profile.top_genres)
        genre_candidates = self.track_index.get_by_genres(
            genres=adjacent_genres,
            exclude_artists=user_profile.top_artists[:50],
            min_popularity=30  # Not too obscure for exploration
        )
        candidates.extend(genre_candidates[:20])

        # Audio-similar from different genres
        audio_similar = self.audio_model.find_similar_tracks(
            seed_features=user_profile.audio_preferences,
            exclude_genres=user_profile.top_genres[:5],
            num_results=20
        )
        candidates.extend(audio_similar)

        # Breaking artists (rising but not yet popular)
        breaking = self.trending_service.get_breaking_artists(
            genres=user_profile.top_genres,
            region=user_profile.region
        )
        for artist in breaking[:10]:
            top_track = self.artist_service.get_top_track(artist.id)
            if top_track not in user_profile.played_tracks:
                candidates.append(top_track)

        # Score by likelihood of success
        scored = []
        for track_id in candidates:
            score = self._exploration_success_likelihood(
                track_id, user_profile
            )
            scored.append((track_id, score))

        scored.sort(key=lambda x: x[1], reverse=True)
        return [track_id for track_id, _ in scored[:num_tracks]]
```

Thompson Sampling is a mathematically elegant approach to exploration. It maintains probability distributions over the value of each option and samples from these distributions to decide what to recommend. Options with high uncertainty get explored more.
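A minimal Thompson Sampling sketch with Beta-Bernoulli posteriors, using only the standard library (the three tracks and their completion rates are simulated, not real data):

```python
import random

random.seed(7)

# Hidden "probability the user finishes the track" per option (simulated).
true_rates = {"track_a": 0.7, "track_b": 0.4, "track_c": 0.1}

# Beta(alpha, beta) posterior per track, starting from the uniform Beta(1, 1).
alpha = {t: 1 for t in true_rates}
beta = {t: 1 for t in true_rates}

for _ in range(2000):
    # Sample one plausible value from each posterior; recommend the argmax.
    samples = {t: random.betavariate(alpha[t], beta[t]) for t in true_rates}
    choice = max(samples, key=samples.get)
    if random.random() < true_rates[choice]:  # simulated full listen
        alpha[choice] += 1
    else:                                     # simulated skip
        beta[choice] += 1

plays = {t: alpha[t] + beta[t] - 2 for t in true_rates}
```

Early on all three posteriors are wide, so every track gets tried; as evidence accumulates, the distributions for weak tracks collapse toward their low rates and traffic concentrates on track_a, without ever hand-tuning an exploration rate.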
Discover Weekly is Spotify's flagship recommendation feature—a 30-track playlist generated fresh for every user every Monday. It reaches 40 million users and has become a cultural phenomenon. Let's examine its architecture.
Pipeline Overview:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180
```python
from typing import List, Tuple


class DiscoverWeeklyPipeline:
    """
    Generate personalized Discover Weekly playlists.

    Runs weekly as a batch job, generating playlists for all users.
    Target: complete within 24 hours for 600M+ users.
    """

    def generate_all_playlists(self):
        """
        Weekly batch job to generate all Discover Weekly playlists.

        Architecture:
        - Distributed across 1000s of Spark executors
        - Users partitioned for parallel processing
        - Results written to the playlist database
        """
        # Partition users across workers
        user_partitions = self.partition_users(partition_count=10000)

        # Process in parallel (Spark/Beam)
        results = self.spark.parallelize(user_partitions).flatMap(
            self.generate_for_partition
        )

        # Write to storage
        results.foreach(self.write_playlist)

    def generate_for_user(self, user_id: str) -> List[str]:
        """
        Generate a 30-track Discover Weekly for a single user.
        """
        # Get the user's taste profile
        profile = self.user_feature_store.get_taste_profile(user_id)

        # Step 1: Candidate generation (cast a wide net)
        candidates = self._generate_candidates(user_id, profile, num=500)

        # Step 2: Scoring (rank by predicted engagement)
        scored_candidates = self._score_candidates(user_id, candidates)

        # Step 3: Filtering (remove invalid tracks)
        valid_candidates = self._apply_filters(user_id, scored_candidates)

        # Step 4: Diversity optimization (ensure variety)
        diverse_tracks = self._optimize_diversity(valid_candidates, num=30)

        # Step 5: Ordering (flow and energy arc)
        ordered_tracks = self._optimize_order(diverse_tracks)

        return ordered_tracks

    def _generate_candidates(
        self, user_id: str, profile: "UserTasteProfile", num: int
    ) -> List[str]:
        """
        Generate candidate tracks from multiple sources.
        """
        candidates = set()

        # Source 1: Collaborative filtering (users like you also liked...)
        cf_recs = self.collaborative_model.recommend(user_id, num // 3)
        candidates.update([track_id for track_id, _ in cf_recs])

        # Source 2: Artist-based (other tracks from artists in the taste profile)
        for artist_id in profile.adjacent_artists:
            artist_tracks = self.artist_service.get_non_hit_tracks(artist_id, limit=5)
            candidates.update(artist_tracks)

        # Source 3: Audio similarity (tracks that sound like favorites)
        seed_tracks = profile.get_seed_tracks(50)
        audio_similar = self.audio_model.find_similar_tracks(seed_tracks, num // 3)
        candidates.update([track_id for track_id, _ in audio_similar])

        # Source 4: Genre exploration
        adjacent_genres = self._get_adjacent_genres(profile)
        genre_tracks = self.genre_index.sample_tracks(
            genres=adjacent_genres,
            popularity_range=(20, 80),  # Not top hits, not too obscure
            num=num // 4
        )
        candidates.update(genre_tracks)

        return list(candidates)

    def _apply_filters(
        self, user_id: str, scored_candidates: List[Tuple[str, float]]
    ) -> List[Tuple[str, float]]:
        """
        Filter out tracks that shouldn't be recommended.
        """
        # Get the user's played tracks (no repeats from the last 4 weeks)
        recently_played = self.history_service.get_played_tracks(user_id, days=28)

        # Get tracks from previous Discover Weeklys (no recent repeats)
        previous_dw_tracks = self.dw_history.get_tracks(user_id, weeks=12)

        # Get blocked artists (explicit user blocks)
        blocked_artists = self.user_prefs.get_blocked_artists(user_id)

        # Get region restrictions
        user_region = self.user_service.get_region(user_id)

        filtered = []
        for track_id, score in scored_candidates:
            # Already played recently
            if track_id in recently_played:
                continue

            # Already in a recent Discover Weekly
            if track_id in previous_dw_tracks:
                continue

            # Blocked artist
            track_artists = self.track_service.get_artists(track_id)
            if any(artist in blocked_artists for artist in track_artists):
                continue

            # Region restriction
            if not self.licensing.is_available(track_id, user_region):
                continue

            # Explicit content filter (if the user enabled it)
            if self.user_prefs.filter_explicit(user_id):
                if self.track_service.is_explicit(track_id):
                    continue

            filtered.append((track_id, score))

        return filtered

    def _optimize_diversity(
        self, candidates: List[Tuple[str, float]], num: int
    ) -> List[str]:
        """
        Select a diverse subset of highly scored tracks.

        Balances:
        - High scores (the user will like them)
        - Diversity (avoid repetition)

        Uses Maximal Marginal Relevance (MMR).
        """
        selected = []
        remaining = list(candidates)

        while len(selected) < num and remaining:
            best_track = None
            best_mmr = -float('inf')

            for track_id, relevance in remaining:
                # Diversity: minimum dissimilarity to already-selected tracks
                diversity = 1.0
                for selected_id in selected:
                    sim = self._track_similarity(track_id, selected_id)
                    diversity = min(diversity, 1 - sim)

                # MMR score balances relevance and diversity
                lambda_param = 0.6  # Higher = more relevance focus
                mmr = lambda_param * relevance + (1 - lambda_param) * diversity

                if mmr > best_mmr:
                    best_mmr = mmr
                    best_track = track_id

            if best_track:
                selected.append(best_track)
                remaining = [(t, s) for t, s in remaining if t != best_track]

        return selected
```

Discover Weekly's success isn't just algorithmic; it's also about presentation. Arriving every Monday creates anticipation. The 30-track format is digestible. And the art of "just different enough" balances familiar comfort with exploratory discovery.
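The MMR selection step can be exercised in isolation. Below is a minimal, self-contained sketch: `mmr_select` mirrors the loop in `_optimize_diversity`, but uses Jaccard overlap of genre tags as a stand-in for the pipeline's learned `_track_similarity`. The helper names and tag data are illustrative, not part of any real system.

```python
def track_similarity(tags_a: set, tags_b: set) -> float:
    """Jaccard similarity between two tag sets (illustrative stand-in)."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def mmr_select(candidates, tags, num, lambda_param=0.6):
    """Pick `num` tracks via Maximal Marginal Relevance.

    candidates: list of (track_id, relevance) pairs
    tags: dict mapping track_id -> set of genre tags
    """
    selected = []
    remaining = list(candidates)
    while len(selected) < num and remaining:
        best_track, best_mmr = None, float('-inf')
        for track_id, relevance in remaining:
            # Diversity = dissimilarity to the closest already-selected track
            diversity = 1.0
            for sel in selected:
                diversity = min(diversity, 1 - track_similarity(tags[track_id], tags[sel]))
            mmr = lambda_param * relevance + (1 - lambda_param) * diversity
            if mmr > best_mmr:
                best_mmr, best_track = mmr, track_id
        selected.append(best_track)
        remaining = [(t, s) for t, s in remaining if t != best_track]
    return selected


tags = {
    "a": {"indie", "rock"},
    "b": {"indie", "rock"},  # near-duplicate of "a"
    "c": {"jazz"},
}
# "b" scores higher than "c", but MMR picks "c" second for diversity
print(mmr_select([("a", 0.9), ("b", 0.85), ("c", 0.5)], tags, num=2))
# → ['a', 'c']
```

Note how the lower-scored jazz track beats the near-duplicate: with `lambda_param = 0.6`, a 0.35-point relevance gap is outweighed by the 0.4-point diversity bonus.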
We've covered the complete recommendation system architecture. Let's consolidate the key design decisions:
| Component | Approach | Scale/Performance |
|---|---|---|
| Collaborative Filtering | Matrix factorization + ANN | 615M users, 100M tracks, weekly retraining |
| Content-Based | Audio features + embeddings | 100ms inference, FAISS for similarity |
| Feature Store | Online (Redis) + Offline (Hive) | Sub-10ms lookup, petabytes offline |
| Real-Time | Session-based contextual bandits | Updates per interaction |
| Diversity | MMR for playlist optimization | Balance familiarity and discovery |
| Exploration | Adaptive rates per surface | 5-50% based on context |
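The "adaptive rates per surface" row can be made concrete with a simple epsilon-greedy sketch: each surface gets its own exploration probability, so discovery-oriented surfaces gamble more often than prime home-screen real estate. The surface names and rates below are illustrative assumptions within the 5-50% range above, not actual production values.

```python
import random

# Illustrative per-surface exploration rates (assumed, not actual values)
EXPLORATION_RATE = {
    "discover_weekly": 0.50,  # discovery surface: explore aggressively
    "radio": 0.25,
    "daily_mix": 0.15,        # familiar surface: mostly exploit
    "home_shelf": 0.05,       # prime real estate: play it safe
}


def pick_track(surface, exploit_ranked, explore_pool, rng=random):
    """Epsilon-greedy: with probability epsilon (per surface), serve an
    exploratory track; otherwise serve the top-ranked exploit candidate."""
    epsilon = EXPLORATION_RATE.get(surface, 0.10)
    if rng.random() < epsilon and explore_pool:
        return rng.choice(explore_pool)
    return exploit_ranked[0]
```

A contextual bandit (as in the Real-Time row) generalizes this by conditioning the explore/exploit decision on session context rather than a fixed per-surface constant.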
What's next:
With recommendations covered, we'll move to Offline Mode—how to enable users to download and play music without network connectivity while respecting DRM requirements.
You now understand how to architect a recommendation system at scale: from collaborative filtering and audio analysis, through ML pipelines and feature stores, to real-time personalization and exploration strategies. This is the intelligence that transforms a music library into a personalized discovery engine.