Music recommendation is perhaps the most impactful differentiator between streaming platforms. With over 100 million tracks in the catalog, users cannot possibly find all the music they'd love on their own. The recommendation system becomes a personal DJ, curator, and explorer—surfacing music that matches individual tastes, moods, and contexts.
Discover Weekly, Spotify's flagship personalized playlist, reaches 40 million users with custom-generated 30-track playlists every Monday. Daily Mixes adapt throughout the day based on listening patterns. Release Radar surfaces new releases from artists users follow. Behind these features lies a sophisticated machine learning infrastructure processing billions of events daily.
This page explores how to architect a recommendation system at this scale.
You will understand the core recommendation approaches (collaborative filtering, content-based, hybrid), the ML pipeline architecture, feature engineering at scale, real-time personalization, and how to balance exploration vs. exploitation in recommendations.
Before diving into architecture, let's understand the core approaches to recommendation. Each has distinct strengths and limitations.
The Three Pillars of Recommendation:
| Approach | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Collaborative Filtering | Users who liked X also liked Y | Discovers unexpected connections | Cold start problem for new users/items |
| Content-Based | This song sounds like that song | Works for new items immediately | Limited to similar content, no serendipity |
| Knowledge-Based | Rules: 'If user likes rock, suggest classic rock' | Explainable, controllable | Doesn't scale, misses nuance |
Spotify's Hybrid Approach:
Modern recommendation systems combine all three approaches. Spotify famously uses a hybrid model that leverages:
```python
class HybridRecommendationEngine:
    """
    Combines multiple recommendation approaches for optimal results.
    """

    def __init__(self):
        self.collaborative_model = CollaborativeFilteringModel()
        self.content_model = ContentBasedModel()
        self.audio_model = AudioFeatureModel()
        self.contextual_model = ContextualModel()

    def generate_recommendations(
        self,
        user_id: str,
        context: RecommendationContext,
        num_tracks: int = 30
    ) -> List[RecommendedTrack]:
        """
        Generate recommendations using ensemble of models.

        Each model produces candidate tracks with scores.
        Final ranking blends all scores based on learned weights.
        """
        # Get user's listening profile
        user_profile = self.user_service.get_taste_profile(user_id)

        # Generate candidates from each model
        candidates = {}

        # Collaborative: "Users like you also liked..."
        collab_recs = self.collaborative_model.recommend(
            user_id, num_candidates=100
        )
        for track_id, score in collab_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['collaborative'] = score

        # Content-based: "Similar to what you listen to..."
        content_recs = self.content_model.recommend(
            user_profile.top_tracks, num_candidates=100
        )
        for track_id, score in content_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['content'] = score

        # Audio features: "Sounds like your taste..."
        audio_recs = self.audio_model.recommend(
            user_profile.audio_preferences, num_candidates=100
        )
        for track_id, score in audio_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['audio'] = score

        # Context-aware: "Right for this moment..."
        context_recs = self.contextual_model.recommend(
            user_id, context, num_candidates=50
        )
        for track_id, score in context_recs:
            candidates[track_id] = candidates.get(track_id, {})
            candidates[track_id]['context'] = score

        # Blend scores using learned weights
        final_scores = self._blend_scores(candidates, user_profile, context)

        # Apply diversity filter to avoid too-similar tracks
        diverse_tracks = self._ensure_diversity(final_scores, num_tracks)

        return diverse_tracks

    def _blend_scores(
        self,
        candidates: Dict[str, Dict[str, float]],
        user_profile: UserProfile,
        context: RecommendationContext
    ) -> List[Tuple[str, float]]:
        """
        Combine scores from different models.

        Weights can be:
        - Static (manually tuned)
        - User-specific (learned from feedback)
        - Context-specific (e.g., more audio-based for new user)
        """
        weights = self._get_blending_weights(user_profile, context)

        final_scores = []
        for track_id, scores in candidates.items():
            combined = (
                scores.get('collaborative', 0) * weights['collaborative'] +
                scores.get('content', 0) * weights['content'] +
                scores.get('audio', 0) * weights['audio'] +
                scores.get('context', 0) * weights['context']
            )
            final_scores.append((track_id, combined))

        return sorted(final_scores, key=lambda x: x[1], reverse=True)
```

New users have no listening history. New tracks have no plays. This 'cold start' problem is addressed by onboarding flows asking genre preferences, leveraging registration data (age, location), defaulting to popularity, and quickly learning from initial interactions.
Collaborative filtering is the workhorse of recommendation systems. The core idea: if User A and User B have similar taste (they like many of the same tracks), then tracks that A likes but B hasn't heard are good recommendations for B.
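The "similar taste" intuition can be sketched in a few lines with cosine similarity over a toy interaction matrix. This is illustrative only (a tiny dense matrix with invented likes); production systems use the matrix factorization approach described next:

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = tracks (1 = liked).
R = np.array([
    [1, 1, 0, 1, 0],  # user A
    [1, 1, 1, 0, 0],  # user B (target)
    [0, 0, 1, 0, 1],  # user C
])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

target = 1
others = [u for u in range(len(R)) if u != target]
sims = [cosine(R[target], R[u]) for u in others]
neighbour = others[int(np.argmax(sims))]  # user A: shares two liked tracks with B

# Recommend what the neighbour liked but the target hasn't heard: track 3.
recs = [t for t in range(R.shape[1]) if R[neighbour, t] and not R[target, t]]
```

User B's nearest neighbour by taste is A, so A's unshared like (track 3) becomes B's recommendation, exactly the "A likes it, B hasn't heard it" rule stated above.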
Matrix Factorization Approach:
We model the user-track interaction as a sparse matrix and factorize it into user and track embeddings:
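Concretely, the factorization R ≈ U × V^T means every predicted affinity is a dot product between two k-dimensional embeddings; a quick NumPy check with toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2                          # embedding dimension (toy; real systems use 100-300)
U = rng.normal(size=(4, k))    # 4 user embeddings
V = rng.normal(size=(6, k))    # 6 track embeddings

R_hat = U @ V.T                # predicted affinity for every (user, track) pair
score = U[0] @ V[3]            # one prediction is just a dot product
assert np.isclose(R_hat[0, 3], score)
```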
```python
class MatrixFactorizationModel:
    """
    Collaborative filtering using matrix factorization.

    User-item interaction matrix R ≈ U × V^T

    Where:
    - R is the (users × tracks) interaction matrix
    - U is (users × k) user embedding matrix
    - V is (tracks × k) track embedding matrix
    - k is the embedding dimension (typically 100-300)

    For any (user, track) pair, predicted affinity is:
        score = dot_product(user_embedding, track_embedding)
    """

    def __init__(self, embedding_dim=256):
        self.embedding_dim = embedding_dim
        self.user_embeddings = {}   # user_id -> vector
        self.track_embeddings = {}  # track_id -> vector

    def train(self, interactions: List[Interaction]):
        """
        Train model on user-track interactions.

        Interactions include:
        - Plays (weighted by duration / track length)
        - Saves (strong positive signal)
        - Skips (negative signal)
        - Repeats (very strong positive)

        Uses implicit feedback (not explicit ratings).
        """
        # Build interaction matrix
        user_ids = list(set(i.user_id for i in interactions))
        track_ids = list(set(i.track_id for i in interactions))
        user_to_idx = {uid: i for i, uid in enumerate(user_ids)}
        track_to_idx = {tid: i for i, tid in enumerate(track_ids)}

        # Confidence-weighted matrix for implicit feedback
        # Higher play counts = higher confidence, not higher rating
        R = scipy.sparse.lil_matrix((len(user_ids), len(track_ids)))
        for interaction in interactions:
            u = user_to_idx[interaction.user_id]
            t = track_to_idx[interaction.track_id]
            R[u, t] = self._compute_interaction_weight(interaction)

        # Alternating Least Squares (ALS) for training
        # Scales well for distributed computation (Spark MLlib)
        model = implicit.als.AlternatingLeastSquares(
            factors=self.embedding_dim,
            regularization=0.01,
            iterations=15
        )
        model.fit(R.tocsr())

        # Store embeddings
        for user_id, idx in user_to_idx.items():
            self.user_embeddings[user_id] = model.user_factors[idx]
        for track_id, idx in track_to_idx.items():
            self.track_embeddings[track_id] = model.item_factors[idx]

    def recommend(self, user_id: str, num_candidates: int) -> List[Tuple[str, float]]:
        """
        Recommend tracks for user based on embedding similarity.

        For a user, score all tracks by dot product with user embedding.
        Return top N that user hasn't already interacted with.
        """
        if user_id not in self.user_embeddings:
            return []  # Cold start - defer to other models

        user_vec = self.user_embeddings[user_id]

        # Score all tracks
        scores = []
        for track_id, track_vec in self.track_embeddings.items():
            score = np.dot(user_vec, track_vec)
            scores.append((track_id, score))

        # Filter already-heard tracks and sort
        heard_tracks = self._get_user_history(user_id)
        candidates = [
            (tid, score) for tid, score in scores
            if tid not in heard_tracks
        ]
        candidates.sort(key=lambda x: x[1], reverse=True)

        return candidates[:num_candidates]

    def _compute_interaction_weight(self, interaction: Interaction) -> float:
        """
        Convert interaction to confidence weight.

        Not all plays are equal:
        - Full listen > partial listen
        - Saved tracks get bonus
        - Skipped tracks get penalty
        - Recent interactions weighted higher
        """
        base_weight = 1.0

        # Duration factor (what % of track was played)
        duration_factor = min(1.0, interaction.listened_ms / interaction.track_duration_ms)

        # Action bonuses
        if interaction.saved:
            base_weight += 2.0
        if interaction.repeated:
            base_weight += 1.5
        if interaction.skipped and interaction.listened_ms < 30000:  # Skip in first 30s
            base_weight = -0.5

        # Time decay (recent interactions matter more)
        days_ago = (datetime.now() - interaction.timestamp).days
        time_decay = math.exp(-days_ago / 90)  # ~3 month half-life

        return base_weight * duration_factor * time_decay
```

Scaling to Handle 615 Million Users:
Matrix factorization at this scale requires distributed computing. With 615M users and 100M tracks, even storing embeddings requires careful design:
| Component | Size Calculation | Storage Required |
|---|---|---|
| User embeddings | 615M users × 256 dims × 4 bytes | ~630 GB |
| Track embeddings | 100M tracks × 256 dims × 4 bytes | ~100 GB |
| Index for ANN search | ~2x raw embeddings | ~1.4 TB |
| Total | | ~2.2 TB |
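These sizes are plain arithmetic and worth sanity-checking; the 2x index overhead is a rough rule of thumb, not a measured figure:

```python
GB = 1e9

user_emb = 615e6 * 256 * 4               # users × dims × 4 bytes (float32)
track_emb = 100e6 * 256 * 4              # tracks × dims × 4 bytes
ann_index = 2 * (user_emb + track_emb)   # assumed ~2x raw-embedding overhead
total = user_emb + track_emb + ann_index

print(round(user_emb / GB))   # 630  (GB)
print(round(total / GB))      # 2196 (GB), i.e. ~2.2 TB
```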
Finding similar tracks or users via brute-force dot product is O(n) per query in the catalog size. For real-time recommendations, we use Approximate Nearest Neighbor (ANN) indices like FAISS, ScaNN, or Annoy. These provide roughly O(log n) lookup with 95%+ recall on the true nearest neighbors.
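For contrast, the O(n) brute-force scan that ANN indices replace looks like this in NumPy: one dot product per track per query, plus a partial sort for the top k (toy catalog size):

```python
import numpy as np

rng = np.random.default_rng(0)
track_vecs = rng.normal(size=(100_000, 256)).astype(np.float32)  # toy catalog
user_vec = rng.normal(size=256).astype(np.float32)

scores = track_vecs @ user_vec                  # O(n): one dot product per track
top_k = np.argpartition(scores, -10)[-10:]      # indices of the 10 best, unordered
top_k = top_k[np.argsort(scores[top_k])[::-1]]  # order the winners by score
```

At 100M tracks this scan is far too slow per request, which is why FAISS/ScaNN/Annoy trade a little recall for sub-linear lookups.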
Content-based recommendation uses the actual audio content to find similar tracks. This solves the cold start problem—new tracks can be recommended immediately based on how they sound.
Audio Feature Extraction:
Spotify extracts a rich set of audio features from every track:
| Feature | Range | Description | Use in Recommendations |
|---|---|---|---|
| Tempo (BPM) | 50-200 | Beats per minute | Workout playlists, energy matching |
| Key | 0-11 (C to B) | Musical key | Harmonic mixing, DJ features |
| Mode | 0-1 | Major (1) or Minor (0) | Mood classification |
| Energy | 0-1 | Intensity and activity | Activity-based recommendations |
| Danceability | 0-1 | Suitability for dancing | Party/dance playlists |
| Valence | 0-1 | Musical positiveness/happiness | Mood matching |
| Acousticness | 0-1 | Likelihood of being acoustic | Genre classification |
| Instrumentalness | 0-1 | Predicts lack of vocals | Focus/study playlists |
| Speechiness | 0-1 | Presence of spoken words | Distinguish podcast/rap/instrumental |
| Loudness | -60 to 0 dB | Overall loudness | Normalization, energy |
```python
class AudioAnalysisPipeline:
    """
    Extract audio features from tracks for content-based recommendations.

    Uses deep learning models trained on:
    - Mel spectrograms for genre/mood classification
    - Beat tracking for tempo/rhythm
    - Harmonic analysis for key/mode
    """

    def __init__(self):
        # Pre-trained models
        self.genre_model = load_model("genre_classifier_v3")
        self.mood_model = load_model("mood_classifier_v2")
        self.audio_embedding_model = load_model("audio_embedder_v4")

    async def analyze_track(self, track_path: str) -> AudioFeatures:
        """
        Full audio analysis of a track.

        Pipeline:
        1. Load audio as waveform
        2. Extract low-level features (spectrogram, MFCC)
        3. Run feature extractors (tempo, key, rhythm)
        4. Run deep learning classifiers (genre, mood)
        5. Generate audio embedding for similarity search
        """
        # Load audio
        waveform, sample_rate = librosa.load(track_path, sr=22050)

        # Low-level feature extraction
        spectrogram = librosa.feature.melspectrogram(y=waveform, sr=sample_rate)
        mfcc = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)

        # Rhythm analysis
        tempo, beat_frames = librosa.beat.beat_track(y=waveform, sr=sample_rate)

        # Harmonic analysis
        chroma = librosa.feature.chroma_cqt(y=waveform, sr=sample_rate)
        key, mode = self._estimate_key_mode(chroma)

        # Deep learning features
        genre_probs = self.genre_model.predict(spectrogram)
        mood_values = self.mood_model.predict(spectrogram)

        # Audio embedding for similarity search (128-dim vector)
        audio_embedding = self.audio_embedding_model.encode(spectrogram)

        return AudioFeatures(
            tempo=tempo,
            key=key,
            mode=mode,
            energy=self._compute_energy(waveform),
            danceability=self._compute_danceability(beat_frames, tempo),
            valence=mood_values['valence'],
            genres=genre_probs,
            embedding=audio_embedding
        )

    def find_similar_tracks(
        self,
        seed_track_ids: List[str],
        num_results: int,
        exclude_ids: Set[str] = None
    ) -> List[Tuple[str, float]]:
        """
        Find tracks similar to seeds based on audio features.

        Combines:
        1. Audio embedding similarity (primary)
        2. Feature matching (secondary filter)
        """
        # Get seed embeddings
        seed_embeddings = [
            self.embedding_store.get(tid) for tid in seed_track_ids
        ]
        seed_centroid = np.mean(seed_embeddings, axis=0)

        # ANN search for similar embeddings
        similar_ids, distances = self.ann_index.search(
            seed_centroid,
            num_results * 3  # Over-fetch for filtering
        )

        # Filter by feature compatibility
        seed_features = self._aggregate_features(seed_track_ids)
        filtered = []
        for track_id, distance in zip(similar_ids, distances):
            if exclude_ids and track_id in exclude_ids:
                continue
            if track_id in seed_track_ids:
                continue

            track_features = self.feature_store.get(track_id)
            if self._features_compatible(seed_features, track_features):
                similarity = 1 / (1 + distance)  # Convert distance to similarity
                filtered.append((track_id, similarity))

        return filtered[:num_results]

    def _features_compatible(
        self,
        seed: AggregateFeatures,
        candidate: AudioFeatures
    ) -> bool:
        """
        Check if candidate track's features are compatible with seed.

        Allows some variance but filters extreme mismatches.
        """
        # Tempo: ±20 BPM or half/double time
        if not self._tempo_compatible(seed.avg_tempo, candidate.tempo):
            return False

        # Energy: within 0.3 range
        if abs(seed.avg_energy - candidate.energy) > 0.3:
            return False

        return True
```

Audio fingerprinting (Shazam-style) identifies exact tracks from short samples. Feature extraction analyzes overall characteristics. Both use spectrograms but serve different purposes. Fingerprinting answers 'what is this?' while features answer 'what is this like?'
Training recommendation models at Spotify's scale requires a sophisticated ML infrastructure. The pipeline processes billions of events, trains models on petabytes of data, and serves predictions with millisecond latency.
Pipeline Overview:
```
                 ML PIPELINE ARCHITECTURE

┌────────────────────────────────────────────────────┐
│                  DATA COLLECTION                   │
│          Event Streaming (Kafka/Pub-Sub)           │
│  • Play events (10B+/day)                          │
│  • Skip/save/repeat events                         │
│  • Session context (device, time, playlist)        │
│  • Search queries                                  │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                     DATA LAKE                      │
│  • Raw event storage (S3/GCS)                      │
│  • Parquet format for efficient columnar access    │
│  • Partitioned by date for time-series queries     │
│  • Retention: years of historical data             │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                FEATURE ENGINEERING                 │
│           Batch Processing (Spark/Beam)            │
│  • User taste profiles                             │
│  • Track popularity metrics                        │
│  • Interaction matrices                            │
│  • Temporal patterns                               │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                   FEATURE STORE                    │
│  • Online: Redis/DynamoDB (real-time serving)      │
│  • Offline: Hive/BigQuery (training)               │
│  • Unified feature definitions                     │
│  • Point-in-time correctness for training          │
└────────────────────────────────────────────────────┘
                          │
             ┌────────────┴────────────────┐
             ▼                             ▼
┌─────────────────────────┐   ┌─────────────────────────┐
│     MODEL TRAINING      │   │   REAL-TIME INFERENCE   │
│        (Batch)          │   │        (Online)         │
│ • Spark ML/TF/PyTorch   │   │ • Feature lookup        │
│ • GPU clusters          │   │ • Model serving         │
│ • Weekly retraining     │   │   (TF Serving)          │
│ • A/B test variants     │   │ • <50ms latency         │
│                         │   │ • Scale: 1M+ QPS        │
└────────────┬────────────┘   └────────────┬────────────┘
             │                             │
             └────────────┬────────────────┘
                          ▼
┌────────────────────────────────────────────────────┐
│                   MODEL REGISTRY                   │
│  • Versioned model storage                         │
│  • Deployment orchestration                        │
│  • Rollback capability                             │
│  • A/B experiment configuration                    │
└────────────────────────────────────────────────────┘
```

Feature Engineering:
Feature engineering transforms raw events into model inputs. Quality features often matter more than model architecture.
```python
class UserFeatureEngine:
    """
    Compute user features for recommendation models.
    """

    def compute_taste_profile(self, user_id: str) -> UserTasteProfile:
        """
        Build a comprehensive taste profile from user's listening history.
        """
        # Get listening history (last 90 days)
        history = self.event_store.get_user_plays(
            user_id, days=90, limit=10000
        )

        # Genre affinity - weighted by play count and recency
        genre_scores = defaultdict(float)
        for event in history:
            track_genres = self.track_metadata.get_genres(event.track_id)
            recency_weight = self._recency_weight(event.timestamp)
            for genre, confidence in track_genres:
                genre_scores[genre] += confidence * recency_weight

        # Normalize genre scores to distribution
        total = sum(genre_scores.values())
        genre_distribution = {g: s / total for g, s in genre_scores.items()}

        # Artist affinity
        artist_scores = defaultdict(float)
        for event in history:
            artists = self.track_metadata.get_artists(event.track_id)
            recency_weight = self._recency_weight(event.timestamp)
            for artist_id in artists:
                artist_scores[artist_id] += recency_weight

        # Audio preference aggregates
        audio_preferences = self._compute_audio_preferences(history)

        # Listening patterns
        patterns = self._compute_temporal_patterns(history)

        return UserTasteProfile(
            user_id=user_id,
            genre_distribution=genre_distribution,
            top_artists=self._top_n(artist_scores, 100),
            audio_preferences=audio_preferences,
            listening_patterns=patterns,
            profile_age_days=self._profile_age(history),
            diversity_score=self._compute_diversity(genre_distribution),
            updated_at=datetime.utcnow()
        )

    def _compute_audio_preferences(self, history: List[PlayEvent]) -> AudioPreferences:
        """
        Aggregate audio features across listening history.

        Returns preferred ranges/distributions for each feature.
        """
        features = []
        weights = []
        for event in history:
            track_features = self.audio_feature_store.get(event.track_id)
            if track_features:
                features.append(track_features)
                # Weight by engagement (full listens > skips)
                weights.append(self._engagement_weight(event))

        if not features:
            return AudioPreferences.default()

        # Weighted aggregation
        avg_tempo = np.average([f.tempo for f in features], weights=weights)
        avg_energy = np.average([f.energy for f in features], weights=weights)
        return AudioPreferences(
            tempo_range=(avg_tempo - 20, avg_tempo + 20),
            energy_range=(max(0, avg_energy - 0.2), min(1, avg_energy + 0.2)),
            valence_mean=np.average([f.valence for f in features], weights=weights),
            danceability_mean=np.average([f.danceability for f in features], weights=weights),
            # Distribution captures preference for acoustic vs electronic, etc.
            acousticness_dist=self._compute_distribution(
                [f.acousticness for f in features], weights
            )
        )

    def _compute_temporal_patterns(self, history: List[PlayEvent]) -> ListeningPatterns:
        """
        Extract temporal listening patterns.

        When does user listen? What do they listen to at different times?
        """
        hour_counts = np.zeros(24)
        weekday_counts = np.zeros(7)
        morning_genres = defaultdict(float)
        evening_genres = defaultdict(float)

        for event in history:
            hour = event.timestamp.hour
            weekday = event.timestamp.weekday()
            hour_counts[hour] += 1
            weekday_counts[weekday] += 1

            # Genre by time of day
            genres = self.track_metadata.get_genres(event.track_id)
            if 5 <= hour < 12:
                for genre, conf in genres:
                    morning_genres[genre] += conf
            elif 18 <= hour < 24:
                for genre, conf in genres:
                    evening_genres[genre] += conf

        return ListeningPatterns(
            hourly_distribution=hour_counts / hour_counts.sum(),
            weekday_distribution=weekday_counts / weekday_counts.sum(),
            peak_hours=np.argsort(hour_counts)[-3:].tolist(),
            morning_genre_preference=self._normalize_dict(morning_genres),
            evening_genre_preference=self._normalize_dict(evening_genres)
        )
```

A feature store (Feast, Tecton, or custom) ensures training and serving use identical features. Without it, 'training-serving skew' causes models to perform worse in production than in offline evaluation. The feature store also enables feature reuse across models.
Batch-trained models capture long-term preferences, but user mood and context change throughout the day. Real-time personalization adapts recommendations to the current listening session.
Session-Based Adaptation:
```python
class RealTimePersonalizer:
    """
    Adapts recommendations based on current session context.

    Key insight: What a user wants right now may differ from
    their long-term average preferences.
    """

    def __init__(self):
        self.session_cache = SessionCache()      # Recent session state
        self.context_model = ContextualBandit()  # Real-time learning

    async def personalize_recommendations(
        self,
        user_id: str,
        base_recommendations: List[str],  # From batch models
        context: StreamingContext
    ) -> List[str]:
        """
        Re-rank base recommendations based on real-time signals.
        """
        # Get current session state
        session = await self.session_cache.get_or_create(user_id)

        # Recent session signals
        recent_plays = session.last_n_tracks(10)
        recent_skips = session.last_n_skips(5)
        session_mood = self._infer_session_mood(recent_plays)

        # Re-score each recommendation
        scored_recs = []
        for track_id in base_recommendations:
            track_features = await self.feature_store.get_track(track_id)

            # Session compatibility score
            session_score = self._compute_session_score(
                track_features, session_mood, recent_plays, recent_skips
            )

            # Context score (time of day, device, activity)
            context_score = self._compute_context_score(
                track_features, context
            )

            # Exploration bonus for diversity
            exploration_bonus = self._exploration_score(
                track_features, session
            )

            final_score = (
                0.5 * session_score +
                0.3 * context_score +
                0.2 * exploration_bonus
            )
            scored_recs.append((track_id, final_score))

        # Sort by final score
        scored_recs.sort(key=lambda x: x[1], reverse=True)
        return [track_id for track_id, _ in scored_recs]

    def _infer_session_mood(self, recent_plays: List[TrackPlay]) -> SessionMood:
        """
        Infer current mood from recent listening.

        If user is listening to high-energy tracks, recommend more energy.
        If mellow, stay mellow.
        """
        if not recent_plays:
            return SessionMood.neutral()

        features = [
            self.feature_store.get_track_features(play.track_id)
            for play in recent_plays
        ]

        avg_energy = np.mean([f.energy for f in features])
        avg_valence = np.mean([f.valence for f in features])
        avg_danceability = np.mean([f.danceability for f in features])

        return SessionMood(
            energy_level=avg_energy,
            happiness_level=avg_valence,
            activity_level=avg_danceability
        )

    def _compute_session_score(
        self,
        track_features: TrackFeatures,
        session_mood: SessionMood,
        recent_plays: List[TrackPlay],
        recent_skips: List[str]
    ) -> float:
        """
        Score track compatibility with current session.
        """
        score = 1.0

        # Match current energy level
        energy_diff = abs(track_features.energy - session_mood.energy_level)
        score -= energy_diff * 0.5

        # Penalize if similar track was just skipped
        skip_track_ids = set(recent_skips)
        if self._is_similar_to_any(track_features.track_id, skip_track_ids):
            score -= 0.3

        # Boost if artist was recently played (but not too recently)
        recent_artists = {play.artist_id for play in recent_plays[3:]}
        if track_features.artist_id in recent_artists:
            score += 0.1

        return max(0, score)

    async def on_track_feedback(
        self,
        user_id: str,
        track_id: str,
        feedback: TrackFeedback
    ):
        """
        Update session state and models based on feedback.

        This enables learning within a session.
        """
        session = await self.session_cache.get(user_id)

        if feedback.action == 'play_complete':
            session.add_positive_signal(track_id)
        elif feedback.action == 'skip':
            session.add_negative_signal(track_id)
        elif feedback.action == 'save':
            session.add_strong_positive_signal(track_id)

        # Update contextual bandit for this (context, arm) pair
        await self.context_model.update(
            context=session.current_context,
            arm=track_id,
            reward=self._feedback_to_reward(feedback)
        )
```

Contextual bandits balance exploration (trying new recommendations) with exploitation (using known preferences). They learn online from each user interaction, quickly adapting to changing preferences—essential for real-time personalization.
The explore-exploit dilemma is fundamental to recommendation systems. Should we recommend tracks we're confident the user will like (exploit) or introduce new music to expand their taste (explore)?
Balancing Strategy:
Spotify's approach varies by recommendation surface:
| Surface | Exploration Level | Rationale |
|---|---|---|
| Radio | Very Low (5-10%) | User expects familiar-sounding music |
| Your Top Mixes | Low (10-15%) | Based on established favorites |
| Daily Mix | Medium (20-30%) | Familiar + some discovery |
| Discover Weekly | High (40-50%) | Purpose is discovery |
| Release Radar | Low (10%) | New from followed artists |
| Made for You Home | Medium (25%) | Personalized variety |
```python
class ExplorationManager:
    """
    Manages exploration-exploitation trade-off across recommendations.
    """

    def __init__(self):
        # Base exploration rates by surface
        self.surface_exploration_rates = {
            'radio': 0.08,
            'daily_mix': 0.25,
            'discover_weekly': 0.45,
            'made_for_you': 0.25,
        }

    def inject_exploration(
        self,
        recommendations: List[str],
        user_profile: UserTasteProfile,
        surface: str
    ) -> List[str]:
        """
        Inject exploration tracks into recommendations.

        Strategy:
        1. Determine exploration rate for user and surface
        2. Select exploration tracks (not in typical taste)
        3. Inject at strategic positions
        """
        base_rate = self.surface_exploration_rates.get(surface, 0.2)

        # Adjust rate based on user
        adjusted_rate = self._adjust_rate_for_user(base_rate, user_profile)
        num_explore = int(len(recommendations) * adjusted_rate)

        # Select exploration tracks
        explore_tracks = self._select_exploration_tracks(
            user_profile, num_explore
        )

        # Inject at good positions (not first few, spread out)
        return self._inject_at_positions(
            recommendations,
            explore_tracks,
            start_position=3,  # Never first 3 tracks
            spacing=4          # At least 4 tracks between explorations
        )

    def _adjust_rate_for_user(
        self,
        base_rate: float,
        user_profile: UserTasteProfile
    ) -> float:
        """
        Personalize exploration rate.

        New users: More exploration to learn preferences
        Diverse listeners: More exploration (they like variety)
        Narrow listeners: Less exploration (stick to favorites)
        Recently churned from exploration surface: Less exploration
        """
        rate = base_rate

        # New users need more exploration
        if user_profile.profile_age_days < 30:
            rate *= 1.3

        # High diversity score = user likes variety
        if user_profile.diversity_score > 0.7:
            rate *= 1.2
        elif user_profile.diversity_score < 0.3:
            rate *= 0.7

        # Check recent exploration success
        explore_success = self._get_exploration_success_rate(user_profile.user_id)
        if explore_success > 0.3:    # User engages with explorations
            rate *= 1.1
        elif explore_success < 0.1:  # User skips most explorations
            rate *= 0.8

        return min(0.5, max(0.05, rate))  # Bound between 5% and 50%

    def _select_exploration_tracks(
        self,
        user_profile: UserTasteProfile,
        num_tracks: int
    ) -> List[str]:
        """
        Select tracks outside user's typical taste that might appeal.

        Strategy:
        - Adjacent genres (rock listener → try indie)
        - Breaking artists in preferred genres
        - Similar audio features, different genre
        - Social signals (friends listening)
        """
        candidates = []

        # Adjacent genre exploration
        adjacent_genres = self._get_adjacent_genres(user_profile.top_genres)
        genre_candidates = self.track_index.get_by_genres(
            genres=adjacent_genres,
            exclude_artists=user_profile.top_artists[:50],
            min_popularity=30  # Not too obscure for exploration
        )
        candidates.extend(genre_candidates[:20])

        # Audio-similar from different genres
        audio_similar = self.audio_model.find_similar_tracks(
            seed_features=user_profile.audio_preferences,
            exclude_genres=user_profile.top_genres[:5],
            num_results=20
        )
        candidates.extend(audio_similar)

        # Breaking artists (rising but not yet popular)
        breaking = self.trending_service.get_breaking_artists(
            genres=user_profile.top_genres,
            region=user_profile.region
        )
        for artist in breaking[:10]:
            top_track = self.artist_service.get_top_track(artist.id)
            if top_track not in user_profile.played_tracks:
                candidates.append(top_track)

        # Score by likelihood of success
        scored = []
        for track_id in candidates:
            score = self._exploration_success_likelihood(
                track_id, user_profile
            )
            scored.append((track_id, score))

        scored.sort(key=lambda x: x[1], reverse=True)
        return [track_id for track_id, _ in scored[:num_tracks]]
```

Thompson Sampling is a mathematically elegant approach to exploration. It maintains probability distributions over the value of each option and samples from these distributions to decide what to recommend. Options with high uncertainty get explored more.
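A minimal Thompson Sampling sketch with Beta-Bernoulli posteriors, using only the standard library (the three tracks and their completion rates are simulated, not real data):

```python
import random

random.seed(7)

# Hidden "probability the user finishes the track" per option (simulated).
true_rates = {"track_a": 0.7, "track_b": 0.4, "track_c": 0.1}

# Beta(alpha, beta) posterior per track, starting from the uniform Beta(1, 1).
alpha = {t: 1 for t in true_rates}
beta = {t: 1 for t in true_rates}

for _ in range(2000):
    # Sample one plausible value from each posterior; recommend the argmax.
    samples = {t: random.betavariate(alpha[t], beta[t]) for t in true_rates}
    choice = max(samples, key=samples.get)
    if random.random() < true_rates[choice]:  # simulated full listen
        alpha[choice] += 1
    else:                                     # simulated skip
        beta[choice] += 1

plays = {t: alpha[t] + beta[t] - 2 for t in true_rates}
```

Early on all three posteriors are wide, so every track gets tried; as evidence accumulates, the distributions for weak tracks collapse toward their low rates and traffic concentrates on track_a, without ever hand-tuning an exploration rate.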
Discover Weekly is Spotify's flagship recommendation feature—a 30-track playlist generated fresh for every user every Monday. It reaches 40 million users and has become a cultural phenomenon. Let's examine its architecture.
Pipeline Overview:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180
```python
from typing import List, Tuple


class DiscoverWeeklyPipeline:
    """
    Generate personalized Discover Weekly playlists.

    Runs weekly as a batch job, generating playlists for all users.
    Target: complete within 24 hours for 600M+ users.
    """

    def generate_all_playlists(self):
        """
        Weekly batch job to generate all Discover Weekly playlists.

        Architecture:
        - Distributed across 1000s of Spark executors
        - Users partitioned for parallel processing
        - Results written to the playlist database
        """
        # Partition users across workers
        user_partitions = self.partition_users(partition_count=10000)

        # Process in parallel (Spark/Beam)
        results = self.spark.parallelize(user_partitions).flatMap(
            self.generate_for_partition
        )

        # Write to storage
        results.foreach(self.write_playlist)

    def generate_for_user(self, user_id: str) -> List[str]:
        """
        Generate a 30-track Discover Weekly for a single user.
        """
        # Get the user's taste profile
        profile = self.user_feature_store.get_taste_profile(user_id)

        # Step 1: Candidate generation (cast a wide net)
        candidates = self._generate_candidates(user_id, profile, num=500)

        # Step 2: Scoring (rank by predicted engagement)
        scored_candidates = self._score_candidates(user_id, candidates)

        # Step 3: Filtering (remove invalid tracks)
        valid_candidates = self._apply_filters(user_id, scored_candidates)

        # Step 4: Diversity optimization (ensure variety)
        diverse_tracks = self._optimize_diversity(valid_candidates, num=30)

        # Step 5: Ordering (flow and energy arc)
        ordered_tracks = self._optimize_order(diverse_tracks)

        return ordered_tracks

    def _generate_candidates(
        self, user_id: str, profile: "UserTasteProfile", num: int
    ) -> List[str]:
        """
        Generate candidate tracks from multiple sources.
        """
        candidates = set()

        # Source 1: Collaborative filtering (users like you also liked...)
        cf_recs = self.collaborative_model.recommend(user_id, num // 3)
        candidates.update([track_id for track_id, _ in cf_recs])

        # Source 2: Artist-based (other tracks from artists in the taste profile)
        for artist_id in profile.adjacent_artists:
            artist_tracks = self.artist_service.get_non_hit_tracks(artist_id, limit=5)
            candidates.update(artist_tracks)

        # Source 3: Audio similarity (tracks that sound like favorites)
        seed_tracks = profile.get_seed_tracks(50)
        audio_similar = self.audio_model.find_similar_tracks(seed_tracks, num // 3)
        candidates.update([track_id for track_id, _ in audio_similar])

        # Source 4: Genre exploration
        adjacent_genres = self._get_adjacent_genres(profile)
        genre_tracks = self.genre_index.sample_tracks(
            genres=adjacent_genres,
            popularity_range=(20, 80),  # Not top hits, not too obscure
            num=num // 4
        )
        candidates.update(genre_tracks)

        return list(candidates)

    def _apply_filters(
        self, user_id: str, scored_candidates: List[Tuple[str, float]]
    ) -> List[Tuple[str, float]]:
        """
        Filter out tracks that shouldn't be recommended.
        """
        # Get the user's played tracks (no repeats from the last 4 weeks)
        recently_played = self.history_service.get_played_tracks(user_id, days=28)

        # Get tracks from previous Discover Weeklys (no recent repeats)
        previous_dw_tracks = self.dw_history.get_tracks(user_id, weeks=12)

        # Get blocked artists (explicit user blocks)
        blocked_artists = self.user_prefs.get_blocked_artists(user_id)

        # Get region restrictions
        user_region = self.user_service.get_region(user_id)

        filtered = []
        for track_id, score in scored_candidates:
            # Already played recently
            if track_id in recently_played:
                continue

            # Already in a recent Discover Weekly
            if track_id in previous_dw_tracks:
                continue

            # Blocked artist
            track_artists = self.track_service.get_artists(track_id)
            if any(artist in blocked_artists for artist in track_artists):
                continue

            # Region restriction
            if not self.licensing.is_available(track_id, user_region):
                continue

            # Explicit content filter (if the user enabled it)
            if self.user_prefs.filter_explicit(user_id):
                if self.track_service.is_explicit(track_id):
                    continue

            filtered.append((track_id, score))

        return filtered

    def _optimize_diversity(
        self, candidates: List[Tuple[str, float]], num: int
    ) -> List[str]:
        """
        Select a diverse subset of highly scored tracks.

        Balances:
        - High scores (the user will like them)
        - Diversity (avoid repetition)

        Uses Maximal Marginal Relevance (MMR).
        """
        selected = []
        remaining = list(candidates)

        while len(selected) < num and remaining:
            best_track = None
            best_mmr = -float('inf')

            for track_id, relevance in remaining:
                # Diversity: minimum dissimilarity to already-selected tracks
                diversity = 1.0
                for selected_id in selected:
                    sim = self._track_similarity(track_id, selected_id)
                    diversity = min(diversity, 1 - sim)

                # MMR score balances relevance and diversity
                lambda_param = 0.6  # Higher = more relevance focus
                mmr = lambda_param * relevance + (1 - lambda_param) * diversity

                if mmr > best_mmr:
                    best_mmr = mmr
                    best_track = track_id

            if best_track:
                selected.append(best_track)
                remaining = [(t, s) for t, s in remaining if t != best_track]

        return selected
```

Discover Weekly's success isn't just algorithmic; it's also about presentation. Arriving every Monday creates anticipation. The 30-track format is digestible. And the art of "just different enough" balances familiar comfort with exploratory discovery.
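The MMR selection step can be exercised in isolation. Below is a minimal, self-contained sketch: `mmr_select` mirrors the loop in `_optimize_diversity`, but uses Jaccard overlap of genre tags as a stand-in for the pipeline's learned `_track_similarity`. The helper names and tag data are illustrative, not part of any real system.

```python
def track_similarity(tags_a: set, tags_b: set) -> float:
    """Jaccard similarity between two tag sets (illustrative stand-in)."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def mmr_select(candidates, tags, num, lambda_param=0.6):
    """Pick `num` tracks via Maximal Marginal Relevance.

    candidates: list of (track_id, relevance) pairs
    tags: dict mapping track_id -> set of genre tags
    """
    selected = []
    remaining = list(candidates)
    while len(selected) < num and remaining:
        best_track, best_mmr = None, float('-inf')
        for track_id, relevance in remaining:
            # Diversity = dissimilarity to the closest already-selected track
            diversity = 1.0
            for sel in selected:
                diversity = min(diversity, 1 - track_similarity(tags[track_id], tags[sel]))
            mmr = lambda_param * relevance + (1 - lambda_param) * diversity
            if mmr > best_mmr:
                best_mmr, best_track = mmr, track_id
        selected.append(best_track)
        remaining = [(t, s) for t, s in remaining if t != best_track]
    return selected


tags = {
    "a": {"indie", "rock"},
    "b": {"indie", "rock"},  # near-duplicate of "a"
    "c": {"jazz"},
}
# "b" scores higher than "c", but MMR picks "c" second for diversity
print(mmr_select([("a", 0.9), ("b", 0.85), ("c", 0.5)], tags, num=2))
# → ['a', 'c']
```

Note how the lower-scored jazz track beats the near-duplicate: with `lambda_param = 0.6`, a 0.35-point relevance gap is outweighed by the 0.4-point diversity bonus.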
We've covered the complete recommendation system architecture. Let's consolidate the key design decisions:
| Component | Approach | Scale/Performance |
|---|---|---|
| Collaborative Filtering | Matrix factorization + ANN | 615M users, 100M tracks, weekly retraining |
| Content-Based | Audio features + embeddings | 100ms inference, FAISS for similarity |
| Feature Store | Online (Redis) + Offline (Hive) | Sub-10ms lookup, petabytes offline |
| Real-Time | Session-based contextual bandits | Updates per interaction |
| Diversity | MMR for playlist optimization | Balance familiarity and discovery |
| Exploration | Adaptive rates per surface | 5-50% based on context |
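The "adaptive rates per surface" row can be made concrete with a simple epsilon-greedy sketch: each surface gets its own exploration probability, so discovery-oriented surfaces gamble more often than prime home-screen real estate. The surface names and rates below are illustrative assumptions within the 5-50% range above, not actual production values.

```python
import random

# Illustrative per-surface exploration rates (assumed, not actual values)
EXPLORATION_RATE = {
    "discover_weekly": 0.50,  # discovery surface: explore aggressively
    "radio": 0.25,
    "daily_mix": 0.15,        # familiar surface: mostly exploit
    "home_shelf": 0.05,       # prime real estate: play it safe
}


def pick_track(surface, exploit_ranked, explore_pool, rng=random):
    """Epsilon-greedy: with probability epsilon (per surface), serve an
    exploratory track; otherwise serve the top-ranked exploit candidate."""
    epsilon = EXPLORATION_RATE.get(surface, 0.10)
    if rng.random() < epsilon and explore_pool:
        return rng.choice(explore_pool)
    return exploit_ranked[0]
```

A contextual bandit (as in the Real-Time row) generalizes this by conditioning the explore/exploit decision on session context rather than a fixed per-surface constant.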
What's next:
With recommendations covered, we'll move to Offline Mode—how to enable users to download and play music without network connectivity while respecting DRM requirements.
You now understand how to architect a recommendation system at scale: from collaborative filtering and audio analysis, through ML pipelines and feature stores, to real-time personalization and exploration strategies. This is the intelligence that transforms a music library into a personalized discovery engine.