In 2006, Netflix offered a $1 million prize to anyone who could improve the accuracy (RMSE) of its recommendation algorithm by 10%. The winning solution, awarded in 2009, relied heavily on Singular Value Decomposition (SVD) and its extensions. This established matrix factorization as the gold standard for collaborative filtering.
SVD provides the mathematical foundation for understanding latent factor models. While pure SVD cannot handle missing data directly, the techniques inspired by it—often called 'SVD-style' or 'Funk SVD'—became the backbone of modern recommendation systems. The extension SVD++ further improved predictions by incorporating implicit feedback signals.
This page covers classical SVD and its relationship to matrix factorization, the practical 'Funk SVD' approach for sparse matrices, SVD++ which incorporates implicit feedback, and the mathematical intuition connecting these techniques.
Singular Value Decomposition is a fundamental matrix factorization from linear algebra. For any m × n matrix R:
R = U × Σ × V^T
Where:
- U is an m × m orthogonal matrix whose columns are the left singular vectors (in a ratings context, they describe users)
- Σ is an m × n diagonal matrix of non-negative singular values, ordered so that σ_1 ≥ σ_2 ≥ … ≥ 0
- V^T is the transpose of an n × n orthogonal matrix whose columns are the right singular vectors (they describe items)
Low-rank approximation:
The power of SVD lies in the Eckart-Young theorem: the best rank-k approximation to R (minimizing Frobenius norm) is obtained by keeping only the top k singular values:
R_k = U_k × Σ_k × V_k^T
This gives the optimal low-rank compression of R, capturing maximum information in k dimensions.
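A useful consequence of the Eckart-Young theorem is that the error of the rank-k approximation depends only on the singular values you discard:

||R − R_k||_F = √(σ_{k+1}² + σ_{k+2}² + … + σ_r²)

So the cumulative "energy" of the leading singular values (printed by the code below) tells you exactly how much structure a given rank preserves.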
```python
import numpy as np

def demonstrate_svd_approximation():
    """Show how SVD provides optimal low-rank approximation."""
    # Full rating matrix (no missing values for demonstration)
    R = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
        [0, 1, 5, 4],
    ], dtype=float)

    # Compute full SVD
    U, sigma, Vt = np.linalg.svd(R, full_matrices=False)

    print("Singular values:", sigma)
    print("\nEnergy captured by each component:")
    energy = sigma**2 / np.sum(sigma**2)
    cumulative = np.cumsum(energy)
    for i, (e, c) in enumerate(zip(energy, cumulative)):
        print(f"  σ_{i+1}: {e:.1%} (cumulative: {c:.1%})")

    # Reconstruct with different ranks
    for k in [1, 2, 3]:
        R_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
        error = np.linalg.norm(R - R_k, 'fro')
        print(f"\nRank-{k} approximation error: {error:.3f}")

if __name__ == "__main__":
    demonstrate_svd_approximation()
```

Classical SVD requires a complete matrix. In recommendations, 99%+ of entries are missing. Treating missing values as zeros drastically biases the decomposition (a missing rating isn't a zero rating). This motivated the development of specialized algorithms.
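To make the zero-imputation problem concrete, here is a minimal sketch reusing the matrix above but now reading its zeros as missing ratings: the rank-2 reconstruction dutifully predicts values near zero for those cells, far below any legitimate rating on a 1-5 scale.

```python
import numpy as np

# Same matrix as above, but treat the 0 entries as *missing* ratings
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

# Naive approach: leave missing entries as 0 and factorize anyway
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]

# Predictions at the missing positions hover near zero, because the
# factorization was told those entries really are zero
print("Predicted values at missing positions:")
print(np.round(R2[~observed], 2))
```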
Simon Funk, competing in the Netflix Prize, popularized a practical approach: instead of computing SVD on the full matrix, learn the factor matrices directly by minimizing error only on observed ratings.
This is the optimization we introduced earlier:
min_{P,Q} Σ_{(u,i) ∈ K} (r_ui - p_u · q_i)² + λ(||P||² + ||Q||²)
This 'Funk SVD' (not a true SVD, since no orthogonality constraints are enforced) became the standard approach for sparse rating matrices.
The key insight: the training loss is computed only over observed entries, yet the learned factors still generalize to the unobserved entries we actually want to predict. Simply skipping missing data during optimization sidesteps the sparsity problem.
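Minimizing this objective with stochastic gradient descent leads to simple per-rating updates. For each observed rating r_ui, compute the error and nudge both factor vectors (γ is the learning rate, λ the regularization strength); the implementation below also carries global-mean and bias terms, which are updated analogously:

e_ui = r_ui − p_u · q_i
p_u ← p_u + γ (e_ui · q_i − λ p_u)
q_i ← q_i + γ (e_ui · p_u − λ q_i)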
```python
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rating:
    user_id: int
    item_id: int
    rating: float

class FunkSVD:
    """
    Funk's SVD-style matrix factorization.

    Trains only on observed ratings, ignoring missing entries.
    """

    def __init__(self, n_users: int, n_items: int, n_factors: int = 50):
        self.n_factors = n_factors

        # Initialize with small random values
        self.global_mean = 0.0
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))

    def predict(self, user_id: int, item_id: int) -> float:
        pred = (self.global_mean
                + self.user_bias[user_id]
                + self.item_bias[item_id]
                + np.dot(self.P[user_id], self.Q[item_id]))
        return np.clip(pred, 1.0, 5.0)

    def fit(self, ratings: List[Rating], n_epochs: int = 20,
            lr: float = 0.005, reg: float = 0.02):
        """Train using stochastic gradient descent."""
        self.global_mean = np.mean([r.rating for r in ratings])

        for epoch in range(n_epochs):
            np.random.shuffle(ratings)
            total_error = 0.0

            for r in ratings:
                pred = self.predict(r.user_id, r.item_id)
                err = r.rating - pred
                total_error += err ** 2

                # Update biases
                self.user_bias[r.user_id] += lr * (err - reg * self.user_bias[r.user_id])
                self.item_bias[r.item_id] += lr * (err - reg * self.item_bias[r.item_id])

                # Update latent factors
                pu, qi = self.P[r.user_id].copy(), self.Q[r.item_id].copy()
                self.P[r.user_id] += lr * (err * qi - reg * pu)
                self.Q[r.item_id] += lr * (err * pu - reg * qi)

            rmse = np.sqrt(total_error / len(ratings))
            print(f"Epoch {epoch+1}: RMSE = {rmse:.4f}")
```

SVD++, proposed by Yehuda Koren, extends the basic model by incorporating implicit feedback—the fact that a user rated an item at all.
The insight: even before knowing the rating value, the act of rating reveals something about the user. Users who rate many indie films differ from those who rate only blockbusters, even if their explicit ratings are similar.
The SVD++ model:
r̂_ui = μ + b_u + b_i + q_i^T (p_u + |N(u)|^{-1/2} Σ_{j∈N(u)} y_j)
Where:
- μ is the global mean rating, and b_u, b_i are the user and item bias terms
- p_u and q_i are the explicit user and item latent factor vectors
- N(u) is the set of items for which user u has provided implicit feedback (here, the items u has rated)
- y_j is an implicit factor vector for item j; the |N(u)|^{-1/2}-normalized sum shifts the user representation according to which items u has interacted with
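The SGD updates mirror those of Funk SVD, with one extra step: every item j ∈ N(u) receives a share of the gradient through the implicit term. Writing e_ui = r_ui − r̂_ui:

p_u ← p_u + γ (e_ui · q_i − λ p_u)
q_i ← q_i + γ (e_ui · (p_u + |N(u)|^{-1/2} Σ_{j∈N(u)} y_j) − λ q_i)
y_j ← y_j + γ (e_ui · |N(u)|^{-1/2} · q_i − λ y_j)   for every j ∈ N(u)

This is exactly the loop structure in the implementation below.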
```python
import numpy as np
from collections import defaultdict
from typing import List, Set, Dict

class SVDPlusPlus:
    """
    SVD++ model incorporating implicit feedback.

    User representation combines explicit factors (p_u) with
    implicit factors (sum of y_j for items user has rated).
    """

    def __init__(self, n_users: int, n_items: int, n_factors: int = 50):
        self.n_factors = n_factors
        scale = 0.1

        self.global_mean = 0.0
        self.user_bias = np.zeros(n_users)
        self.item_bias = np.zeros(n_items)
        self.P = np.random.normal(0, scale, (n_users, n_factors))
        self.Q = np.random.normal(0, scale, (n_items, n_factors))
        self.Y = np.random.normal(0, scale, (n_items, n_factors))  # Implicit factors

        # User's rated items (implicit feedback)
        self.user_items: Dict[int, Set[int]] = defaultdict(set)

    def _get_implicit_feedback(self, user_id: int) -> np.ndarray:
        """Compute implicit feedback contribution for user."""
        items = self.user_items[user_id]
        if not items:
            return np.zeros(self.n_factors)

        # Sum of y vectors, normalized by sqrt of count
        y_sum = np.sum([self.Y[j] for j in items], axis=0)
        return y_sum / np.sqrt(len(items))

    def predict(self, user_id: int, item_id: int) -> float:
        """Predict with both explicit and implicit factors."""
        implicit = self._get_implicit_feedback(user_id)
        user_vec = self.P[user_id] + implicit

        pred = (self.global_mean
                + self.user_bias[user_id]
                + self.item_bias[item_id]
                + np.dot(self.Q[item_id], user_vec))
        return np.clip(pred, 1.0, 5.0)

    def fit(self, ratings: list, n_epochs: int = 20,
            lr: float = 0.007, reg: float = 0.02):
        """Train SVD++ with SGD."""
        # Build user-item implicit feedback sets
        for r in ratings:
            self.user_items[r.user_id].add(r.item_id)

        self.global_mean = np.mean([r.rating for r in ratings])

        for epoch in range(n_epochs):
            np.random.shuffle(ratings)
            sq_error = 0.0

            for r in ratings:
                items_u = self.user_items[r.user_id]
                sqrt_nu = np.sqrt(len(items_u)) if items_u else 1.0

                # Implicit contribution
                implicit = self._get_implicit_feedback(r.user_id)
                user_vec = self.P[r.user_id] + implicit

                pred = (self.global_mean
                        + self.user_bias[r.user_id]
                        + self.item_bias[r.item_id]
                        + np.dot(self.Q[r.item_id], user_vec))
                err = r.rating - pred
                sq_error += err ** 2

                # Update biases
                self.user_bias[r.user_id] += lr * (err - reg * self.user_bias[r.user_id])
                self.item_bias[r.item_id] += lr * (err - reg * self.item_bias[r.item_id])

                # Update factors
                qi = self.Q[r.item_id].copy()
                self.P[r.user_id] += lr * (err * qi - reg * self.P[r.user_id])
                self.Q[r.item_id] += lr * (err * user_vec - reg * qi)

                # Update implicit factors
                for j in items_u:
                    self.Y[j] += lr * (err * qi / sqrt_nu - reg * self.Y[j])

            print(f"Epoch {epoch+1}: RMSE = {np.sqrt(sq_error/len(ratings)):.4f}")
```

SVD++ typically improves RMSE by 1-2% over basic SVD. The implicit feedback helps especially for users with few ratings—even one rating places them in a 'neighborhood' based on that item's y vector. This provides a better starting point than an untrained p_u vector alone.
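A minimal usage sketch for the two models, assuming the Rating dataclass and both classes above are in scope; the ratings are random synthetic data, so the printed RMSE values are only illustrative:

```python
import numpy as np

# Synthetic toy data: 100 users, 50 items, ~10 ratings each (illustrative only)
rng = np.random.default_rng(42)
ratings = [
    Rating(user_id=u, item_id=int(rng.integers(0, 50)),
           rating=float(rng.integers(1, 6)))
    for u in range(100) for _ in range(10)
]

funk = FunkSVD(n_users=100, n_items=50, n_factors=20)
funk.fit(ratings, n_epochs=10)

svdpp = SVDPlusPlus(n_users=100, n_items=50, n_factors=20)
svdpp.fit(ratings, n_epochs=10)

print("Funk SVD prediction for user 0, item 3:", funk.predict(0, 3))
print("SVD++  prediction for user 0, item 3:", svdpp.predict(0, 3))
```

The table below summarizes typical hyperparameter ranges for these models.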
| Parameter | Typical Range | Netflix Prize Values |
|---|---|---|
| n_factors (k) | 20-200 | 50-200 |
| learning_rate | 0.001-0.02 | 0.007 |
| regularization | 0.01-0.1 | 0.015-0.05 |
| n_epochs | 20-100 | 30-50 |
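Which values work best within these ranges is dataset-dependent. One minimal way to choose them, sketched below reusing the FunkSVD class and the synthetic ratings list from the usage sketch above, is a small grid search scored by RMSE on held-out ratings:

```python
import numpy as np

def rmse(model, ratings):
    """Root-mean-squared error of model predictions on a list of ratings."""
    errs = [(r.rating - model.predict(r.user_id, r.item_id)) ** 2 for r in ratings]
    return float(np.sqrt(np.mean(errs)))

# Hold out 20% of ratings for validation
rng = np.random.default_rng(0)
idx = rng.permutation(len(ratings))
cut = int(0.8 * len(ratings))
train = [ratings[i] for i in idx[:cut]]
valid = [ratings[i] for i in idx[cut:]]

best = None
for k in (20, 50, 100):
    for reg in (0.02, 0.05):
        model = FunkSVD(n_users=100, n_items=50, n_factors=k)
        model.fit(train, n_epochs=20, reg=reg)
        score = rmse(model, valid)
        if best is None or score < best[0]:
            best = (score, k, reg)

print(f"Best validation RMSE {best[0]:.4f} with k={best[1]}, reg={best[2]}")
```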
You now understand the SVD family of algorithms: from classical SVD to practical Funk SVD to the implicit-aware SVD++. Next, we'll explore Alternating Least Squares (ALS), a different optimization approach that enables parallelization and scales to massive datasets.