Most users never rate items. They click, browse, purchase, listen, and watch—but rarely provide explicit feedback. For every Netflix rating, there are hundreds of plays. For every product review, thousands of purchases.
Implicit feedback is this behavioral data: watch time, click sequences, purchase history, playlist additions. It's abundant but fundamentally different from ratings.
Matrix factorization for implicit feedback therefore requires a different formulation.
This page covers the differences between explicit and implicit feedback, the weighted matrix factorization formulation, confidence weighting schemes, ALS for implicit feedback, and handling negative sampling.
| Aspect | Explicit Feedback | Implicit Feedback |
|---|---|---|
| Examples | Star ratings, thumbs up/down | Clicks, views, purchases, time spent |
| Sparsity | Very sparse (~1% filled) | Dense (can observe all interactions) |
| Interpretation | Direct preference expression | Indirect preference signal |
| Negative signal | Low ratings indicate dislike | No negative—just absence |
| Confidence | Generally uniform | Varies (5 purchases > 1 purchase) |
| Scale | Millions of ratings | Billions of interactions |
The absence problem:
With explicit ratings, a missing entry means 'user hasn't rated this item.' With implicit feedback, what does absence mean? The user may not know the item exists, may know it and dislike it, or may simply not have gotten to it yet.
We can't distinguish these cases. This fundamentally changes the optimization problem: we can't simply ignore missing entries.
For implicit feedback, we must model ALL user-item pairs, not just observed interactions. Missing entries aren't 'unknown'—they're likely negative (low preference), but with low confidence. We need to weight observations by our confidence in them.
The seminal work by Hu, Koren, and Volinsky (2008) introduced a weighted formulation:
Variables:
- p_ui ∈ {0, 1}: binary preference (1 if user u interacted with item i, 0 otherwise)
- c_ui ≥ 1: confidence in that preference, derived from interaction counts
- x_u, y_i: latent factor vectors for user u and item i (rows of X and Y)

Objective:
min Σ_{u,i} c_ui × (p_ui - x_u · y_i)² + λ(||X||² + ||Y||²)
Note: We sum over all (u,i) pairs, not just observed ones. The confidence weights control how much each pair matters.
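To make the objective concrete, here is a small sketch that evaluates it on a toy matrix. All counts and hyperparameters are made up for illustration:

```python
import numpy as np

# 2 users, 3 items, k = 2 latent factors (toy numbers, not from the text)
R = np.array([[3, 0, 1],     # raw interaction counts
              [0, 5, 0]])
alpha, lam = 40.0, 0.1

P = (R > 0).astype(float)    # binary preference p_ui
C = 1.0 + alpha * R          # confidence c_ui (linear scheme); C >= 1 everywhere

rng = np.random.default_rng(0)
X = rng.normal(0, 0.1, (2, 2))   # user factors
Y = rng.normal(0, 0.1, (3, 2))   # item factors

# Weighted squared error over ALL (u, i) pairs, plus L2 regularization
pred = X @ Y.T
loss = np.sum(C * (P - pred) ** 2) + lam * (np.sum(X**2) + np.sum(Y**2))
print(f"weighted loss: {loss:.4f}")
```

Note that unobserved pairs still contribute to the loss, just with the minimum confidence of 1.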
```python
import numpy as np
from typing import Dict, Tuple


class ImplicitMF:
    """
    Matrix factorization for implicit feedback data.

    Following Hu, Koren, Volinsky (2008):
    - p_ui: binary preference (did user interact?)
    - c_ui: confidence (based on interaction count)
    """

    def __init__(self, n_users: int, n_items: int, n_factors: int = 50,
                 reg: float = 0.1, alpha: float = 40.0):
        self.n_users = n_users
        self.n_items = n_items
        self.n_factors = n_factors
        self.reg = reg
        self.alpha = alpha  # Confidence scaling factor

        # User and item factors
        self.X = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Y = np.random.normal(0, 0.1, (n_items, n_factors))

    def compute_confidence(self, r_ui: float) -> float:
        """
        Compute confidence for observation.

        Common choices:
        - Linear: c = 1 + α * r
        - Log: c = 1 + α * log(1 + r/ε)
        """
        return 1.0 + self.alpha * r_ui

    def compute_preference(self, r_ui: float) -> float:
        """Binary preference: 1 if positive interaction, 0 otherwise."""
        return 1.0 if r_ui > 0 else 0.0

    def predict(self, user_id: int, item_id: int) -> float:
        """Predict preference score."""
        return np.dot(self.X[user_id], self.Y[item_id])

    def predict_all_for_user(self, user_id: int) -> np.ndarray:
        """Predict scores for all items for ranking."""
        return self.Y @ self.X[user_id]


def compute_confidence_levels(interactions: Dict[Tuple[int, int], float],
                              alpha: float = 40.0,
                              use_log: bool = False) -> Dict[Tuple[int, int], float]:
    """
    Compute confidence weights from raw interaction counts.

    Args:
        interactions: {(user_id, item_id): count}
        alpha: Confidence scaling factor
        use_log: Use logarithmic scaling (better for heavy-tailed distributions)
    """
    confidences = {}
    for (u, i), count in interactions.items():
        if use_log:
            # Log scaling: prevents very frequent items from dominating
            c = 1.0 + alpha * np.log(1.0 + count)
        else:
            # Linear scaling
            c = 1.0 + alpha * count
        confidences[(u, i)] = c
    return confidences
```

The confidence scaling α controls how much we trust observations.
α=40 (common default) means a user with 1 listen has confidence 41, while 10 listens gives 401. This dramatically upweights repeated interactions. Tune α via validation metrics like precision@k or MAP.
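The difference between the linear and log schemes shows up most at high counts. A quick comparison (α = 40, with ε folded into the log form; values are purely illustrative):

```python
import numpy as np

# How the two confidence schemes scale with interaction count
alpha = 40.0
for count in [1, 10, 100]:
    linear = 1.0 + alpha * count
    log = 1.0 + alpha * np.log(1.0 + count)
    print(f"count={count:>3}: linear={linear:>7.1f}  log={log:>7.1f}")
```

The log scheme compresses heavy tails: 100 interactions gets a confidence near 186 rather than 4001, so a handful of binge users can't dominate the loss.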
ALS is particularly well-suited for implicit feedback because:
- The loss sums over all user-item pairs, so stochastic gradient descent over every pair is impractical at billions of entries.
- With one factor matrix fixed, each user (or item) vector has a closed-form least-squares solution.
- The per-user solves are independent and parallelize trivially.
Update for user u (given fixed Y):
x_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p^u
Where C^u is diagonal with c_ui on the diagonal, and p^u is the preference vector.
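The closed-form update can be sanity-checked on toy data: the solve below should minimize the (convex) per-user objective. All numbers here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, k, lam = 5, 3, 0.1
Y = rng.normal(0, 0.5, (n_items, k))
c_u = np.array([41.0, 1.0, 1.0, 81.0, 1.0])   # confidences (2 observed items)
p_u = np.array([1.0, 0.0, 0.0, 1.0, 0.0])     # binary preferences
Cu = np.diag(c_u)

# x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p^u
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(k), Y.T @ Cu @ p_u)

def loss(x):
    return np.sum(c_u * (p_u - Y @ x) ** 2) + lam * np.sum(x**2)

# Any nearby perturbation should do no better than the closed-form solution
for _ in range(100):
    assert loss(x_u) <= loss(x_u + rng.normal(0, 0.01, k)) + 1e-9
print("closed-form update verified")
```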
The computational trick:
Direct computation is O(n × k²) per user (summing over all items). But we can rewrite:
Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y
Since Y^T Y is precomputed once per sweep and (C^u - I) is nonzero only for the |I_u| items user u interacted with, the per-user cost drops to O(|I_u| × k²): a huge saving.
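The identity can be checked numerically; only the observed rows of Y enter the correction term. Sizes and confidence values below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, k = 1000, 8
Y = rng.normal(0, 0.1, (n_items, k))

observed = np.array([3, 17, 256])          # I_u: items user u interacted with
c_vals = np.array([41.0, 201.0, 81.0])     # confidences on observed items
c_u = np.ones(n_items)
c_u[observed] = c_vals

# Direct: O(n * k^2) per user
direct = Y.T @ np.diag(c_u) @ Y

# Trick: Y^T Y precomputed once, corrected with only |I_u| = 3 rows
Y_obs = Y[observed]
trick = Y.T @ Y + Y_obs.T @ np.diag(c_vals - 1.0) @ Y_obs

assert np.allclose(direct, trick)
print("identity holds")
```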
```python
import numpy as np
from scipy import sparse


class ALSImplicit:
    """
    Efficient ALS for implicit feedback.

    Uses the computational trick to avoid O(n) per user.
    """

    def __init__(self, n_users: int, n_items: int, n_factors: int = 50,
                 reg: float = 0.1, alpha: float = 40.0):
        self.n_users = n_users
        self.n_items = n_items
        self.n_factors = n_factors
        self.reg = reg
        self.alpha = alpha

        self.X = np.random.normal(0, 0.01, (n_users, n_factors))
        self.Y = np.random.normal(0, 0.01, (n_items, n_factors))

    def fit(self, interactions: sparse.csr_matrix, n_epochs: int = 15):
        """
        Train on implicit feedback data.

        Args:
            interactions: Sparse matrix of (user, item) -> count
        """
        # Convert to confidence and preference
        C = interactions.copy()
        C.data = 1.0 + self.alpha * C.data    # Confidence
        P = (interactions > 0).astype(float)  # Preference

        for epoch in range(n_epochs):
            # Update users, then items
            self._update_users(C, P)
            self._update_items(C.T.tocsr(), P.T.tocsr())

            loss = self._compute_loss(C, P)
            print(f"Epoch {epoch+1}: Loss = {loss:.4f}")

    def _update_users(self, C, P):
        """Update all user factors."""
        YtY = self.Y.T @ self.Y  # Precompute once per sweep

        for u in range(self.n_users):
            # Items this user interacted with
            items = C[u].indices
            if len(items) == 0:
                continue

            # Confidence weights (subtract 1 for the difference term)
            c_u = np.array(C[u, items].todense()).flatten()
            p_u = np.array(P[u, items].todense()).flatten()

            # Y_u: factor rows for items the user interacted with
            Y_u = self.Y[items]

            # A = Y^T Y + Y_u^T (C_u - I) Y_u + λI
            A = YtY + Y_u.T @ np.diag(c_u - 1) @ Y_u
            A += self.reg * np.eye(self.n_factors)

            # b = Y_u^T C_u p_u
            b = Y_u.T @ (c_u * p_u)

            self.X[u] = np.linalg.solve(A, b)

    def _update_items(self, Ct, Pt):
        """Update all item factors (transpose of user update)."""
        XtX = self.X.T @ self.X

        for i in range(self.n_items):
            users = Ct[i].indices
            if len(users) == 0:
                continue

            c_i = np.array(Ct[i, users].todense()).flatten()
            p_i = np.array(Pt[i, users].todense()).flatten()

            X_i = self.X[users]

            A = XtX + X_i.T @ np.diag(c_i - 1) @ X_i
            A += self.reg * np.eye(self.n_factors)

            b = X_i.T @ (c_i * p_i)

            self.Y[i] = np.linalg.solve(A, b)

    def _compute_loss(self, C, P):
        """Compute weighted squared error (sampled for efficiency)."""
        # Estimate the loss on a subset of users
        loss = 0.0
        for u in range(min(1000, self.n_users)):
            items = C[u].indices
            c_u = np.array(C[u, items].todense()).flatten()
            p_u = np.array(P[u, items].todense()).flatten()
            pred = self.Y[items] @ self.X[u]
            loss += np.sum(c_u * (p_u - pred) ** 2)
        return loss
```

An alternative to weighting all pairs is negative sampling: treat observed interactions as positive examples and sample unobserved pairs as negatives.
Approaches:
- Uniform sampling: for each observed (user, item) pair, draw negatives uniformly from the items the user hasn't interacted with.
- Popularity-based sampling: sample popular unobserved items more often, since skipping a popular item is a stronger negative signal than never seeing an obscure one.

This dramatically reduces computation but introduces sampling bias.
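A popularity-weighted sampler can be sketched as follows. The function name, the 0.75 exponent (a common smoothing heuristic), and all counts are illustrative assumptions, not part of the original text:

```python
import numpy as np

def sample_negatives(item_counts, positive_items, n_samples, rng, power=0.75):
    """Sample unobserved items with probability proportional to popularity^power."""
    probs = item_counts.astype(float) ** power
    probs[list(positive_items)] = 0.0      # never sample the user's positives
    probs /= probs.sum()
    return rng.choice(len(item_counts), size=n_samples, replace=True, p=probs)

rng = np.random.default_rng(3)
item_counts = np.array([1000, 500, 10, 5, 1])   # global interaction counts
negatives = sample_negatives(item_counts, positive_items={0}, n_samples=4, rng=rng)
print(negatives)  # drawn from items 1..4, skewed toward the popular item 1
```

The exponent trades off between the two extremes: power=0 recovers uniform sampling, power=1 samples purely by popularity.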
```python
import numpy as np


class BPRMatrixFactorization:
    """
    Bayesian Personalized Ranking (BPR) with negative sampling.

    Instead of regression, optimizes pairwise ranking:
    User should prefer observed items over unobserved items.
    """

    def __init__(self, n_users, n_items, n_factors=50, lr=0.01, reg=0.01):
        self.n_users = n_users
        self.n_items = n_items
        self.n_factors = n_factors
        self.lr = lr
        self.reg = reg

        self.X = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Y = np.random.normal(0, 0.1, (n_items, n_factors))

    def sample_negative(self, user_id, positive_items):
        """Sample a random item the user hasn't interacted with."""
        while True:
            neg_item = np.random.randint(self.n_items)
            if neg_item not in positive_items:
                return neg_item

    def bpr_loss(self, x_uij):
        """BPR loss: -log sigmoid(x_uij)"""
        return -np.log(1.0 / (1.0 + np.exp(-x_uij)) + 1e-10)

    def train_step(self, user_id, pos_item, neg_item):
        """
        BPR update: optimize P(i >_u j) for positive i, negative j.
        """
        # Copy, so the in-place updates below don't alias each other
        x_u = self.X[user_id].copy()
        y_i = self.Y[pos_item].copy()
        y_j = self.Y[neg_item].copy()

        # Preference difference
        x_uij = np.dot(x_u, y_i - y_j)

        # Gradient scale factor: sigmoid(-x_uij)
        sigmoid = 1.0 / (1.0 + np.exp(x_uij))

        # Gradient ascent on the log-likelihood, with L2 regularization
        self.X[user_id] += self.lr * (sigmoid * (y_i - y_j) - self.reg * x_u)
        self.Y[pos_item] += self.lr * (sigmoid * x_u - self.reg * y_i)
        self.Y[neg_item] += self.lr * (-sigmoid * x_u - self.reg * y_j)

        return self.bpr_loss(x_uij)
```

BPR optimizes ranking directly (pairwise comparisons), while weighted MF optimizes pointwise predictions. BPR is often better for top-k recommendation quality, while weighted MF provides calibrated preference scores.
Congratulations! You've completed the Matrix Factorization module. You now understand latent factor models, SVD variants, ALS optimization, regularization strategies, and implicit feedback handling—the complete toolkit for building production-grade collaborative filtering systems.