You deploy a recommendation model. Users interact with its recommendations. You collect that data and retrain. The new model generates new recommendations. Users interact. You retrain again.
Congratulations—you've created a feedback loop.
This loop is simultaneously your greatest asset and your greatest risk. On one hand, it enables continuous learning from user behavior. On the other hand, it can create self-reinforcing dynamics that amplify biases, create filter bubbles, or degrade system quality in subtle, hard-to-detect ways.
The Fundamental Challenge:
Unlike traditional ML, where the training data exists independently of the model, recommendation systems create their own training data: the data you train on tomorrow is shaped by the model you deploy today. This circularity has profound implications.
By the end of this page, you will understand how feedback loops emerge and why they're dangerous, master techniques to detect feedback loop problems, learn counterfactual evaluation methods for unbiased estimation, and implement strategies to break harmful loops while preserving beneficial ones.
Let's dissect how feedback loops form and what dynamics they create.
The Basic Loop:
Model → Recommendations → User Interactions → Data → Model (retrain)
Types of Feedback Loops:
1. Popularity Reinforcement Loop
Popular items get recommended more → They get more clicks → They appear even more popular → Recommended even more...
Result: Winner-take-all dynamics. The top 1% of items can end up with 90% of impressions (a toy simulation of this dynamic appears after this list).
2. Preference Narrowing Loop
Model recommends based on past behavior → User clicks (because that's what's shown) → Model learns to recommend more of the same → Even narrower recommendations...
Result: Filter bubbles. Users never discover new interests.
3. Quality Degradation Loop
Model optimizes for clicks → Clickbait gets more clicks → Model learns clickbait patterns → More clickbait surfaces...
Result: Race to the bottom in content quality.
4. Bias Amplification Loop
Model has slight bias (e.g., demographic) → Biased recommendations → Biased feedback data → Bias amplified in retrained model...
Result: Discrimination compounds over time.
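To make the popularity loop concrete, here is a minimal, self-contained simulation (illustrative only; the item counts, click rates, and slate size are assumptions, not values from this page). Every item has identical true appeal, yet ranking by observed CTR lets early random luck compound into a heavily concentrated exposure distribution:

```python
import numpy as np

def simulate_popularity_loop(n_items=100, n_rounds=50, slate_size=10, seed=0):
    """Toy popularity-reinforcement loop: all items have identical true appeal,
    but the system always recommends the currently most-clicked items, so early
    random luck compounds into a winner-take-all exposure distribution."""
    rng = np.random.default_rng(seed)
    true_ctr = np.full(n_items, 0.1)       # identical true quality for every item
    clicks = rng.binomial(1, true_ctr)     # small random head start
    impressions = np.ones(n_items)

    for _ in range(n_rounds):
        # "Model" = rank by observed CTR; recommend the top slate
        observed_ctr = clicks / impressions
        slate = np.argsort(-observed_ctr)[:slate_size]
        # Users click at the (identical) true rate; only shown items generate data
        new_clicks = rng.binomial(20, true_ctr[slate])
        clicks[slate] += new_clicks
        impressions[slate] += 20

    top10 = np.argsort(-impressions)[:10]
    return impressions[top10].sum() / impressions.sum()

print(f"Share of impressions captured by top 10 items: {simulate_popularity_loop():.1%}")
```

Running this typically shows a handful of initially lucky items absorbing nearly all impressions, even though no item is genuinely better than any other; items that never got an early click are never shown again and never get a chance to prove themselves.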
| System | Primary Loop Risk | Manifestation | User Impact |
|---|---|---|---|
| News/Social | Polarization | Echo chambers, extreme content | Radicalization, division |
| E-commerce | Popularity bias | Long tail items invisible | Missed discoveries |
| Streaming | Taste narrowing | Genre/artist bubbles | Boredom, churn |
| Job/Dating | Demographic bias | Discrimination patterns | Unfair outcomes |
| Search | Position bias | Top results reinforced | SEO gaming |
Feedback loop problems compound exponentially. A 1% bias in model v1 might be 2% in v2, 4% in v3, and so on. By the time you notice the problem, reversing it requires significant intervention. Early detection is critical.
Two specific biases dominate feedback loop problems in recommendations: position bias and selection bias.
Position Bias:
Users are more likely to click items in prominent positions, regardless of true relevance.
$$P(\text{click} \mid \text{position}=1) \gg P(\text{click} \mid \text{position}=10)$$
This creates a confound: did the user click because the item was relevant, or because it was in position 1?
Position Bias Effects:
The Cascade Model:
Users scan from top to bottom and stop when they find something satisfactory:
$$P(\text{click}_i) = P(\text{examine}_i) \cdot P(\text{click}_i \mid \text{examine}_i)$$
$$P(\text{examine}_i) \approx \frac{1}{i^{\alpha}}$$
With $\alpha \approx 1$, position 10 has 10x fewer examinations than position 1.
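A quick numerical check of the examination curve above (positions 1 through 10 and $\alpha = 1$, as in the text):

```python
import numpy as np

alpha = 1.0
positions = np.arange(1, 11)
examination = 1.0 / positions ** alpha   # P(examine_i) proportional to 1 / i^alpha

for pos, p in zip(positions, examination):
    print(f"position {pos:2d}: relative examination probability {p:.2f}")

# Ratio between the first and last position on the page
print(f"position 1 receives {examination[0] / examination[-1]:.0f}x the examinations of position 10")
```

Dividing an item's observed CTR by the examination probability of the position it was shown in gives a crude position-debiased relevance estimate; the EM estimator later in this page does the same thing more carefully by learning the examination curve from data.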
Selection Bias:
We only observe feedback for items that were recommended. Items not shown have no data.
| Item | Shown? | Clicked? | True Relevance |
|---|---|---|---|
| A | Yes | Yes | High |
| B | Yes | No | Low |
| C | No | ? | Unknown |
We can learn about A and B, but C remains a mystery. If C was never shown because an early model ranked it poorly, we'll never discover its true quality.
Missing Not at Random (MNAR):
Critically, which items are missing isn't random; it's determined by which items past models chose to show. This means the observed data systematically over-represents whatever previous models favored, and any model trained naively on it inherits those preferences.
The Logging Policy Problem:
When you train a model on logged data, you're learning what worked for the logging policy, not what would work in general. This is the off-policy learning challenge.
```python
import numpy as np
from typing import List, Dict
from dataclasses import dataclass


@dataclass
class Impression:
    """Single impression with position and outcome."""
    item_id: str
    position: int
    clicked: bool
    user_id: str


class PositionBiasEstimator:
    """
    Estimate position bias from click data.

    Uses Expectation-Maximization to jointly estimate:
    - Position examination probabilities
    - Item relevance probabilities
    """

    def __init__(self, n_positions: int = 10):
        self.n_positions = n_positions
        self.examination_probs: np.ndarray = None
        self.relevance_scores: Dict[str, float] = {}

    def fit(
        self,
        impressions: List[Impression],
        max_iterations: int = 100,
        tol: float = 1e-4,
    ):
        """
        Estimate position bias using EM algorithm.

        Click probability = P(examine) × P(relevant)

        E-step: estimate whether a click/non-click was due to examination
        M-step: update examination and relevance parameters
        """
        # Initialize examination probabilities with a 1/position prior
        self.examination_probs = np.array([
            1.0 / (i + 1) for i in range(self.n_positions)
        ])
        self.examination_probs /= self.examination_probs[0]  # Normalize to position 1

        # Initial relevance: empirical CTR per item
        item_clicks = {}
        item_impressions = {}
        for imp in impressions:
            item_clicks[imp.item_id] = item_clicks.get(imp.item_id, 0) + imp.clicked
            item_impressions[imp.item_id] = item_impressions.get(imp.item_id, 0) + 1

        for item_id in item_impressions:
            self.relevance_scores[item_id] = (
                item_clicks.get(item_id, 0) / item_impressions[item_id]
            )

        # EM iterations
        prev_ll = float('-inf')

        for iteration in range(max_iterations):
            # E-step: compute expected examination given click outcome
            exam_expectations = self._e_step(impressions)

            # M-step: update parameters
            self._m_step(impressions, exam_expectations)

            # Check convergence
            ll = self._log_likelihood(impressions)
            if ll - prev_ll < tol:
                print(f"Converged at iteration {iteration}")
                break
            prev_ll = ll

    def _e_step(
        self,
        impressions: List[Impression],
    ) -> List[float]:
        """Compute expected examination probability for each impression."""
        expectations = []

        for imp in impressions:
            exam_prob = self.examination_probs[min(imp.position, self.n_positions - 1)]
            rel_prob = self.relevance_scores.get(imp.item_id, 0.5)

            if imp.clicked:
                # Click implies examination
                expectations.append(1.0)
            else:
                # No click: either not examined, or examined but not relevant
                # Compute P(examined | not clicked) via Bayes' rule
                p_not_click_not_exam = 1 - exam_prob
                p_not_click_exam_not_rel = exam_prob * (1 - rel_prob)
                p_not_click = p_not_click_not_exam + p_not_click_exam_not_rel

                p_exam_given_not_click = p_not_click_exam_not_rel / max(p_not_click, 1e-10)
                expectations.append(p_exam_given_not_click)

        return expectations

    def _m_step(
        self,
        impressions: List[Impression],
        exam_expectations: List[float],
    ):
        """Update examination and relevance parameters."""
        # Update examination probabilities
        position_exam_sum = np.zeros(self.n_positions)
        position_count = np.zeros(self.n_positions)

        for imp, exam_exp in zip(impressions, exam_expectations):
            pos = min(imp.position, self.n_positions - 1)
            position_exam_sum[pos] += exam_exp
            position_count[pos] += 1

        for pos in range(self.n_positions):
            if position_count[pos] > 0:
                self.examination_probs[pos] = position_exam_sum[pos] / position_count[pos]

        # Update relevance scores
        item_click_sum = {}
        item_exam_sum = {}

        for imp, exam_exp in zip(impressions, exam_expectations):
            item_click_sum[imp.item_id] = (
                item_click_sum.get(imp.item_id, 0) + imp.clicked
            )
            item_exam_sum[imp.item_id] = (
                item_exam_sum.get(imp.item_id, 0) + exam_exp
            )

        for item_id in item_exam_sum:
            if item_exam_sum[item_id] > 0:
                self.relevance_scores[item_id] = (
                    item_click_sum.get(item_id, 0) / item_exam_sum[item_id]
                )

    def _log_likelihood(self, impressions: List[Impression]) -> float:
        """Compute log likelihood of data given current parameters."""
        ll = 0.0
        for imp in impressions:
            pos = min(imp.position, self.n_positions - 1)
            exam_prob = self.examination_probs[pos]
            rel_prob = self.relevance_scores.get(imp.item_id, 0.5)
            click_prob = exam_prob * rel_prob

            if imp.clicked:
                ll += np.log(max(click_prob, 1e-10))
            else:
                ll += np.log(max(1 - click_prob, 1e-10))

        return ll

    def get_debiased_relevance(self, item_id: str) -> float:
        """Get position-debiased relevance estimate."""
        return self.relevance_scores.get(item_id, 0.5)

    def get_position_bias_curve(self) -> np.ndarray:
        """Get estimated examination probability by position."""
        return self.examination_probs.copy()


class IPSEstimator:
    """
    Inverse Propensity Scoring for selection bias correction.

    Reweights observations by inverse of selection probability
    to get unbiased estimates.
    """

    def __init__(self, propensity_model=None):
        """
        Args:
            propensity_model: Model that predicts P(item shown | context).
                If None, uses historical frequencies.
        """
        self.propensity_model = propensity_model
        self.item_propensities: Dict[str, float] = {}

    def estimate_propensities_from_logs(
        self,
        impressions: List[Impression],
        total_opportunities: int = None,
    ):
        """
        Estimate propensities from historical impression logs.

        P(item shown) ≈ impressions / total_opportunities
        """
        item_counts = {}
        for imp in impressions:
            item_counts[imp.item_id] = item_counts.get(imp.item_id, 0) + 1

        if total_opportunities is None:
            total_opportunities = len(impressions)

        for item_id, count in item_counts.items():
            self.item_propensities[item_id] = count / total_opportunities

    def compute_ips_estimate(
        self,
        impressions: List[Impression],
        clip_threshold: float = 0.01,
    ) -> Dict[str, float]:
        """
        Compute IPS-weighted relevance estimates.

        IPS weights: w_i = 1 / P(shown_i)
        With clipping to avoid extreme weights for rare items.
        """
        item_weighted_clicks = {}
        item_weights = {}

        for imp in impressions:
            propensity = max(
                self.item_propensities.get(imp.item_id, 0.5),
                clip_threshold  # Clip to avoid huge weights
            )
            weight = 1.0 / propensity

            item_weighted_clicks[imp.item_id] = (
                item_weighted_clicks.get(imp.item_id, 0) + weight * imp.clicked
            )
            item_weights[imp.item_id] = (
                item_weights.get(imp.item_id, 0) + weight
            )

        # Normalize by total weight per item
        estimates = {}
        for item_id in item_weights:
            estimates[item_id] = (
                item_weighted_clicks[item_id] / item_weights[item_id]
            )

        return estimates
```

How do you evaluate a new recommendation policy using data collected from an old policy? This is the counterfactual or off-policy evaluation problem.
The Challenge:
You have logged interactions collected under the current production policy $\pi_0$ (the logging policy), and a new candidate policy $\pi_1$ you want to assess without deploying it.
You want to estimate: "How would $\pi_1$ perform if deployed?"
Naive Approach Fails:
Simply computing metrics on logged data doesn't work: the logged actions were chosen by $\pi_0$, so items $\pi_1$ would recommend may never appear in the logs, and the observed rewards reflect $\pi_0$'s exposure and position choices rather than $\pi_1$'s.
Inverse Propensity Scoring (IPS):
Reweight observed rewards by inverse of logging policy's probability:
$$\hat{V}_{\text{IPS}}(\pi_1) = \frac{1}{n} \sum_{i=1}^{n} \frac{\pi_1(a_i | x_i)}{\pi_0(a_i | x_i)} \cdot r_i$$
Intuition: Actions that $\pi_0$ rarely takes but $\pi_1$ would take often get upweighted.
IPS Challenges:
IPS is unbiased only if the logging policy assigns nonzero probability to every action the target policy might take, and its variance explodes when $\pi_0(a_i | x_i)$ is small, since a few rare actions receive enormous weights. It also assumes the logged propensities are recorded or can be estimated accurately.
Variance Reduction Techniques:
1. Clipping/Capping
$$w_i = \min\left(\frac{\pi_1(a_i | x_i)}{\pi_0(a_i | x_i)}, M\right)$$
Cap weights at maximum $M$ (e.g., 10) to reduce variance. Introduces bias but often worthwhile.
2. Self-Normalized IPS (SNIPS)
$$\hat{V}_{\text{SNIPS}}(\pi_1) = \frac{\sum_{i} w_i \cdot r_i}{\sum_{i} w_i}$$
Normalize by sum of weights. More stable, always in valid range.
3. Doubly Robust (DR)
$$\hat{V}_{\text{DR}} = \frac{1}{n} \sum_{i} \left[ \hat{r}(x_i, \pi_1(x_i)) + w_i \left( r_i - \hat{r}(x_i, a_i) \right) \right]$$
Combines the direct method (a reward model $\hat{r}$) with an IPS correction. The estimate remains consistent if either the reward model or the propensities are accurate.
```python
import numpy as np
from typing import List, Dict, Tuple, Callable
from dataclasses import dataclass


@dataclass
class LoggedInteraction:
    """Single logged interaction from production."""
    context: np.ndarray   # User/session features
    action: str           # Item shown
    reward: float         # Click/conversion
    propensity: float     # P(action | context) under logging policy


class CounterfactualEvaluator:
    """
    Evaluate new recommendation policies using logged data.

    Implements IPS, SNIPS, and Doubly Robust estimators.
    """

    def __init__(
        self,
        weight_clip: float = 10.0,
        min_propensity: float = 0.01,
    ):
        self.weight_clip = weight_clip
        self.min_propensity = min_propensity

    def evaluate_ips(
        self,
        logged_data: List[LoggedInteraction],
        target_policy: Callable[[np.ndarray, str], float],
    ) -> Tuple[float, float]:
        """
        Inverse Propensity Scoring estimation.

        Args:
            logged_data: Interactions from logging policy
            target_policy: Function(context, action) -> P(action | context)
                under the target policy

        Returns:
            (estimate, standard_error)
        """
        weighted_rewards = []

        for interaction in logged_data:
            # Propensity ratio between target and logging policy
            target_prob = target_policy(interaction.context, interaction.action)
            logging_prob = max(interaction.propensity, self.min_propensity)

            weight = target_prob / logging_prob
            weight = min(weight, self.weight_clip)  # Clip extreme weights

            weighted_rewards.append(weight * interaction.reward)

        estimate = np.mean(weighted_rewards)
        std_error = np.std(weighted_rewards) / np.sqrt(len(weighted_rewards))

        return estimate, std_error

    def evaluate_snips(
        self,
        logged_data: List[LoggedInteraction],
        target_policy: Callable[[np.ndarray, str], float],
    ) -> Tuple[float, float]:
        """
        Self-Normalized IPS estimation.

        Normalizes by sum of weights for lower variance.
        """
        weights = []
        weighted_rewards = []

        for interaction in logged_data:
            target_prob = target_policy(interaction.context, interaction.action)
            logging_prob = max(interaction.propensity, self.min_propensity)

            weight = min(target_prob / logging_prob, self.weight_clip)
            weights.append(weight)
            weighted_rewards.append(weight * interaction.reward)

        total_weight = sum(weights)
        estimate = sum(weighted_rewards) / total_weight if total_weight > 0 else 0

        # Approximate standard error
        n = len(logged_data)
        variance = np.var([
            w * (r - estimate)
            for w, r in zip(weights, [i.reward for i in logged_data])
        ]) / (total_weight ** 2)
        std_error = np.sqrt(variance * n)

        return estimate, std_error

    def evaluate_doubly_robust(
        self,
        logged_data: List[LoggedInteraction],
        target_policy: Callable[[np.ndarray, str], float],
        reward_model: Callable[[np.ndarray, str], float],
    ) -> Tuple[float, float]:
        """
        Doubly Robust estimation.

        Combines direct method with IPS correction.
        Consistent if either reward_model or propensities are correct.

        Args:
            reward_model: Function(context, action) -> E[reward]
        """
        estimates = []

        for interaction in logged_data:
            target_prob = target_policy(interaction.context, interaction.action)
            logging_prob = max(interaction.propensity, self.min_propensity)
            weight = min(target_prob / logging_prob, self.weight_clip)

            # Predicted reward for logged action
            predicted_reward = reward_model(interaction.context, interaction.action)

            # Expected reward under target policy (approximate)
            # In practice, would sum over all actions weighted by target_prob
            expected_under_target = reward_model(
                interaction.context,
                interaction.action  # Simplified: same action
            )

            # DR estimate: direct term plus weighted residual correction
            dr_term = expected_under_target + weight * (interaction.reward - predicted_reward)
            estimates.append(dr_term)

        estimate = np.mean(estimates)
        std_error = np.std(estimates) / np.sqrt(len(estimates))

        return estimate, std_error


class PolicyComparisonFramework:
    """
    Framework for comparing multiple recommendation policies
    using counterfactual evaluation.
    """

    def __init__(self):
        self.evaluator = CounterfactualEvaluator()

    def compare_policies(
        self,
        logged_data: List[LoggedInteraction],
        policies: Dict[str, Callable],
        methods: List[str] = ['ips', 'snips'],
    ) -> Dict[str, Dict[str, Tuple[float, float]]]:
        """
        Compare multiple policies using multiple estimators.

        Returns:
            {policy_name: {method: (estimate, stderr)}}
        """
        results = {}

        for policy_name, policy_fn in policies.items():
            results[policy_name] = {}

            for method in methods:
                if method == 'ips':
                    est, se = self.evaluator.evaluate_ips(logged_data, policy_fn)
                elif method == 'snips':
                    est, se = self.evaluator.evaluate_snips(logged_data, policy_fn)
                else:
                    continue

                results[policy_name][method] = (est, se)
                print(f"{policy_name} ({method}): {est:.4f} ± {se:.4f}")

        return results

    def compute_confidence_interval(
        self,
        estimate: float,
        std_error: float,
        confidence: float = 0.95,
    ) -> Tuple[float, float]:
        """Compute confidence interval for estimate."""
        from scipy import stats
        z = stats.norm.ppf((1 + confidence) / 2)
        return (estimate - z * std_error, estimate + z * std_error)
```

Now that we understand feedback loops and how to evaluate them, let's explore strategies to prevent and break harmful loops.
Strategy 1: Principled Exploration
As covered in the exploration-exploitation section, systematic exploration prevents the loop from converging to a local optimum: reserving even a small fraction of traffic for items the model would not otherwise show keeps generating feedback on the rest of the catalog, as in the sketch below.
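A minimal serving-time sketch of this idea (the function and parameter names are hypothetical; production systems often use more principled schemes such as Thompson sampling):

```python
import random

def epsilon_greedy_slate(ranked_item_ids, candidate_pool, k, epsilon=0.05):
    """Fill most of the slate with the model's top-ranked item IDs, but reserve
    roughly an epsilon fraction of slots for uniformly sampled candidates so
    under-exposed items keep generating feedback data. Illustrative sketch only."""
    n_explore = max(1, int(round(epsilon * k)))
    exploit = ranked_item_ids[:k - n_explore]

    # Sample exploration items from outside the exploit set
    pool = [item for item in candidate_pool if item not in exploit]
    explore = random.sample(pool, min(n_explore, len(pool)))

    return exploit + explore
```

Whichever exploration scheme you use, log the probability with which each item was shown; those propensities are exactly what the IPS-style estimators above require.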
Strategy 2: Regularization Toward Uniform
Regularize the model to not deviate too far from a uniform (or prior) distribution:
$$L = L_{\text{prediction}} + \lambda D_{KL}(\pi_{\text{model}} || \pi_{\text{uniform}})$$
This prevents extreme concentration on a few items.
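A sketch of the penalty term, assuming you can compute (or estimate) the model's item-exposure distribution; `lam` is a hypothetical hyperparameter controlling how strongly exposure is pulled toward uniform:

```python
import numpy as np

def kl_to_uniform(item_probs, eps=1e-10):
    """D_KL(pi_model || pi_uniform) for the model's item-exposure distribution,
    where the uniform reference assigns 1/N to each of the N items."""
    item_probs = np.asarray(item_probs, dtype=float)
    uniform = np.full_like(item_probs, 1.0 / len(item_probs))
    return float(np.sum(item_probs * (np.log(item_probs + eps) - np.log(uniform))))

def regularized_loss(prediction_loss, item_probs, lam=0.1):
    """L = L_prediction + lambda * D_KL(pi_model || pi_uniform), matching the
    formula above; lam is an assumed hyperparameter, not a value from the text."""
    return prediction_loss + lam * kl_to_uniform(item_probs)
```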
Strategy 3: Importance Weighting in Training
Weight training examples by the inverse of their exposure propensity so the training objective is debiased away from the selection policy:
$$L = \sum_i \frac{1}{p(\text{shown}_i)} \cdot \ell(y_i, \hat{y}_i)$$
Items that were rarely shown get higher weight in training.
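A sketch of an IPS-weighted binary cross-entropy under these assumptions (propensities estimated from impression logs and clipped below at `clip` to keep weights bounded); the function name and normalization are illustrative:

```python
import numpy as np

def ips_weighted_log_loss(y_true, y_pred, propensities, clip=0.01):
    """Propensity-weighted training loss sketch: each example is weighted by
    1 / P(item was shown), with propensities clipped below so a few rarely-shown
    items cannot dominate the gradient."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-7, 1 - 1e-7)
    weights = 1.0 / np.clip(np.asarray(propensities, dtype=float), clip, 1.0)

    per_example = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    # 1/n scaling of the weighted sum in the formula above
    return float(np.mean(weights * per_example))
```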
```python
import numpy as np
from typing import List, Dict
from collections import defaultdict


class FeedbackLoopMonitor:
    """
    Monitor for detecting feedback loop symptoms.

    Tracks metrics over time to identify concerning trends.
    """

    def __init__(self, window_size: int = 7):
        self.window_size = window_size
        self.metric_history: Dict[str, List[float]] = defaultdict(list)

    def log_metrics(self, metrics: Dict[str, float], timestamp: str):
        """Log daily/periodic metrics for trend analysis."""
        for name, value in metrics.items():
            self.metric_history[name].append(value)

            # Keep only recent window
            if len(self.metric_history[name]) > self.window_size * 4:
                self.metric_history[name] = self.metric_history[name][-self.window_size * 4:]

    def detect_concentration(self) -> Dict[str, Dict]:
        """
        Detect increasing concentration (decreasing diversity).

        Returns alerts with trend coefficients for concentration metrics.
        """
        alerts = {}

        for metric in ['gini_coefficient', 'top_10_concentration', 'catalog_coverage']:
            if metric in self.metric_history and len(self.metric_history[metric]) >= self.window_size:
                values = self.metric_history[metric][-self.window_size:]

                # Compute linear trend over the window
                x = np.arange(len(values))
                slope, _ = np.polyfit(x, values, 1)

                # Alert thresholds
                if metric == 'catalog_coverage' and slope < -0.01:
                    # Decreasing coverage
                    alerts[metric] = {'trend': slope, 'alert': 'Coverage decreasing'}
                elif metric == 'gini_coefficient' and slope > 0.01:
                    # Increasing inequality
                    alerts[metric] = {'trend': slope, 'alert': 'Concentration increasing'}
                elif metric == 'top_10_concentration' and slope > 0.005:
                    alerts[metric] = {'trend': slope, 'alert': 'Winner-take-all emerging'}

        return alerts

    def detect_filter_bubble(
        self,
        user_category_history: Dict[str, List[str]],
    ) -> Dict[str, float]:
        """
        Detect filter bubble formation.

        Measures user-level category diversity over time.
        """
        bubble_scores = {}

        for user_id, categories in user_category_history.items():
            if len(categories) < 10:
                continue

            # Split into early and recent halves
            early = categories[:len(categories)//2]
            recent = categories[len(categories)//2:]

            # Measure diversity (entropy)
            def entropy(items):
                if not items:
                    return 0
                unique, counts = np.unique(items, return_counts=True)
                probs = counts / len(items)
                return -np.sum(probs * np.log2(probs + 1e-10))

            early_entropy = entropy(early)
            recent_entropy = entropy(recent)

            # Declining entropy suggests bubble formation
            if early_entropy > 0:
                bubble_scores[user_id] = (early_entropy - recent_entropy) / early_entropy

        return bubble_scores


class LoopMitigationReranker:
    """
    Reranker that applies loop mitigation strategies.
    """

    def __init__(
        self,
        diversity_boost: float = 0.1,
        freshness_boost: float = 0.2,
        min_provider_slots: int = 3,
    ):
        self.diversity_boost = diversity_boost
        self.freshness_boost = freshness_boost
        self.min_provider_slots = min_provider_slots

        # Track impressions to boost underexposed items
        self.item_impressions: Dict[str, int] = defaultdict(int)
        self.total_impressions = 0

    def rerank(
        self,
        items: List[Dict],
        k: int,
    ) -> List[Dict]:
        """
        Rerank items with loop mitigation boosts.

        Items should have: item_id, score, provider_id, age_days
        """
        if not items:
            return []

        # Compute mitigation boosts
        for item in items:
            item['original_score'] = item['score']

            # Underexposure boost for items shown less than half the average
            expected_impressions = self.total_impressions / max(len(self.item_impressions), 1)
            actual_impressions = self.item_impressions.get(item['item_id'], 0)

            if actual_impressions < expected_impressions * 0.5:
                underexposure_boost = self.diversity_boost
            else:
                underexposure_boost = 0

            # Freshness boost for new items
            age_days = item.get('age_days', 30)
            if age_days < 7:
                freshness_boost = self.freshness_boost * (1 - age_days / 7)
            else:
                freshness_boost = 0

            item['score'] = (
                item['original_score']
                + underexposure_boost
                + freshness_boost
            )

        # Sort by boosted score
        items.sort(key=lambda x: x['score'], reverse=True)

        # Ensure provider diversity
        selected = []
        provider_counts = defaultdict(int)
        remaining = list(items)

        while len(selected) < k and remaining:
            for item in remaining:
                provider = item.get('provider_id')

                # Cap items per provider (min_provider_slots acts as the per-provider limit here)
                if provider_counts[provider] < self.min_provider_slots:
                    selected.append(item)
                    remaining.remove(item)
                    provider_counts[provider] += 1
                    break
            else:
                # All providers at limit, just take next best
                if remaining:
                    selected.append(remaining.pop(0))

        return selected

    def record_impressions(self, item_ids: List[str]):
        """Record that items were shown."""
        for item_id in item_ids:
            self.item_impressions[item_id] += 1
            self.total_impressions += 1


class HistoricalDiversityMaintainer:
    """
    Maintain diversity by ensuring user sees items from
    categories they haven't seen recently.
    """

    def __init__(self, history_window: int = 50):
        self.history_window = history_window
        self.user_history: Dict[str, List[str]] = defaultdict(list)

    def boost_novel_categories(
        self,
        user_id: str,
        items: List[Dict],
    ) -> List[Dict]:
        """Boost items from categories user hasn't seen recently."""
        recent_categories = set(self.user_history[user_id][-self.history_window:])

        for item in items:
            category = item.get('category')
            if category and category not in recent_categories:
                item['score'] = item.get('score', 0) * 1.2  # 20% boost

        return items

    def record_shown(self, user_id: str, category: str):
        """Record shown category for user."""
        history = self.user_history[user_id]
        history.append(category)

        # Trim to window
        if len(history) > self.history_window * 2:
            self.user_history[user_id] = history[-self.history_window:]
```

We've explored the critical challenge of feedback loops in recommendation systems. The key principles: feedback loops are unavoidable whenever a model trains on data it helped create; they amplify position, selection, and popularity biases if left alone; their effects can be measured with debiased and counterfactual estimators; and they can be kept in check with exploration, propensity-weighted training, and diversity-aware reranking.
Congratulations! You've completed the Production Considerations module. You now understand the critical engineering challenges of deploying recommendation systems at scale—from scalability and real-time serving to diversity, fairness, exploration, and feedback loop management. These considerations separate toy systems from production-grade recommendation infrastructure.