Your recommendation model achieves state-of-the-art precision and recall. It predicts user preferences with remarkable accuracy. Yet users complain the experience feels "boring" and "predictable." Meanwhile, content creators report their work never gets surfaced, and advocacy groups raise concerns about discriminatory treatment of certain user segments.
Welcome to the diversity and fairness problem—where optimizing for accuracy alone creates deeply flawed systems.
The Problem with Pure Accuracy:
Traditional recommendation metrics optimize for predicting what users will click or rate highly. But this creates several pathologies:

- Filter bubbles: users see an ever-narrower slice of the catalog, and the experience starts to feel "boring" and "predictable."
- Popularity bias: already-popular items are reinforced while new creators' work never gets surfaced.
- Feedback loops: the model trains on interactions it shaped, amplifying both problems over time.
- Disparate quality: some user segments systematically receive worse recommendations.
By the end of this page, you will understand why diversity matters for user experience and platform health, master techniques for injecting diversity into recommendations, learn frameworks for defining and measuring fairness, and implement algorithms that balance accuracy, diversity, and fairness.
Diversity in recommendations isn't just a "nice to have"—it's essential for long-term user satisfaction and platform health. Let's understand why.
User Experience Benefits:
1. Avoiding Boredom
Even if users love action movies, showing only action movies leads to fatigue. Variety sustains engagement over time.
2. Enabling Discovery
Users can't explicitly request content they don't know exists. Diverse recommendations expose users to items that expand their horizons.
3. Satisfying Multiple Needs
Users are multifaceted. The same user might want comedy on Friday night, documentaries on Sunday morning, and kids' content when their children are present.
4. Building Trust
Homogeneous recommendations feel like manipulation. Diverse recommendations feel like genuine curation.
| Diversity Type | Definition | Example | Metric |
|---|---|---|---|
| Intra-list | Variety within a single recommendation list | Mixing genres in movie recommendations | Average pairwise distance |
| Temporal | Variety across sessions over time | Not repeating the same items daily | Item repeat rate |
| Aggregate | Coverage of catalog across all users | Ensuring niche items get exposure | Catalog coverage % |
| Categorical | Representation across categories | Balancing fiction/non-fiction | Category entropy |
| Provider | Distribution across content creators | Fair exposure for new creators | Gini coefficient |
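None of these metrics require model internals. Here is a minimal sketch of three of them; the helper names are illustrative, not from any library:

```python
import numpy as np
from typing import List, Set

def item_repeat_rate(today: List[str], yesterday: List[str]) -> float:
    """Temporal diversity: fraction of today's items already shown yesterday."""
    if not today:
        return 0.0
    return len(set(today) & set(yesterday)) / len(today)

def catalog_coverage(recommended_items: Set[str], catalog_size: int) -> float:
    """Aggregate diversity: share of the catalog surfaced across all users."""
    return len(recommended_items) / catalog_size if catalog_size else 0.0

def gini_coefficient(exposures: List[float]) -> float:
    """Provider diversity: 0 = perfectly equal exposure, ~1 = one provider gets all."""
    x = np.sort(np.asarray(exposures, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n
```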
Platform Health Benefits:
1. Two-Sided Marketplace Dynamics
Platforms like Amazon, Spotify, and YouTube depend on content providers creating quality content. If the algorithm only promotes established winners, new creators leave the platform.
2. Long-tail Economics
The "long tail" of niche content is often highly profitable. Users willing to pay premium prices are often seeking specialized content. Diverse recommendations unlock long-tail revenue.
3. Resilience
Platforms overly dependent on a few popular items are fragile. If those items become unavailable or stale, engagement collapses. Diverse ecosystems are more resilient.
The Accuracy-Diversity Trade-off:
Diversity and accuracy are often in tension:
$$\text{Accuracy} \uparrow \Rightarrow \text{Diversity} \downarrow$$
Maximizing accuracy means recommending what the model is most confident users will like—which tends to be safe, popular items. Improving diversity means recommending items with less certainty—which reduces measured accuracy.
The key insight: short-term accuracy metrics don't capture long-term user value. A slightly less "accurate" but more diverse system often produces better long-term engagement and satisfaction.
Think of diversity as an exploration investment. Some diverse recommendations will miss, but the hits (items users didn't know they'd love) create memorable experiences that build loyalty. Netflix's research shows that "unexpected" hits drive more word-of-mouth than predictable recommendations.
Several algorithmic approaches inject diversity into recommendations. The choice depends on your diversity goals and latency constraints.
1. Maximal Marginal Relevance (MMR)
The classic approach: iteratively select items that balance relevance with dissimilarity to already-selected items.
$$MMR(i) = \lambda \cdot \text{Relevance}(i) - (1-\lambda) \cdot \max_{j \in S} \text{Similarity}(i, j)$$
Where $S$ is the set of already-selected items and $\lambda$ controls the relevance-diversity trade-off.
2. Determinantal Point Processes (DPP)
A probabilistic model that favors diverse sets. Items are represented by feature vectors, and the probability of selecting a set is proportional to the determinant of its kernel matrix.
$$P(S) \propto \det(K_S)$$
The determinant is maximized when items are dissimilar (their feature vectors are close to orthogonal).
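To see why, take two items with unit relevance scores and cosine similarity $s$ between their feature vectors:

$$K_S = \begin{pmatrix} 1 & s \\ s & 1 \end{pmatrix}, \qquad \det(K_S) = 1 - s^2$$

The determinant is largest at $s = 0$ (orthogonal items) and collapses to zero as the items become identical, so high-probability sets under a DPP are diverse sets.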
3. Category-Based Diversification
Simpler approach: ensure minimum representation from different categories.
The implementations below sketch all three approaches, plus diversity metrics:

```python
import numpy as np
from typing import List, Dict, Optional
from dataclasses import dataclass
from scipy.spatial.distance import cosine


@dataclass
class ScoredItem:
    """Item with relevance score and features."""
    item_id: str
    relevance_score: float
    embedding: np.ndarray
    category: Optional[str] = None


def maximal_marginal_relevance(
    candidates: List[ScoredItem],
    k: int,
    lambda_param: float = 0.5,
) -> List[ScoredItem]:
    """
    Maximal Marginal Relevance diversification.

    Iteratively selects items balancing relevance and diversity.

    Args:
        candidates: Items with relevance scores and embeddings
        k: Number of items to select
        lambda_param: Trade-off (0=diversity, 1=relevance)

    Returns:
        Diversified list of k items
    """
    if not candidates or k <= 0:
        return []

    selected: List[ScoredItem] = []
    remaining = list(candidates)

    # First item: highest relevance
    remaining.sort(key=lambda x: x.relevance_score, reverse=True)
    selected.append(remaining.pop(0))

    # Subsequent items: balance relevance and diversity
    while len(selected) < k and remaining:
        best_score = float('-inf')
        best_idx = 0

        for idx, candidate in enumerate(remaining):
            # Relevance term
            relevance = candidate.relevance_score

            # Diversity term: max similarity to already-selected items
            max_similarity = max(
                1 - cosine(candidate.embedding, s.embedding)
                for s in selected
            )

            # MMR score
            mmr_score = (
                lambda_param * relevance
                - (1 - lambda_param) * max_similarity
            )

            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx

        selected.append(remaining.pop(best_idx))

    return selected


def category_constrained_reranking(
    candidates: List[ScoredItem],
    k: int,
    max_per_category: int = 3,
    min_categories: int = 3,
) -> List[ScoredItem]:
    """
    Rerank to ensure category diversity.

    Enforces constraints on category distribution.

    Args:
        candidates: Items sorted by relevance
        k: Number of items to select
        max_per_category: Max items from same category
        min_categories: Minimum different categories
    """
    selected: List[ScoredItem] = []
    category_counts: Dict[str, int] = {}

    # First pass: greedily select respecting the max-per-category constraint
    for candidate in sorted(candidates, key=lambda x: x.relevance_score, reverse=True):
        cat = candidate.category or 'unknown'
        if category_counts.get(cat, 0) < max_per_category:
            selected.append(candidate)
            category_counts[cat] = category_counts.get(cat, 0) + 1
        if len(selected) >= k:
            break

    # Second pass: enforce the minimum-categories constraint
    if len(category_counts) < min_categories:
        # Need to swap some items to bring in more categories
        missing_count = min_categories - len(category_counts)
        seen_categories = set(category_counts.keys())
        selected_ids = {s.item_id for s in selected}

        # Find items from unseen categories (compare by id to avoid
        # ambiguous ndarray equality in the dataclass __eq__)
        new_cat_items = [
            c for c in candidates
            if c.category not in seen_categories and c.item_id not in selected_ids
        ]

        if new_cat_items and len(selected) > missing_count:
            for _ in range(min(missing_count, len(new_cat_items))):
                # Remove the lowest-ranked item whose category has > 1 item
                for i in range(len(selected) - 1, -1, -1):
                    cat = selected[i].category or 'unknown'
                    if category_counts.get(cat, 0) > 1:
                        category_counts[cat] -= 1
                        selected.pop(i)
                        break
                # Add an item from a new category
                new_item = new_cat_items.pop(0)
                selected.append(new_item)
                category_counts[new_item.category or 'unknown'] = 1

    return selected[:k]  # the swap pass could otherwise overshoot k


class DPPDiversifier:
    """
    Determinantal Point Process for diverse subset selection.

    Uses greedy approximation for computational efficiency.
    """

    def __init__(self, relevance_weight: float = 0.5):
        self.relevance_weight = relevance_weight

    def select(
        self,
        candidates: List[ScoredItem],
        k: int,
    ) -> List[ScoredItem]:
        """
        Select diverse subset using greedy DPP.

        Builds kernel matrix from item embeddings and relevance scores,
        then greedily selects items to maximize the determinant.
        """
        n = len(candidates)
        if n == 0 or k <= 0:
            return []

        # Build kernel matrix: K_ij = relevance_i * similarity_ij * relevance_j
        embeddings = np.array([c.embedding for c in candidates])
        relevances = np.array([c.relevance_score for c in candidates])

        # Normalize embeddings
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        norms[norms == 0] = 1
        embeddings_normalized = embeddings / norms

        # Similarity matrix
        similarity = embeddings_normalized @ embeddings_normalized.T

        # Scale by relevance
        relevance_matrix = np.outer(
            relevances ** self.relevance_weight,
            relevances ** self.relevance_weight,
        )
        kernel = similarity * relevance_matrix

        # Greedy selection
        selected_indices = self._greedy_select(kernel, k)
        return [candidates[i] for i in selected_indices]

    def _greedy_select(
        self,
        kernel: np.ndarray,
        k: int,
    ) -> List[int]:
        """Greedy algorithm to approximately maximize det(K_S)."""
        selected: List[int] = []
        remaining = set(range(kernel.shape[0]))

        # First item: highest diagonal entry (relevance)
        first = int(np.argmax(np.diag(kernel)))
        selected.append(first)
        remaining.remove(first)

        # Add items one at a time by marginal determinant gain
        for _ in range(k - 1):
            if not remaining:
                break

            best_gain = float('-inf')
            best_idx = None

            for idx in remaining:
                # Gain from adding this item
                # (simplified: diagonal minus correlations with selected)
                gain = kernel[idx, idx] - sum(
                    kernel[idx, s] ** 2 / kernel[s, s] for s in selected
                )

                if gain > best_gain:
                    best_gain = gain
                    best_idx = idx

            if best_idx is not None:
                selected.append(best_idx)
                remaining.remove(best_idx)

        return selected


def compute_diversity_metrics(
    recommendations: List[ScoredItem],
) -> Dict[str, float]:
    """Compute diversity metrics for a recommendation list."""
    if len(recommendations) < 2:
        return {'avg_pairwise_distance': 0.0, 'category_entropy': 0.0}

    # Average pairwise cosine distance
    embeddings = [r.embedding for r in recommendations]
    distances = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            distances.append(cosine(embeddings[i], embeddings[j]))
    avg_distance = np.mean(distances) if distances else 0.0

    # Category entropy
    categories = [r.category for r in recommendations if r.category]
    if categories:
        _, counts = np.unique(categories, return_counts=True)
        probs = counts / len(categories)
        entropy = -np.sum(probs * np.log2(probs + 1e-10))
    else:
        entropy = 0.0

    # Coverage (full catalog coverage would also need the catalog size)
    unique_items = len(set(r.item_id for r in recommendations))

    return {
        'avg_pairwise_distance': avg_distance,
        'category_entropy': entropy,
        'unique_items': unique_items,
        'num_categories': len(set(categories)) if categories else 0,
    }
```
Fairness in recommendations has multiple dimensions, each with different stakeholders and concerns.
User-Side Fairness (Consumer Fairness)
Do all users receive equally good recommendations?
Provider-Side Fairness (Producer Fairness)
Do all content providers get fair exposure?
| Metric | Stakeholder | Definition | Formula |
|---|---|---|---|
| Quality Parity | Users | Equal accuracy across groups | NDCG(group_A) ≈ NDCG(group_B) |
| Exposure Fairness | Providers | Fair visibility distribution | Gini(exposure) → 0 |
| Demographic Parity | Users | Equal treatment regardless of attributes | P(rec|A) = P(rec|B) |
| Equal Opportunity | Users/Providers | Equal true positive rates | TPR(A) = TPR(B) |
| Calibration | Users | Recommendations reflect true preferences | P(like|rec, group) consistent |
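As a concrete illustration of the demographic-parity and equal-opportunity rows, here is a minimal sketch that computes both gaps from binary recommendation decisions and ground-truth likes; the helper name and input encoding are assumptions for illustration:

```python
import numpy as np

def group_fairness_gaps(recommended, liked, groups):
    """Demographic-parity and equal-opportunity gaps across groups.

    recommended: 1 if the item was recommended to the user, else 0
    liked: 1 if the user actually liked the item (ground truth)
    groups: protected-group label per example
    """
    recommended, liked, groups = map(np.asarray, (recommended, liked, groups))
    rec_rates, tprs = {}, {}
    for g in np.unique(groups):
        mask = groups == g
        rec_rates[g] = recommended[mask].mean()   # P(rec | group)
        pos = mask & (liked == 1)                 # truly-liked items in group
        tprs[g] = recommended[pos].mean() if pos.any() else 0.0
    return {
        'demographic_parity_gap': max(rec_rates.values()) - min(rec_rates.values()),
        'equal_opportunity_gap': max(tprs.values()) - min(tprs.values()),
    }
```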
The Fairness-Accuracy Trade-off:
Enforcing fairness constraints typically reduces measured accuracy, because you're preventing the model from fully exploiting patterns that may correlate with protected attributes.
$$\max_{\theta} \text{Accuracy}(\theta) \text{ subject to } \text{Fairness Constraint}$$
Common Fairness Approaches:
1. Pre-processing
Modify training data to remove bias:

- Resample or reweight interactions so under-represented groups and providers carry comparable weight (see the sketch below)
- Remove or decorrelate features that act as proxies for protected attributes
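A minimal sketch of the reweighting idea, assuming a group label is available per training example (the helper name is illustrative):

```python
import numpy as np

def inverse_frequency_weights(group_labels: np.ndarray) -> np.ndarray:
    """Weight each training example inversely to its group's frequency,
    so minority groups contribute comparably to the training loss."""
    groups, counts = np.unique(group_labels, return_counts=True)
    freq = dict(zip(groups, counts / len(group_labels)))
    return np.array([1.0 / freq[g] for g in group_labels])
```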
2. In-processing
Add fairness constraints to training:

- Add a regularization term that penalizes score gaps between groups (sketched below)
- Adversarial debiasing: train the model so a discriminator cannot recover the protected attribute from its representations
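A minimal sketch of a fairness-regularized loss, assuming pointwise predictions; the function name and the demographic-parity-style penalty form are illustrative choices:

```python
import numpy as np

def fairness_regularized_loss(
    predictions: np.ndarray,
    labels: np.ndarray,
    group_labels: np.ndarray,
    lam: float = 1.0,
) -> float:
    """Squared-error loss plus a penalty on the gap between groups'
    mean predicted scores (a demographic-parity-style regularizer)."""
    mse = np.mean((predictions - labels) ** 2)
    group_means = [
        predictions[group_labels == g].mean() for g in np.unique(group_labels)
    ]
    parity_penalty = (max(group_means) - min(group_means)) ** 2
    return float(mse + lam * parity_penalty)
```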
3. Post-processing
Adjust outputs after model prediction:

- Rerank candidate lists to satisfy exposure or representation constraints
- Calibrate scores per group so equal scores imply equal likelihood of relevance

The rerankers below take this post-processing route.
```python
import numpy as np
from typing import List, Dict, Optional
from dataclasses import dataclass
from collections import defaultdict


@dataclass
class Item:
    """Item with provider information."""
    item_id: str
    provider_id: str
    score: float
    category: Optional[str] = None


@dataclass
class FairnessMetrics:
    """Container for fairness measurements."""
    gini_coefficient: float
    entropy: float
    min_exposure: float
    max_exposure: float
    provider_coverage: float


class ExposureFairnessReranker:
    """
    Rerank recommendations to improve provider exposure fairness.

    Uses constrained optimization to balance relevance with fair
    distribution of exposure across providers.
    """

    def __init__(
        self,
        target_distribution: Optional[Dict[str, float]] = None,
        fairness_weight: float = 0.3,
    ):
        """
        Args:
            target_distribution: Desired exposure share per provider.
                If None, uses uniform distribution.
            fairness_weight: Trade-off between relevance and fairness.
        """
        self.target_distribution = target_distribution
        self.fairness_weight = fairness_weight

    def rerank(
        self,
        items: List[Item],
        k: int,
        position_bias: Optional[List[float]] = None,
    ) -> List[Item]:
        """
        Rerank items to improve exposure fairness.

        Uses a greedy algorithm to approximately maximize:
            relevance - fairness_weight * exposure_deviation

        Args:
            items: Scored items to rerank
            k: Number of items to return
            position_bias: Expected clicks by position (e.g., [1, 0.5, 0.3, ...])
        """
        if not items:
            return []

        if position_bias is None:
            # Default: logarithmic decay
            position_bias = [1.0 / np.log2(i + 2) for i in range(k)]

        # Compute target exposure share per provider
        providers = set(item.provider_id for item in items)
        if self.target_distribution:
            target = self.target_distribution
        else:
            # Uniform distribution
            target = {p: 1.0 / len(providers) for p in providers}

        # Greedy selection
        selected = []
        remaining = list(items)
        current_exposure = defaultdict(float)
        total_exposure = sum(position_bias[:k])

        for position in range(min(k, len(items))):
            exposure_at_position = position_bias[position]
            best_item = None
            best_score = float('-inf')

            for item in remaining:
                # Relevance component
                relevance = item.score

                # Fairness component: how much does this help fairness?
                provider = item.provider_id
                current_share = current_exposure[provider] / max(total_exposure, 1)
                target_share = target.get(provider, 0)

                # Bonus for under-represented providers
                fairness_bonus = target_share - current_share

                # Combined score
                combined = (
                    (1 - self.fairness_weight) * relevance
                    + self.fairness_weight * fairness_bonus
                )

                if combined > best_score:
                    best_score = combined
                    best_item = item

            if best_item:
                selected.append(best_item)
                remaining.remove(best_item)
                current_exposure[best_item.provider_id] += exposure_at_position

        return selected

    def compute_metrics(
        self,
        recommendations: List[Item],
        k: int,
    ) -> FairnessMetrics:
        """Compute exposure fairness metrics."""
        if not recommendations:
            return FairnessMetrics(0, 0, 0, 0, 0)

        # Compute exposure per provider
        position_bias = [1.0 / np.log2(i + 2) for i in range(k)]
        exposure = defaultdict(float)
        for pos, item in enumerate(recommendations[:k]):
            exposure[item.provider_id] += position_bias[pos]

        exposures = list(exposure.values())
        n_providers = len(exposures)
        if n_providers == 0:
            return FairnessMetrics(0, 0, 0, 0, 0)

        # Gini coefficient
        sorted_exp = sorted(exposures)
        n = len(sorted_exp)
        cumsum = np.cumsum(sorted_exp)
        gini = (n + 1 - 2 * np.sum(cumsum) / cumsum[-1]) / n if cumsum[-1] > 0 else 0

        # Entropy
        total_exp = sum(exposures)
        probs = [e / total_exp for e in exposures] if total_exp > 0 else []
        entropy = -sum(p * np.log2(p + 1e-10) for p in probs) if probs else 0

        return FairnessMetrics(
            gini_coefficient=gini,
            entropy=entropy,
            min_exposure=min(exposures) if exposures else 0,
            max_exposure=max(exposures) if exposures else 0,
            provider_coverage=n_providers / len(set(i.provider_id for i in recommendations)),
        )


class UserGroupFairnessChecker:
    """Check fairness of recommendations across user groups."""

    def __init__(self, protected_attribute: str):
        """
        Args:
            protected_attribute: Name of protected attribute
                (e.g., 'gender', 'age_group')
        """
        self.protected_attribute = protected_attribute

    def compute_quality_parity(
        self,
        user_groups: Dict[str, List[str]],   # group -> user_ids
        user_metrics: Dict[str, float],      # user_id -> quality metric (e.g., NDCG)
    ) -> Dict[str, float]:
        """
        Compute recommendation quality per group.

        Returns dict of group -> average quality.
        """
        group_quality = {}
        for group, users in user_groups.items():
            qualities = [user_metrics[u] for u in users if u in user_metrics]
            group_quality[group] = np.mean(qualities) if qualities else 0.0
        return group_quality

    def compute_disparity(
        self,
        group_metrics: Dict[str, float],
    ) -> Dict[str, float]:
        """Compute fairness disparity metrics."""
        values = list(group_metrics.values())
        if len(values) < 2:
            return {'disparity_ratio': 1.0, 'absolute_gap': 0.0}

        max_val = max(values)
        min_val = min(values)

        return {
            'disparity_ratio': min_val / max_val if max_val > 0 else 1.0,
            'absolute_gap': max_val - min_val,
            'max_group': max(group_metrics.items(), key=lambda x: x[1])[0],
            'min_group': min(group_metrics.items(), key=lambda x: x[1])[0],
        }


class FairRankingOptimizer:
    """
    Optimize ranking subject to fairness constraints using a
    linear programming relaxation.
    """

    def __init__(
        self,
        min_exposure_per_provider: float = 0.05,
        max_exposure_per_provider: float = 0.30,
    ):
        self.min_exposure = min_exposure_per_provider
        self.max_exposure = max_exposure_per_provider

    def optimize(
        self,
        scores: np.ndarray,        # (n_items,) relevance scores
        provider_ids: List[str],   # provider for each item
        k: int,
    ) -> List[int]:
        """
        Find a good ranking subject to exposure constraints.

        This is a simplified greedy approach; for true optimization,
        use LP solvers. Note that the min-exposure bound is computed
        but not enforced by this greedy sketch.
        """
        n = len(scores)

        # Position importance weights
        position_weights = np.array([1.0 / np.log2(i + 2) for i in range(k)])
        total_exposure = position_weights.sum()

        # Target exposure bounds per provider
        min_exp = self.min_exposure * total_exposure  # not enforced below
        max_exp = self.max_exposure * total_exposure

        # Greedy selection with constraints
        selected = []
        provider_exposure = defaultdict(float)
        available = set(range(n))

        for pos in range(min(k, n)):
            weight = position_weights[pos]
            best_idx = None
            best_score = float('-inf')

            for idx in available:
                provider = provider_ids[idx]

                # Skip if adding would exceed the max exposure bound
                if provider_exposure[provider] + weight > max_exp:
                    continue

                if scores[idx] > best_score:
                    best_score = scores[idx]
                    best_idx = idx

            if best_idx is None:
                # All providers at max; select highest-scoring remaining item
                if available:
                    best_idx = max(available, key=lambda i: scores[i])

            if best_idx is not None:
                selected.append(best_idx)
                available.remove(best_idx)
                provider_exposure[provider_ids[best_idx]] += weight

        return selected
```

Production systems must balance multiple, often competing objectives simultaneously. This requires a principled framework for multi-objective optimization.
The Multi-Objective Problem:
$$\max_{\theta} \left( \text{Accuracy}(\theta), \text{Diversity}(\theta), \text{Fairness}(\theta) \right)$$
No single solution optimizes all objectives. Instead, we seek Pareto-optimal solutions: points where improving one objective requires degrading another.
Approaches to Multi-Objective Optimization:
1. Weighted Sum
Combine objectives into a single scalar:
$$L = \alpha \cdot L_{\text{accuracy}} + \beta \cdot L_{\text{diversity}} + \gamma \cdot L_{\text{fairness}}$$
Simple but requires choosing weights a priori.
2. Constrained Optimization
Optimize one objective subject to constraints on others:
$$\max \text{Accuracy} \text{ s.t. } \text{Diversity} \geq d_{\min}, \text{ Fairness} \geq f_{\min}$$
3. Pareto Frontier Exploration
Find multiple solutions along the Pareto frontier, then choose based on business priorities.
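A minimal sketch of frontier exploration, reusing the `maximal_marginal_relevance` and `compute_diversity_metrics` helpers defined earlier: sweep the MMR trade-off parameter, record each objective, and keep only the non-dominated points.

```python
import numpy as np

def trace_pareto_frontier(candidates, k=10, lambdas=np.linspace(0.1, 1.0, 10)):
    """Sweep MMR's lambda to trace (relevance, diversity) operating points."""
    points = []
    for lam in lambdas:
        ranked = maximal_marginal_relevance(candidates, k=k, lambda_param=float(lam))
        points.append({
            'lambda': float(lam),
            'relevance': float(np.mean([r.relevance_score for r in ranked])),
            'diversity': compute_diversity_metrics(ranked)['avg_pairwise_distance'],
        })
    # Keep only Pareto-optimal points: no other point is at least as good
    # on both axes and strictly better on one.
    return [
        p for p in points
        if not any(
            (q['relevance'] >= p['relevance'] and q['diversity'] > p['diversity'])
            or (q['relevance'] > p['relevance'] and q['diversity'] >= p['diversity'])
            for q in points
        )
    ]
```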
The reranker below combines the weighted-sum approach with hard constraints on diversity and provider exposure:

```python
import numpy as np
from typing import List, Dict, Tuple
from dataclasses import dataclass


@dataclass
class RankedItem:
    """Item with multiple objective scores."""
    item_id: str
    relevance: float
    diversity_contribution: float
    fairness_contribution: float
    provider_id: str
    embedding: np.ndarray


class MultiObjectiveReranker:
    """
    Rerank items balancing multiple objectives.

    Uses configurable objective weights with support for hard constraints.
    """

    def __init__(
        self,
        relevance_weight: float = 0.6,
        diversity_weight: float = 0.25,
        fairness_weight: float = 0.15,
        diversity_min_threshold: float = 0.3,
        fairness_min_threshold: float = 0.4,
    ):
        self.weights = {
            'relevance': relevance_weight,
            'diversity': diversity_weight,
            'fairness': fairness_weight,
        }
        self.diversity_threshold = diversity_min_threshold
        self.fairness_threshold = fairness_min_threshold

    def rerank(
        self,
        items: List[RankedItem],
        k: int,
    ) -> Tuple[List[RankedItem], Dict[str, float]]:
        """
        Rerank items to balance objectives.

        Returns:
            (reranked_items, objective_scores)
        """
        if not items:
            return [], {}

        selected: List[RankedItem] = []
        remaining = list(items)

        # Track running objective values
        total_diversity = 0.0
        provider_exposure: Dict[str, int] = {}

        for position in range(min(k, len(items))):
            best_item = None
            best_score = float('-inf')

            for item in remaining:
                # Compute per-objective contributions
                scores = self._compute_item_score(
                    item, selected, provider_exposure, position, k
                )

                # Check hard constraints
                if not self._check_constraints(
                    item, selected, total_diversity, provider_exposure, k
                ):
                    continue

                # Weighted combination
                combined = sum(
                    self.weights[obj] * scores[obj] for obj in scores
                )

                if combined > best_score:
                    best_score = combined
                    best_item = item

            if best_item is None and remaining:
                # Constraints too strict; relax and pick best remaining
                best_item = max(remaining, key=lambda x: x.relevance)

            if best_item:
                selected.append(best_item)
                remaining.remove(best_item)

                # Update tracking
                provider_exposure[best_item.provider_id] = (
                    provider_exposure.get(best_item.provider_id, 0) + 1
                )
                if len(selected) > 1:
                    total_diversity += self._pairwise_diversity(
                        best_item, selected[:-1]
                    )

        # Compute final objective scores
        final_scores = self._compute_final_scores(selected, k)
        return selected, final_scores

    def _compute_item_score(
        self,
        item: RankedItem,
        selected: List[RankedItem],
        provider_exposure: Dict[str, int],
        position: int,
        k: int,
    ) -> Dict[str, float]:
        """Compute per-objective scores for an item."""
        scores = {}

        # Relevance: normalized score
        scores['relevance'] = item.relevance

        # Diversity: average distance to already-selected items
        if selected:
            scores['diversity'] = self._pairwise_diversity(item, selected)
        else:
            scores['diversity'] = 1.0

        # Fairness: inverse of provider over-representation
        provider = item.provider_id
        current_count = provider_exposure.get(provider, 0)
        expected_count = position / max(len(provider_exposure), 1)

        # Bonus for under-represented providers
        scores['fairness'] = max(0, 1 - current_count / max(expected_count, 1))

        return scores

    def _pairwise_diversity(
        self,
        item: RankedItem,
        selected: List[RankedItem],
    ) -> float:
        """Compute average cosine distance to selected items."""
        if not selected:
            return 1.0

        distances = []
        for s in selected:
            dot = np.dot(item.embedding, s.embedding)
            norm = np.linalg.norm(item.embedding) * np.linalg.norm(s.embedding)
            similarity = dot / max(norm, 1e-8)
            distances.append(1 - similarity)

        return float(np.mean(distances))

    def _check_constraints(
        self,
        item: RankedItem,
        selected: List[RankedItem],
        current_diversity: float,
        provider_exposure: Dict[str, int],
        k: int,
    ) -> bool:
        """Check if selecting item would violate hard constraints."""
        n_selected = len(selected)

        # Diversity constraint: projected diversity must meet threshold
        if n_selected > 2:
            projected_diversity = (
                current_diversity + self._pairwise_diversity(item, selected)
            ) / n_selected
            if projected_diversity < self.diversity_threshold * 0.8:
                return False

        # Fairness constraint: no provider should dominate
        provider = item.provider_id
        if provider_exposure.get(provider, 0) >= k * 0.4:
            return False

        return True

    def _compute_final_scores(
        self,
        selected: List[RankedItem],
        k: int,
    ) -> Dict[str, float]:
        """Compute final objective values for the selected set."""
        if not selected:
            return {'relevance': 0, 'diversity': 0, 'fairness': 0}

        # Average relevance
        relevance = np.mean([item.relevance for item in selected])

        # Average pairwise diversity
        if len(selected) > 1:
            diversities = []
            for i, item1 in enumerate(selected):
                for item2 in selected[i + 1:]:
                    dot = np.dot(item1.embedding, item2.embedding)
                    norm = np.linalg.norm(item1.embedding) * np.linalg.norm(item2.embedding)
                    diversities.append(1 - dot / max(norm, 1e-8))
            diversity = np.mean(diversities)
        else:
            diversity = 1.0

        # Provider distribution fairness (1 - Gini)
        providers = [item.provider_id for item in selected]
        _, counts = np.unique(providers, return_counts=True)
        n = len(counts)
        if n > 1:
            sorted_counts = np.sort(counts)
            cumsum = np.cumsum(sorted_counts)
            gini = (n + 1 - 2 * np.sum(cumsum) / cumsum[-1]) / n
        else:
            gini = 1.0
        fairness = 1 - gini

        return {
            'relevance': relevance,
            'diversity': diversity,
            'fairness': fairness,
        }
```

We've explored why recommendation quality extends far beyond prediction accuracy. Let's consolidate the key principles:

- Accuracy alone breeds pathologies: filter bubbles, popularity bias, and unfair treatment of users and providers.
- Diversity comes in several forms (intra-list, temporal, aggregate, categorical, provider), each with its own metric.
- Fairness is multi-stakeholder: consumer-side quality parity and provider-side exposure fairness can pull in different directions.
- Diversity and fairness trade off against short-term accuracy, but short-term accuracy metrics don't capture long-term user value.
- Practical tools include MMR, DPPs, category constraints, exposure-aware reranking, and multi-objective optimization with weights or hard constraints.
You now understand how to build recommendation systems that go beyond pure accuracy to provide diverse, fair experiences. Next, we'll explore the exploration-exploitation trade-off—how to balance showing what users will likely enjoy versus discovering new information about their preferences.