Real-world ML deployments rarely optimize a single objective. Consider a recommendation system that must simultaneously maximize relevance, promote diversity, and respond with low latency.
These objectives often conflict. More diversity may reduce relevance. Lower latency may require simpler models with worse accuracy.
Multi-objective evaluation provides frameworks for understanding, comparing, and selecting models when multiple objectives matter—even when they can't all be maximized simultaneously.
By the end of this page, you will understand Pareto optimality and dominance, apply scalarization techniques to combine objectives, construct and interpret Pareto frontiers, and make principled decisions when objectives conflict.
Common Objective Conflicts in ML:
- Accuracy vs. inference latency
- Accuracy vs. interpretability
- Accuracy vs. fairness
- Precision vs. recall
- Relevance vs. diversity in ranking and recommendation
Mathematical Representation:
Let $f_1(\theta), f_2(\theta), ..., f_k(\theta)$ be k objective functions over model/threshold space $\theta$. Multi-objective optimization seeks:
$$\max_{\theta} \; \{ f_1(\theta), f_2(\theta), \ldots, f_k(\theta) \}$$
When objectives conflict, no single $\theta$ maximizes all objectives simultaneously.
| Model | Accuracy | Inference Time | Interpretability |
|---|---|---|---|
| Logistic Regression | 78% | 1ms | High |
| Random Forest | 85% | 15ms | Medium |
| Deep Neural Network | 89% | 50ms | Low |
No model dominates on all three objectives. The "best" model depends on how you weight accuracy vs. speed vs. explainability—which is a business decision, not a technical one.
Pareto Dominance:
Solution A dominates solution B if A is at least as good as B on every objective and strictly better on at least one.
Pareto Optimality:
A solution is Pareto optimal (or Pareto efficient) if no other solution dominates it. The set of all Pareto optimal solutions forms the Pareto frontier.
Key Insight:
Any model on the Pareto frontier represents a valid trade-off—improving any objective requires sacrificing another. Models not on the frontier are strictly inferior: you could do better on at least one metric without losing on others.
```python
import numpy as np


def is_pareto_dominated(point, other_points, maximize=True):
    """
    Check if a point is dominated by any other point.

    For maximization: dominated if another point is >= on all
    objectives and > on at least one.
    """
    for other in other_points:
        if np.array_equal(point, other):
            continue
        if maximize:
            if all(other >= point) and any(other > point):
                return True
        else:
            if all(other <= point) and any(other < point):
                return True
    return False


def find_pareto_frontier(points, maximize=True):
    """
    Find the Pareto frontier from a set of points.

    Parameters
    ----------
    points : array-like, shape (n_points, n_objectives)
        Each row is a solution, each column is an objective value.
    maximize : bool
        If True, larger values are better for every objective.

    Returns
    -------
    frontier_indices : list
        Indices of Pareto-optimal solutions.
    """
    points = np.array(points)
    frontier_indices = []
    for i in range(len(points)):
        if not is_pareto_dominated(points[i], points, maximize):
            frontier_indices.append(i)
    return frontier_indices


# Example: model selection with accuracy vs. latency
models = [
    {'name': 'Linear', 'accuracy': 0.78, 'latency_ms': 2},
    {'name': 'RF-small', 'accuracy': 0.82, 'latency_ms': 8},
    {'name': 'RF-large', 'accuracy': 0.85, 'latency_ms': 25},
    {'name': 'XGBoost', 'accuracy': 0.86, 'latency_ms': 15},
    {'name': 'DNN-small', 'accuracy': 0.84, 'latency_ms': 20},
    {'name': 'DNN-large', 'accuracy': 0.89, 'latency_ms': 60},
    {'name': 'Ensemble', 'accuracy': 0.90, 'latency_ms': 100},
]

# For Pareto analysis: maximize accuracy, minimize latency.
# Convert to a pure maximization problem: maximize accuracy, maximize -latency.
points = np.array([[m['accuracy'], -m['latency_ms']] for m in models])

frontier_idx = find_pareto_frontier(points, maximize=True)

print("Model Comparison: Accuracy vs Latency")
print("=" * 55)
print(f"{'Model':<12} {'Accuracy':>10} {'Latency':>10} {'Pareto':>10}")
print("-" * 55)
for i, m in enumerate(models):
    pareto = "✓ Optimal" if i in frontier_idx else ""
    print(f"{m['name']:<12} {m['accuracy']:>10.2%} {m['latency_ms']:>8}ms {pareto:>10}")

print(f"\nPareto frontier contains {len(frontier_idx)} models")
```

The Pareto frontier shows the achievable trade-offs. Moving along the frontier, you trade one objective for another: more accuracy requires accepting higher latency. The frontier's shape reveals the cost of trade-offs: steep regions mean expensive trade-offs, flat regions mean cheap gains.
Scalarization converts multi-objective optimization into single-objective optimization by combining objectives into a scalar value.
1. Weighted Sum (Linear Scalarization):
$$S_{\text{linear}} = \sum_{i=1}^{k} w_i \cdot f_i(\theta)$$
where $w_i \geq 0$ and $\sum w_i = 1$.
Pros: simple, with interpretable weights.
Cons: cannot reach solutions in non-convex regions of the Pareto frontier.
2. Weighted Product (Geometric Scalarization):
$$S_{\text{product}} = \prod_{i=1}^{k} f_i(\theta)^{w_i}$$
Pros: penalizes solutions that are very poor on any single objective.
Cons: requires all objectives to be positive.
3. Chebyshev (Min-Max) Scalarization:
$$S_{\text{cheby}} = \max_i \left( w_i \cdot |f_i(\theta) - f_i^*| \right)$$
where $f_i^*$ is the ideal value for objective $i$; the best solution minimizes this worst-case weighted deviation.
Pros: can find any Pareto-optimal solution.
Cons: requires knowing the ideal values.
```python
import numpy as np


def linear_scalarization(objectives, weights):
    """Weighted sum of objectives."""
    return np.dot(objectives, weights)


def product_scalarization(objectives, weights):
    """Weighted product (geometric) combination of objectives."""
    return np.prod(np.power(objectives, weights))


def chebyshev_scalarization(objectives, ideal, weights):
    """
    Chebyshev scalarization (minimize the worst weighted deviation from the ideal).

    Returns the negative deviation so that higher scores are better.
    """
    deviations = weights * np.abs(ideal - objectives)
    return -np.max(deviations)  # Negative because we want to minimize


def rank_models_by_scalarization(models, weights, method='linear'):
    """Rank models using the specified scalarization method."""
    scores = []
    for m in models:
        # Invert latency so that higher is better for both objectives
        objectives = np.array([m['accuracy'], 1.0 / m['latency_ms']])
        if method == 'linear':
            score = linear_scalarization(objectives, weights)
        elif method == 'product':
            score = product_scalarization(objectives, weights)
        elif method == 'chebyshev':
            ideal = np.array([1.0, 1.0])  # Perfect accuracy, instant inference
            score = chebyshev_scalarization(objectives, ideal, weights)
        scores.append((m['name'], score))
    return sorted(scores, key=lambda x: -x[1])


models = [
    {'name': 'Linear', 'accuracy': 0.78, 'latency_ms': 2},
    {'name': 'XGBoost', 'accuracy': 0.86, 'latency_ms': 15},
    {'name': 'DNN-large', 'accuracy': 0.89, 'latency_ms': 60},
]

# Different weight scenarios
scenarios = [
    ("Accuracy-focused", [0.9, 0.1]),
    ("Balanced", [0.5, 0.5]),
    ("Latency-focused", [0.1, 0.9]),
]

print("Model Rankings by Scalarization")
print("=" * 50)
for scenario_name, weights in scenarios:
    print(f"\n{scenario_name} (weights: {weights}):")
    rankings = rank_models_by_scalarization(models, weights, 'linear')
    for rank, (name, score) in enumerate(rankings, 1):
        print(f"  {rank}. {name}: {score:.4f}")
```

When facing multi-objective decisions, several practical strategies help navigate trade-offs. A common one is constraint-based selection: maximize a primary objective subject to hard limits on the others, as in the example below.
```python
def select_with_constraints(models, primary_obj, constraints):
    """
    Select the best model on a primary objective subject to constraints.

    Parameters
    ----------
    models : list of dict
        Each dict has objective values as keys.
    primary_obj : str
        Key of the objective to maximize.
    constraints : dict
        {objective: (op, value)} where op is 'min' or 'max'.

    Returns
    -------
    The selected model dict, or None if no model is feasible.
    """
    feasible = []
    for m in models:
        satisfies = True
        for obj, (op, threshold) in constraints.items():
            if op == 'min' and m[obj] < threshold:
                satisfies = False
            elif op == 'max' and m[obj] > threshold:
                satisfies = False
        if satisfies:
            feasible.append(m)
    if not feasible:
        return None
    return max(feasible, key=lambda x: x[primary_obj])


# Example
models = [
    {'name': 'A', 'accuracy': 0.78, 'latency_ms': 5, 'fairness_gap': 0.02},
    {'name': 'B', 'accuracy': 0.85, 'latency_ms': 25, 'fairness_gap': 0.04},
    {'name': 'C', 'accuracy': 0.88, 'latency_ms': 50, 'fairness_gap': 0.08},
    {'name': 'D', 'accuracy': 0.84, 'latency_ms': 15, 'fairness_gap': 0.03},
]

# Maximize accuracy subject to constraints
constraints = {
    'latency_ms': ('max', 30),       # Latency must be <= 30ms
    'fairness_gap': ('max', 0.05),   # Fairness gap must be <= 5%
}

selected = select_with_constraints(models, 'accuracy', constraints)

print("Constraint-Based Selection")
print("Primary: Maximize accuracy")
print("Constraints: latency <= 30ms, fairness_gap <= 5%")
print(f"\nSelected: {selected['name'] if selected else 'No feasible solution'}")
if selected:
    print(f"  Accuracy: {selected['accuracy']:.2%}")
    print(f"  Latency: {selected['latency_ms']}ms")
    print(f"  Fairness gap: {selected['fairness_gap']:.1%}")
```

Beyond evaluation, multi-objective thinking affects model development:
Generating Diverse Solutions:
Instead of training one model, generate a portfolio of models along the Pareto frontier, for example by varying model size or regularization strength; the hyperparameter-tuning sketch below produces such a portfolio automatically.
Multi-Objective Hyperparameter Tuning:
Modern HPO tools support multi-objective optimization, returning the set of Pareto-optimal trials rather than a single best configuration.
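For example, Optuna accepts one optimization direction per objective and exposes the Pareto-optimal trials via `study.best_trials`. Below is a minimal sketch, assuming the `optuna` and `scikit-learn` packages are available; the synthetic dataset, random-forest search space, and per-sample latency measurement are illustrative choices, not part of the original text.

```python
# Minimal multi-objective HPO sketch (illustrative dataset and search space).
import time

import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def objective(trial):
    """Return (accuracy, latency_ms) so the study can trace a Pareto frontier."""
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 10, 200),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        random_state=0,
    )
    clf.fit(X_train, y_train)

    start = time.perf_counter()
    accuracy = clf.score(X_test, y_test)
    latency_ms = (time.perf_counter() - start) * 1000 / len(X_test)
    return accuracy, latency_ms


# One direction per objective: maximize accuracy, minimize latency.
study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=30)

# best_trials holds the non-dominated (Pareto-optimal) trials, not a single winner.
for t in study.best_trials:
    print(t.params, t.values)
```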
A single probabilistic classifier generates an entire Pareto frontier by varying its decision threshold, which is far cheaper than training multiple models. The non-dominated portion of the precision-recall curve is exactly the Pareto frontier for the precision vs. recall objectives.
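As a sketch of the idea (assuming scikit-learn is available; the synthetic data and logistic regression model are illustrative), sweeping the threshold via `precision_recall_curve` and filtering out dominated points recovers the frontier directly:

```python
# Threshold sweep on one classifier yields a precision/recall Pareto frontier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Predicted probabilities for the positive class
scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# Keep only non-dominated (precision, recall) pairs: no other threshold is
# at least as good on both objectives and strictly better on one.
points = np.column_stack([precision, recall])
pareto = [
    i for i, p in enumerate(points)
    if not any((q >= p).all() and (q > p).any() for j, q in enumerate(points) if j != i)
]
print(f"{len(pareto)} of {len(points)} threshold settings are Pareto-optimal")
```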
Fairness as an Objective:
Fairness metrics, such as the gap in error rates or positive prediction rates between groups, are naturally represented as additional objectives to minimize alongside accuracy.
The Pareto frontier reveals the accuracy-fairness trade-off explicitly, enabling informed decisions about acceptable trade-offs.
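As a small illustration, the hypothetical models from the constraint-based example above can be re-examined with the fairness gap treated as a second objective rather than a hard constraint:

```python
# Accuracy vs. fairness-gap frontier for the illustrative models used earlier.
import numpy as np

models = [
    {'name': 'A', 'accuracy': 0.78, 'fairness_gap': 0.02},
    {'name': 'B', 'accuracy': 0.85, 'fairness_gap': 0.04},
    {'name': 'C', 'accuracy': 0.88, 'fairness_gap': 0.08},
    {'name': 'D', 'accuracy': 0.84, 'fairness_gap': 0.03},
]

# Maximize accuracy, minimize the fairness gap (negate so both are maximized).
points = np.array([[m['accuracy'], -m['fairness_gap']] for m in models])
for i, m in enumerate(models):
    dominated = any(
        (q >= points[i]).all() and (q > points[i]).any()
        for j, q in enumerate(points) if j != i
    )
    status = "dominated" if dominated else "Pareto-optimal"
    print(f"{m['name']}: accuracy={m['accuracy']:.0%}, gap={m['fairness_gap']:.0%} -> {status}")
```

In this two-objective view all four models happen to be non-dominated, so choosing among them comes down to how much accuracy the team is willing to trade for a smaller fairness gap.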
You now understand how to evaluate and select models when multiple objectives matter. Next, we'll explore metric selection strategy—how to choose the right metrics for your specific problem context.