In production machine learning systems, one of the most critical operational challenges is detecting when a deployed model's behavior begins to deviate from its expected performance. This phenomenon, known as model drift or prediction drift, occurs when the statistical properties of model outputs change over time, often signaling degraded model quality, changing user behavior, or shifts in the underlying data distribution.
Effective MLOps (Machine Learning Operations) practices require continuous monitoring of model predictions to ensure reliability and enable timely model retraining or intervention. This problem focuses on implementing a comprehensive prediction distribution monitoring system.
Given two sequences of prediction scores—a reference set representing baseline model behavior (e.g., from validation or a stable production period) and a current set representing recent model outputs—compute key statistical metrics that quantify distribution differences and detect potential drift.
$$\text{Mean Shift} = \mu_{current} - \mu_{reference}$$
$$\text{Std Ratio} = \frac{\sigma_{current}}{\sigma_{reference}}$$
For the Jensen-Shannon divergence calculation:
$$\mathrm{JSD}(P \,\|\, Q) = \frac{1}{2} \cdot KL(P \,\|\, M) + \frac{1}{2} \cdot KL(Q \,\|\, M)$$
where $$M = \frac{1}{2}(P + Q)$$
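The formula above can be illustrated directly on small discrete distributions. This is a minimal sketch using the natural logarithm; the helper names `kl_divergence` and `js_divergence` are illustrative, and the problem's expected values also involve histogram binning and smoothing, covered further below.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = 0.5 * (P + Q)."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Identical distributions diverge by exactly 0; fully disjoint ones by ln(2).
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # ln(2) ≈ 0.6931
```

Unlike KL divergence, this quantity is symmetric in P and Q and always finite, which is why it is a popular drift metric.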
drift_detected is True if the Jensen-Shannon divergence exceeds the significance threshold of 0.1, signaling that the prediction distributions have diverged meaningfully.

Implement a function that accepts two lists of prediction probabilities and a bin count, then returns a dictionary containing all four monitoring metrics.
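One way the full monitor could be put together is sketched below. The function name `monitor_predictions`, the [0, 1] binning range, add-one (Laplace) smoothing, the natural logarithm, and 4-decimal rounding are all assumptions; the exact convention used to produce the sample `js_divergence` values may differ.

```python
import math

def monitor_predictions(reference_preds, current_preds, n_bins):
    """Sketch of the four drift metrics (binning/smoothing conventions assumed)."""
    def mean(xs):
        return sum(xs) / len(xs)

    def std(xs):
        mu = mean(xs)
        # Population standard deviation, matching the sigma ≈ 0.1414 in Example 1.
        return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

    mean_shift = mean(current_preds) - mean(reference_preds)
    std_ratio = std(current_preds) / std(reference_preds)

    def histogram(xs):
        """Bin over [0, 1] and apply Laplace (add-one) smoothing."""
        counts = [0] * n_bins
        for x in xs:
            idx = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0 into last bin
            counts[idx] += 1
        total = len(xs) + n_bins
        return [(c + 1) / total for c in counts]

    p, q = histogram(reference_preds), histogram(current_preds)
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    kl = lambda a, b: sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

    return {
        'mean_shift': round(mean_shift, 4),
        'std_ratio': round(std_ratio, 4),
        'js_divergence': round(js, 4),
        'drift_detected': js > 0.1,
    }
```

With identical inputs this returns mean_shift 0.0, std_ratio 1.0, js_divergence 0.0, and drift_detected False, matching Example 3 below.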
reference_preds = [0.1, 0.2, 0.3, 0.4, 0.5]
current_preds = [0.5, 0.6, 0.7, 0.8, 0.9]
n_bins = 5

{'mean_shift': 0.4, 'std_ratio': 1.0, 'js_divergence': 0.0693, 'drift_detected': False}

The reference predictions have mean 0.3 and the current predictions have mean 0.7, yielding a mean_shift of 0.4 (a significant upward shift in prediction scores). Both distributions have identical spread (σ ≈ 0.1414), so std_ratio equals 1.0. The Jensen-Shannon divergence of 0.0693 (with Laplace smoothing applied) falls below the 0.1 threshold, so drift_detected is False. Despite the mean shift, the relatively low JSD indicates the distributions overlap sufficiently when considering the histogram representation with smoothing.
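The mean and spread figures quoted in this explanation can be checked with the standard library (population standard deviation, as the σ ≈ 0.1414 figure implies):

```python
from statistics import mean, pstdev

reference_preds = [0.1, 0.2, 0.3, 0.4, 0.5]
current_preds = [0.5, 0.6, 0.7, 0.8, 0.9]

print(round(mean(current_preds) - mean(reference_preds), 4))   # 0.4
print(round(pstdev(reference_preds), 4))                       # 0.1414
print(round(pstdev(current_preds) / pstdev(reference_preds), 4))  # 1.0
```

Shifting every score by a constant changes the mean but not the spread, which is why this example has a large mean_shift yet a std_ratio of exactly 1.0.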
reference_preds = [0.2, 0.3, 0.4, 0.5, 0.6]
current_preds = [0.25, 0.35, 0.45, 0.55, 0.65]
n_bins = 5

{'mean_shift': 0.05, 'std_ratio': 1.0, 'js_divergence': 0.0121, 'drift_detected': False}

The current predictions are shifted only slightly (by 0.05) compared to the reference. Both sets maintain the same standard deviation, producing std_ratio = 1.0. The very low JS divergence of 0.0121 confirms that the distributions are nearly identical—no drift concern is flagged.
reference_preds = [0.1, 0.3, 0.5, 0.7, 0.9]
current_preds = [0.1, 0.3, 0.5, 0.7, 0.9]
n_bins = 5

{'mean_shift': 0.0, 'std_ratio': 1.0, 'js_divergence': 0.0, 'drift_detected': False}

When the reference and current prediction sets are identical, all drift metrics reflect perfect alignment: zero mean shift, unit standard deviation ratio, and zero JS divergence. This represents the ideal baseline scenario where model behavior is completely stable.
Constraints