In machine learning operations (MLOps) and production data systems, monitoring the stability of feature distributions is paramount for ensuring that deployed models continue to perform as expected. When the statistical properties of incoming data deviate significantly from the data used during model training, this phenomenon—known as distribution shift or data drift—can lead to degraded model accuracy, biased predictions, and unreliable outputs.
The Stability Divergence Index (SDI), also referred to as the Population Stability Index (PSI), is a widely adopted quantitative metric for measuring the degree to which a distribution has changed over time. It provides a single scalar value that captures the magnitude of distributional divergence between two datasets.
Given two distributions represented as sample sets: a baseline (typically the model's training data) and a current set (typically incoming production data).
The SDI is calculated through the following procedure:
1. Create n equal-width buckets (bins) spanning the combined range of both distributions:

   ```
   min_val = min(min(baseline), min(current))
   max_val = max(max(baseline), max(current))
   width = (max_val - min_val) / n
   ```

2. For each bucket, calculate the proportion of samples from each distribution that fall within that bucket:

   ```
   baseline_proportion[i] = (count of baseline samples in bucket i) / (total baseline samples)
   current_proportion[i] = (count of current samples in bucket i) / (total current samples)
   ```

   To avoid numerical instability with logarithms, replace any zero proportion with a small epsilon value (ε = 0.0001).

3. Sum the per-bucket divergence terms:
$$SDI = \sum_{i=1}^{n} \left(\text{current\_proportion}_i - \text{baseline\_proportion}_i\right) \times \ln\left(\frac{\text{current\_proportion}_i}{\text{baseline\_proportion}_i}\right)$$
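The three steps above can be sketched in pure Python, using the data from the first worked example below (the helper name `proportions` and the floor-and-clamp bucket indexing are illustrative choices, not part of the specification):

```python
import math

baseline = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
current = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
n = 5

# Step 1: n equal-width buckets over the combined range of both sets
lo = min(min(baseline), min(current))
hi = max(max(baseline), max(current))
width = (hi - lo) / n

def proportions(samples):
    counts = [0] * n
    for x in samples:
        # clamp the index so x == hi falls in the last bucket
        counts[min(int((x - lo) / width), n - 1)] += 1
    return [c / len(samples) for c in counts]

# Step 2: per-bucket proportions, with zeros replaced by epsilon
eps = 0.0001
b = [p if p > 0 else eps for p in proportions(baseline)]
c = [p if p > 0 else eps for p in proportions(current)]

# Step 3: the SDI formula
sdi = sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
print(round(sdi, 4))  # 6.6336
```

Only the first and last buckets contribute here: the baseline has no mass in the last bucket and the current set none in the first, so both terms are epsilon-smoothed and dominate the sum.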
The SDI value indicates the severity of distribution shift:
| SDI Value | Interpretation | Action Required |
|---|---|---|
| < 0.1 | No significant shift | Continue monitoring |
| 0.1 – 0.25 | Moderate shift | Investigate and monitor closely |
| ≥ 0.25 | Significant shift | Immediate investigation required; consider model retraining |
Write a Python function that computes the Stability Divergence Index between a baseline distribution and a current distribution, and returns a comprehensive drift assessment.
Function Requirements:
- Accept the baseline samples, the current samples, and a `bucket_count` parameter
- Apply epsilon smoothing (ε = 0.0001) to replace zero proportions
- Return a dictionary with the keys `'psi'`, `'drift_detected'`, and `'drift_level'`

Edge Case:
If either input list is empty, return an empty dictionary {}.
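One way these requirements might be implemented as a sketch (the function name `compute_sdi`, the `'moderate'` label for the middle band, treating that band as detected drift, and rounding to four decimals are assumptions inferred from the examples, not fixed by the statement):

```python
import math

def compute_sdi(baseline_samples, current_samples, bucket_count, epsilon=0.0001):
    # Edge case: if either input list is empty, return an empty dictionary
    if not baseline_samples or not current_samples:
        return {}

    min_val = min(min(baseline_samples), min(current_samples))
    max_val = max(max(baseline_samples), max(current_samples))
    width = (max_val - min_val) / bucket_count

    def bucket_proportions(samples):
        counts = [0] * bucket_count
        for x in samples:
            # Clamp the index so the maximum value lands in the last bucket;
            # if all values are identical, width is 0 and everything goes to bucket 0
            idx = min(int((x - min_val) / width), bucket_count - 1) if width > 0 else 0
            counts[idx] += 1
        # Epsilon smoothing: replace zero proportions to keep ln() finite
        return [k / len(samples) if k > 0 else epsilon for k in counts]

    b = bucket_proportions(baseline_samples)
    c = bucket_proportions(current_samples)
    sdi = sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

    # Thresholds from the interpretation table; the 'moderate' label and
    # drift_detected=True for that band are assumptions
    if sdi < 0.1:
        drift_detected, drift_level = False, 'none'
    elif sdi < 0.25:
        drift_detected, drift_level = True, 'moderate'
    else:
        drift_detected, drift_level = True, 'significant'

    return {'psi': round(sdi, 4),
            'drift_detected': drift_detected,
            'drift_level': drift_level}
```

This reproduces the dictionaries shown in the worked examples below.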
baseline_samples = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
current_samples = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
bucket_count = 5

Output:

{'psi': 6.6336, 'drift_detected': True, 'drift_level': 'significant'}

The baseline data is concentrated in the range [1, 5] while the current data has shifted to [3, 7]. This represents a notable rightward shift in the distribution.
Bucket Analysis:
When we compute proportions for each of the 5 buckets and apply the SDI formula, the resulting value of 6.6336 far exceeds the 0.25 threshold, indicating a significant distribution shift that warrants immediate investigation into why production data has diverged so dramatically from the training baseline.
baseline_samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
current_samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
bucket_count = 5

Output:

{'psi': 0.0, 'drift_detected': False, 'drift_level': 'none'}

When the baseline and current distributions are identical, every bucket contains the same proportion of samples from both distributions.
Bucket Analysis:
The SDI of 0.0 confirms that there is no distribution shift whatsoever. This is the ideal scenario in production—the incoming data closely matches what the model was trained on.
baseline_samples = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
current_samples = [8, 8, 8, 8, 9, 9, 9, 9, 10, 10]
bucket_count = 5

Output:

{'psi': 17.2439, 'drift_detected': True, 'drift_level': 'significant'}

This example demonstrates an extreme case of distribution shift where the baseline and current data occupy completely non-overlapping regions of the feature space.
Bucket Analysis:
The epsilon smoothing (0.0001) is applied to prevent division by zero, but the massive disparity results in an extremely high SDI of 17.2439. This indicates a catastrophic distribution shift—the production data is fundamentally different from training data, and model predictions are likely unreliable.
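To see where 17.2439 comes from, the per-bucket terms can be evaluated directly. The proportions below follow from applying the procedure to this example (5 buckets of width 1.8 over the combined range [1, 10], with zero proportions replaced by ε); this is a sanity check, not part of the required function:

```python
import math

eps = 0.0001
# baseline mass sits entirely in the first two buckets,
# current mass entirely in the last two; the middle bucket is empty for both
b = [0.8, 0.2, eps, eps, eps]
c = [eps, eps, eps, 0.4, 0.6]

terms = [(ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c)]
print(round(sum(terms), 4))  # 17.2439
```

The middle bucket contributes exactly zero (both proportions equal ε), while the four epsilon-smoothed mismatched buckets each contribute a large positive term such as (0.8 − ε) · ln(0.8 / ε) ≈ 7.19.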
Constraints