In machine learning classification, a model's predicted probability should ideally reflect the true likelihood of an event occurring. A well-calibrated model is one where predictions of 70% confidence are correct 70% of the time, predictions of 90% are correct 90% of the time, and so forth. Assessing calibration is critical for applications where probability estimates directly inform decision-making.
The Expected Calibration Error (ECE) is a widely used metric to quantify how well a model's predicted probabilities align with actual outcomes. It measures the weighted average discrepancy between predicted confidence levels and observed accuracy across different probability intervals.
Algorithm Overview:
To compute the ECE:
Partition the probability space into n_bins equal-width intervals spanning [0, 1]. The first bin includes both endpoints [0, 1/n_bins], while subsequent bins are half-open intervals (lower, upper].
Group predictions into bins based on their predicted probability values. Each prediction falls into exactly one bin.
For each non-empty bin, calculate:
• Confidence: the mean predicted probability of the samples in the bin.
• Accuracy: the fraction of samples in the bin whose true label is positive.
Compute the weighted absolute deviation for each bin: multiply the absolute difference |Accuracy - Confidence| by the proportion of total samples in that bin.
Sum these weighted deviations across all bins to obtain the final ECE score.
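The bin-assignment convention in step 1 is the subtle part, so here is a small helper illustrating just that step (the name `bin_index` is illustrative, not part of the required API):

```python
import math

def bin_index(p, n_bins):
    """Map a probability p in [0, 1] to an equal-width bin index.

    Bin 0 is the closed interval [0, 1/n_bins]; every later bin is
    the half-open interval (lower, upper].
    """
    width = 1.0 / n_bins
    if p <= width:
        return 0
    # ceil(p / width) gives the 1-based bin; clamp so p = 1.0 stays in the last bin.
    return min(math.ceil(p / width) - 1, n_bins - 1)
```

For example, with 5 bins a probability of 0.8 lands in bin index 3, i.e. the interval (0.6, 0.8].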
Interpretation:
An ECE of 0 indicates perfect calibration: predicted confidence matches observed accuracy in every bin. Larger values indicate greater miscalibration, so lower is better.
Your Task:
Implement a function that calculates the Expected Calibration Error for binary classification predictions. The function should return the ECE value rounded to 3 decimal places.
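One possible pure-Python sketch of such a function (the name `expected_calibration_error` and its signature are illustrative, not a prescribed API):

```python
import math

def expected_calibration_error(y_true, y_prob, n_bins):
    """ECE over equal-width bins; bin 0 is [0, 1/n_bins], the rest are (lower, upper]."""
    n = len(y_prob)
    width = 1.0 / n_bins
    # Assign each (label, probability) pair to exactly one bin.
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        idx = 0 if p <= width else min(math.ceil(p / width) - 1, n_bins - 1)
        bins[idx].append((label, p))
    # Accumulate weighted |accuracy - confidence| over the non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        confidence = sum(p for _, p in members) / len(members)
        accuracy = sum(label for label, _ in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - confidence)
    return round(ece, 3)
```

On the first worked example below (`y_true = [0, 0, 1, 1]`, `y_prob = [0.8, 0.8, 0.8, 0.8]`, `n_bins = 5`) this returns 0.3.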
Example 1:
y_true = [0, 0, 1, 1]
y_prob = [0.8, 0.8, 0.8, 0.8]
n_bins = 5
Expected output: 0.3
With 5 bins (width = 0.2 each), all 4 predictions with probability 0.8 fall into the bin (0.6, 0.8].
• Confidence = average of [0.8, 0.8, 0.8, 0.8] = 0.8
• Accuracy = 2 positives out of 4 samples = 0.5
• Absolute difference = |0.5 - 0.8| = 0.3
• Weight = 4/4 = 1.0 (all samples are in this bin)
ECE = 1.0 × 0.3 = 0.3
This indicates the model is overconfident — it predicts 80% probability but only achieves 50% accuracy.
Example 2:
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
n_bins = 5
Expected output: 0.2
With 5 bins (width = 0.2), the predictions are distributed as:
• Bin [0.0, 0.2]: [0.1, 0.2] with labels [0, 0] → Confidence = 0.15, Accuracy = 0.0, Weight = 2/6
• Bin (0.2, 0.4]: [0.3] with labels [0] → Confidence = 0.3, Accuracy = 0.0, Weight = 1/6
• Bin (0.6, 0.8]: [0.7, 0.8] with labels [1, 1] → Confidence = 0.75, Accuracy = 1.0, Weight = 2/6
• Bin (0.8, 1.0]: [0.9] with labels [1] → Confidence = 0.9, Accuracy = 1.0, Weight = 1/6
Weighted errors:
• Bin 1: (2/6) × |0.0 - 0.15| = 0.05
• Bin 2: (1/6) × |0.0 - 0.3| = 0.05
• Bin 3: (2/6) × |1.0 - 0.75| = 0.0833
• Bin 4: (1/6) × |1.0 - 0.9| = 0.0167
ECE = 0.05 + 0.05 + 0.0833 + 0.0167 = 0.2
Example 3:
y_true = [0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.3, 0.8, 0.9]
n_bins = 10
Expected output: 0.24
With 10 bins (width = 0.1), predictions are grouped as:
• Bin (0.1, 0.2]: [0.2] with labels [0] → Confidence = 0.2, Accuracy = 0.0, Weight = 1/5
• Bin (0.2, 0.3]: [0.3] with labels [0] → Confidence = 0.3, Accuracy = 0.0, Weight = 1/5
• Bin (0.5, 0.6]: [0.6] with labels [1] → Confidence = 0.6, Accuracy = 1.0, Weight = 1/5
• Bin (0.7, 0.8]: [0.8] with labels [1] → Confidence = 0.8, Accuracy = 1.0, Weight = 1/5
• Bin (0.8, 0.9]: [0.9] with labels [1] → Confidence = 0.9, Accuracy = 1.0, Weight = 1/5
Weighted errors sum to ECE = 0.24
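The three worked examples can be cross-checked with a vectorized sketch, assuming NumPy is available (`ece_numpy` is an illustrative name; `np.digitize` with `right=True` reproduces the (lower, upper] binning, with a clip so probabilities of exactly 0 land in the first bin):

```python
import numpy as np

def ece_numpy(y_true, y_prob, n_bins):
    """Vectorized ECE over equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # right=True makes each bin half-open (lower, upper]; the clip keeps p = 0 in bin 0.
    idx = np.clip(np.digitize(y_prob, edges, right=True) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins
        weight = mask.mean()              # fraction of all samples in this bin
        confidence = y_prob[mask].mean()  # average predicted probability
        accuracy = y_true[mask].mean()    # observed fraction of positives
        ece += weight * abs(accuracy - confidence)
    return round(ece, 3)
```

Running it on the three examples above yields 0.3, 0.2, and 0.24 respectively, matching the hand calculations.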
Constraints: