In machine learning classification, a model's predicted probability should ideally reflect the true likelihood of an event occurring. A well-calibrated model is one where predictions of 70% confidence are correct 70% of the time, predictions of 90% are correct 90% of the time, and so forth. Assessing calibration is critical for applications where probability estimates directly inform decision-making.
The Expected Calibration Error (ECE) is a widely used metric to quantify how well a model's predicted probabilities align with actual outcomes. It measures the weighted average discrepancy between predicted confidence levels and observed accuracy across different probability intervals.
Algorithm Overview:
To compute the ECE:
Partition the probability space into n_bins equal-width intervals spanning [0, 1]. The first bin includes both endpoints [0, 1/n_bins], while subsequent bins are half-open intervals (lower, upper].
Group predictions into bins based on their predicted probability values. Each prediction falls into exactly one bin.
For each non-empty bin, calculate:
• Confidence: the mean predicted probability of the samples in the bin.
• Accuracy: the fraction of samples in the bin whose true label is positive.
Compute the weighted absolute deviation for each bin: multiply the absolute difference |Accuracy - Confidence| by the proportion of total samples in that bin.
Sum these weighted deviations across all bins to obtain the final ECE score.
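The bin-assignment convention in step 1 is the subtle part, so here is a small helper illustrating just that step (the name `bin_index` is illustrative, not part of the required API):

```python
import math

def bin_index(p, n_bins):
    """Map a probability p in [0, 1] to an equal-width bin index.

    Bin 0 is the closed interval [0, 1/n_bins]; every later bin is
    the half-open interval (lower, upper].
    """
    width = 1.0 / n_bins
    if p <= width:
        return 0
    # ceil(p / width) gives the 1-based bin; clamp so p = 1.0 stays in the last bin.
    return min(math.ceil(p / width) - 1, n_bins - 1)
```

For example, with 5 bins a probability of 0.8 lands in bin index 3, i.e. the interval (0.6, 0.8].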
Interpretation:
An ECE of 0 indicates perfect calibration: predicted confidence matches observed accuracy in every bin. Larger values indicate greater miscalibration, so lower is better.
Your Task:
Implement a function that calculates the Expected Calibration Error for binary classification predictions. The function should return the ECE value rounded to 3 decimal places.
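One possible pure-Python sketch of such a function (the name `expected_calibration_error` and its signature are illustrative, not a prescribed API):

```python
import math

def expected_calibration_error(y_true, y_prob, n_bins):
    """ECE over equal-width bins; bin 0 is [0, 1/n_bins], the rest are (lower, upper]."""
    n = len(y_prob)
    width = 1.0 / n_bins
    # Assign each (label, probability) pair to exactly one bin.
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        idx = 0 if p <= width else min(math.ceil(p / width) - 1, n_bins - 1)
        bins[idx].append((label, p))
    # Accumulate weighted |accuracy - confidence| over the non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        confidence = sum(p for _, p in members) / len(members)
        accuracy = sum(label for label, _ in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - confidence)
    return round(ece, 3)
```

On the first worked example below (`y_true = [0, 0, 1, 1]`, `y_prob = [0.8, 0.8, 0.8, 0.8]`, `n_bins = 5`) this returns 0.3.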
Example 1:
y_true = [0, 0, 1, 1]
y_prob = [0.8, 0.8, 0.8, 0.8]
n_bins = 5
Expected output: 0.3
With 5 bins (width = 0.2 each), all 4 predictions with probability 0.8 fall into the bin (0.6, 0.8].
• Confidence = average of [0.8, 0.8, 0.8, 0.8] = 0.8
• Accuracy = 2 positives out of 4 samples = 0.5
• Absolute difference = |0.5 - 0.8| = 0.3
• Weight = 4/4 = 1.0 (all samples are in this bin)
ECE = 1.0 × 0.3 = 0.3
This indicates the model is overconfident — it predicts 80% probability but only achieves 50% accuracy.
Example 2:
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
n_bins = 5
Expected output: 0.2
With 5 bins (width = 0.2), the predictions are distributed as:
• Bin [0.0, 0.2]: [0.1, 0.2] with labels [0, 0] → Confidence = 0.15, Accuracy = 0.0, Weight = 2/6
• Bin (0.2, 0.4]: [0.3] with labels [0] → Confidence = 0.3, Accuracy = 0.0, Weight = 1/6
• Bin (0.6, 0.8]: [0.7, 0.8] with labels [1, 1] → Confidence = 0.75, Accuracy = 1.0, Weight = 2/6
• Bin (0.8, 1.0]: [0.9] with labels [1] → Confidence = 0.9, Accuracy = 1.0, Weight = 1/6
Weighted errors:
• Bin 1: (2/6) × |0.0 - 0.15| = 0.05
• Bin 2: (1/6) × |0.0 - 0.3| = 0.05
• Bin 3: (2/6) × |1.0 - 0.75| = 0.0833
• Bin 4: (1/6) × |1.0 - 0.9| = 0.0167
ECE = 0.05 + 0.05 + 0.0833 + 0.0167 = 0.2
Example 3:
y_true = [0, 1, 0, 1, 1]
y_prob = [0.2, 0.6, 0.3, 0.8, 0.9]
n_bins = 10
Expected output: 0.24
With 10 bins (width = 0.1), predictions are grouped as:
• Bin (0.1, 0.2]: [0.2] with labels [0] → Confidence = 0.2, Accuracy = 0.0, Weight = 1/5
• Bin (0.2, 0.3]: [0.3] with labels [0] → Confidence = 0.3, Accuracy = 0.0, Weight = 1/5
• Bin (0.5, 0.6]: [0.6] with labels [1] → Confidence = 0.6, Accuracy = 1.0, Weight = 1/5
• Bin (0.7, 0.8]: [0.8] with labels [1] → Confidence = 0.8, Accuracy = 1.0, Weight = 1/5
• Bin (0.8, 0.9]: [0.9] with labels [1] → Confidence = 0.9, Accuracy = 1.0, Weight = 1/5
Weighted errors sum to ECE = 0.24
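The three worked examples can be cross-checked with a vectorized sketch, assuming NumPy is available (`ece_numpy` is an illustrative name; `np.digitize` with `right=True` reproduces the (lower, upper] binning, with a clip so probabilities of exactly 0 land in the first bin):

```python
import numpy as np

def ece_numpy(y_true, y_prob, n_bins):
    """Vectorized ECE over equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # right=True makes each bin half-open (lower, upper]; the clip keeps p = 0 in bin 0.
    idx = np.clip(np.digitize(y_prob, edges, right=True) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty bins
        weight = mask.mean()              # fraction of all samples in this bin
        confidence = y_prob[mask].mean()  # average predicted probability
        accuracy = y_true[mask].mean()    # observed fraction of positives
        ece += weight * abs(accuracy - confidence)
    return round(ece, 3)
```

Running it on the three examples above yields 0.3, 0.2, and 0.24 respectively, matching the hand calculations.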
Constraints: