In machine learning, training a model is only half the battle—evaluating its performance is equally critical for understanding whether the model is suitable for deployment. For binary classification problems, a comprehensive set of evaluation metrics provides different perspectives on model behavior, especially when dealing with imbalanced datasets or when different types of errors carry different costs.
This problem requires you to implement a complete binary classifier evaluation suite that computes multiple interconnected metrics from the fundamental building blocks of the confusion matrix.
The confusion matrix is a 2×2 table that summarizes the prediction results for a binary classification problem:
| | Predicted Positive (1) | Predicted Negative (0) |
|---|---|---|
| Actual Positive (1) | True Positives (TP) | False Negatives (FN) |
| Actual Negative (0) | False Positives (FP) | True Negatives (TN) |
The proportion of all predictions that are correct: $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The harmonic mean of precision and recall, providing a balanced measure: $$\text{Precision} = \frac{TP}{TP + FP}$$ $$\text{Recall} = \frac{TP}{TP + FN}$$ $$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
If both precision and recall are zero, the F1 Score is defined as 0.
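The zero-division rule above can be captured in a small helper. This is a minimal sketch (the function name `f1_score` and its counts-based signature are illustrative, not part of the problem's required interface):

```python
def f1_score(tp, fp, fn):
    """F1 score from confusion-matrix counts, with the edge cases defined as 0."""
    # Guard each denominator: no positive predictions -> precision 0,
    # no actual positives -> recall 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # If both precision and recall are zero, F1 is defined as 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Guarding precision and recall separately also covers the degenerate inputs where only one of the two denominators is zero.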
The proportion of actual negatives that are correctly identified: $$\text{Specificity} = \frac{TN}{TN + FP}$$
If there are no actual negatives (TN + FP = 0), specificity is undefined and should return 0.
The proportion of negative predictions that are correct: $$\text{NPV} = \frac{TN}{TN + FN}$$
If there are no negative predictions (TN + FN = 0), NPV is undefined and should return 0.
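Specificity and NPV follow the same defensive pattern as F1: return 0 when the denominator is empty. A minimal sketch (helper names `specificity` and `npv` are illustrative):

```python
def specificity(tn, fp):
    """Fraction of actual negatives correctly identified; 0 if there are none."""
    return tn / (tn + fp) if (tn + fp) > 0 else 0.0

def npv(tn, fn):
    """Fraction of negative predictions that are correct; 0 if there are none."""
    return tn / (tn + fn) if (tn + fn) > 0 else 0.0
```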
Implement a function that takes two lists (ground truth labels and model predictions) and returns a tuple containing the confusion matrix followed by accuracy, F1 score, specificity, and NPV, in that order. Round all float values to 3 decimal places.
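One way to assemble the full suite is a single pass that tallies the four confusion-matrix cells and then applies each formula with its zero-denominator guard. This is a sketch, not a reference solution; the name `evaluate_classifier` is an assumption:

```python
def evaluate_classifier(ground_truth, predictions):
    """Return ([[TP, FN], [FP, TN]], accuracy, f1, specificity, npv)."""
    # Tally the four confusion-matrix cells by pairwise comparison.
    tp = sum(1 for g, p in zip(ground_truth, predictions) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(ground_truth, predictions) if g == 1 and p == 0)
    fp = sum(1 for g, p in zip(ground_truth, predictions) if g == 0 and p == 1)
    tn = sum(1 for g, p in zip(ground_truth, predictions) if g == 0 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total > 0 else 0.0

    # F1: each component guards its own denominator; F1 is 0 when both are 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)

    spec = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    npv = tn / (tn + fn) if (tn + fn) > 0 else 0.0

    return ([[tp, fn], [fp, tn]],
            round(accuracy, 3), round(f1, 3),
            round(spec, 3), round(npv, 3))
```

The matrix layout `[[TP, FN], [FP, TN]]` mirrors the table above: rows are actual classes, columns are predicted classes.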
ground_truth = [1, 0, 1, 0, 1]
predictions = [1, 0, 0, 1, 1]

Expected output: ([[2, 1], [1, 1]], 0.6, 0.667, 0.5, 0.5)

Step 1: Build the Confusion Matrix. Comparing ground_truth with predictions element by element:
Confusion Matrix: [[TP=2, FN=1], [FP=1, TN=1]]
Step 2: Calculate Metrics

- Accuracy = (TP + TN) / total = (2 + 1) / 5 = 0.6
- Precision = 2/3 ≈ 0.667 and Recall = 2/3 ≈ 0.667, so F1 = 0.667
- Specificity = TN / (TN + FP) = 1 / 2 = 0.5
- NPV = TN / (TN + FN) = 1 / 2 = 0.5
ground_truth = [1, 0, 1, 0, 1, 0]
predictions = [1, 0, 1, 0, 1, 0]

Expected output: ([[3, 0], [0, 3]], 1.0, 1.0, 1.0, 1.0)

Perfect Classifier Scenario: every prediction matches the ground truth exactly.
All metrics achieve their maximum value of 1.0, indicating perfect classification performance.
ground_truth = [1, 1, 0, 0]
predictions = [0, 0, 1, 1]

Expected output: ([[0, 2], [2, 0]], 0.0, 0.0, 0.0, 0.0)

Adversarial Classifier Scenario: the model produces the exact opposite of the correct labels.
This represents the worst possible classifier (worse than random guessing), with all metrics at 0.0.
Constraints