In machine learning, evaluating the performance of a binary classifier across different decision thresholds is crucial for understanding its discriminative capability. The Receiver Operating Characteristic (ROC) Curve is a fundamental diagnostic tool that visualizes the trade-off between a classifier's sensitivity (ability to detect positives) and specificity (ability to avoid false alarms).
Given a set of ground truth binary labels and corresponding predicted confidence scores (probabilities), your task is to construct the performance curve by computing the False Positive Rate (FPR) and True Positive Rate (TPR) at multiple classification thresholds.
Key Definitions:
True Positive Rate (TPR), also called Sensitivity or Recall: $$TPR = \frac{TP}{TP + FN} = \frac{TP}{P}$$ where TP is the number of true positives, FN is the number of false negatives, and P is the total number of actual positives.
False Positive Rate (FPR), also called Fall-out: $$FPR = \frac{FP}{FP + TN} = \frac{FP}{N}$$ where FP is the number of false positives, TN is the number of true negatives, and N is the total number of actual negatives.
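The two formulas above can be captured in a small helper. The function name `rates` is hypothetical; zero denominators, which arise when one class is absent from the data, are mapped to 0.0 here as one reasonable convention:

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts.

    Hypothetical helper; returns 0.0 for a rate whose denominator
    is zero (i.e., no actual positives or no actual negatives).
    """
    p = tp + fn  # total actual positives
    n = fp + tn  # total actual negatives
    tpr = tp / p if p else 0.0
    fpr = fp / n if n else 0.0
    return tpr, fpr
```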
Curve Construction Process:
1. Collect the unique predicted scores and sort them in descending order; prepend a threshold of ∞ so the curve starts at (0, 0), where no sample is classified positive.
2. At each threshold t, classify a sample as positive when its score ≥ t.
3. Count TP and FP at that threshold and record the point (FPR, TPR).
4. Traversing the thresholds in order traces the curve from (0, 0) to (1, 1).
Your Task: Implement a function that computes the complete performance curve by returning the FPR and TPR values as two separate lists. The function should gracefully handle edge cases where all samples belong to a single class.
Example 1:
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
Expected output: ([0.0, 0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0, 1.0])
The dataset contains P=2 positive samples (label 1) and N=2 negative samples (label 0).
Unique thresholds evaluated (descending order): [∞, 0.8, 0.4, 0.35, 0.1]
• Threshold = ∞: No samples classified as positive → TP=0, FP=0 → (FPR=0/2=0.0, TPR=0/2=0.0)
• Threshold = 0.8: Only sample with score 0.8 (label=1) passes → TP=1, FP=0 → (FPR=0/2=0.0, TPR=1/2=0.5)
• Threshold = 0.4: Scores 0.8 and 0.4 pass. Sample(0.8)=1, Sample(0.4)=0 → TP=1, FP=1 → (FPR=1/2=0.5, TPR=1/2=0.5)
• Threshold = 0.35: Scores 0.8, 0.4, 0.35 pass. Labels: 1, 0, 1 → TP=2, FP=1 → (FPR=1/2=0.5, TPR=2/2=1.0)
• Threshold = 0.1: All samples pass → TP=2, FP=2 → (FPR=2/2=1.0, TPR=2/2=1.0)
The resulting curve traces: (0,0) → (0,0.5) → (0.5,0.5) → (0.5,1.0) → (1.0,1.0)
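The five points above can be reproduced with a short brute-force check (threshold list copied from the walkthrough; a sample counts as positive when its score ≥ the threshold):

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

points = []
for t in [float("inf"), 0.8, 0.4, 0.35, 0.1]:
    # Count true and false positives at this threshold.
    tp = sum(y == 1 and s >= t for y, s in zip(y_true, y_scores))
    fp = sum(y == 0 and s >= t for y, s in zip(y_true, y_scores))
    points.append((fp / 2, tp / 2))  # N = P = 2 in this example

print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```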
Example 2:
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.2, 0.8, 0.9]
Expected output: ([0.0, 0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 1.0, 1.0, 1.0])
This represents a well-separated classifier where positive samples have higher scores than negatives.
Unique thresholds (descending): [∞, 0.9, 0.8, 0.2, 0.1]
• Threshold = ∞: (FPR=0.0, TPR=0.0) - Starting point
• Threshold = 0.9: Only score 0.9 (label=1) passes → TP=1, FP=0 → (FPR=0.0, TPR=0.5)
• Threshold = 0.8: Scores 0.9, 0.8 pass (both label=1) → TP=2, FP=0 → (FPR=0.0, TPR=1.0)
• Threshold = 0.2: Scores 0.9, 0.8, 0.2 pass (labels: 1,1,0) → TP=2, FP=1 → (FPR=0.5, TPR=1.0)
• Threshold = 0.1: All pass → TP=2, FP=2 → (FPR=1.0, TPR=1.0)
Notice how the curve hugs the left side and top—this indicates excellent classifier performance. A perfect classifier would step from (0,0) directly to (0,1) then to (1,1).
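That property can be checked numerically: when every positive outscores every negative, the point (FPR=0, TPR=1) must appear on the curve. A quick check with this example's data (≥-threshold semantics as in the walkthrough):

```python
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.2, 0.8, 0.9]

thresholds = [float("inf")] + sorted(set(y_scores), reverse=True)
points = []
for t in thresholds:
    tp = sum(y == 1 and s >= t for y, s in zip(y_true, y_scores))
    fp = sum(y == 0 and s >= t for y, s in zip(y_true, y_scores))
    points.append((fp / 2, tp / 2))  # N = P = 2 here

# Perfect separation places (FPR=0, TPR=1) on the curve:
print((0.0, 1.0) in points)  # True
```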
Example 3:
y_true = [1, 0, 1, 0, 1]
y_scores = [0.5, 0.6, 0.4, 0.7, 0.3]
Expected output: ([0.0, 0.5, 1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.3333, 0.6667, 1.0])
This represents a poorly performing classifier where negative samples tend to have higher scores than positives: essentially an inverted classifier.
Dataset: P=3 positives (indices 0, 2, 4), N=2 negatives (indices 1, 3)
Unique thresholds (descending): [∞, 0.7, 0.6, 0.5, 0.4, 0.3]
• Threshold = ∞: (FPR=0.0, TPR=0.0)
• Threshold = 0.7: Score 0.7 (label=0) passes → TP=0, FP=1 → (FPR=0.5, TPR=0.0)
• Threshold = 0.6: Scores 0.7, 0.6 pass (both label=0) → TP=0, FP=2 → (FPR=1.0, TPR=0.0)
• Threshold = 0.5: Scores 0.7, 0.6, 0.5 pass (labels: 0,0,1) → TP=1, FP=2 → (FPR=1.0, TPR=1/3≈0.3333)
• Threshold = 0.4: Scores 0.7, 0.6, 0.5, 0.4 pass (labels: 0,0,1,1) → TP=2, FP=2 → (FPR=1.0, TPR=2/3≈0.6667)
• Threshold = 0.3: All pass → TP=3, FP=2 → (FPR=1.0, TPR=1.0)
The curve hugs the bottom and right side, indicating the classifier performs worse than random chance. TPR values are rounded to 4 decimal places.
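This walkthrough, including the 4-decimal rounding of the TPR values, can be reproduced directly (threshold list copied from the example; ≥-threshold semantics assumed):

```python
y_true = [1, 0, 1, 0, 1]
y_scores = [0.5, 0.6, 0.4, 0.7, 0.3]
P, N = 3, 2  # actual positives and negatives in this example

fprs, tprs = [], []
for t in [float("inf"), 0.7, 0.6, 0.5, 0.4, 0.3]:
    tp = sum(y == 1 and s >= t for y, s in zip(y_true, y_scores))
    fp = sum(y == 0 and s >= t for y, s in zip(y_true, y_scores))
    fprs.append(round(fp / N, 4))
    tprs.append(round(tp / P, 4))

print(fprs)  # [0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
print(tprs)  # [0.0, 0.0, 0.0, 0.3333, 0.6667, 1.0]
```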
Constraints