The Precision-Recall (PR) Curve is a powerful evaluation tool for binary classification models, particularly valuable for imbalanced datasets where positive examples are rare. Unlike ROC curves, PR curves focus on the classifier's performance with respect to the positive class, making them well suited to scenarios such as fraud detection, disease diagnosis, and anomaly detection.
Core Concepts:
Precision measures the proportion of predicted positives that are actually positive: $$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
Recall (also called Sensitivity or True Positive Rate) measures the proportion of actual positives that are correctly identified: $$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
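The two formulas translate directly into code. A minimal sketch (the helper name `precision_recall` is illustrative, not part of the task):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from confusion-matrix counts."""
    # Guard against zero denominators; 0.0 is one common convention.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# 2 true positives, 1 false positive, 0 false negatives:
# precision = 2/3, recall = 1.0
```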
Building the PR Curve:
The PR curve is constructed by evaluating precision and recall at various decision thresholds. For each unique score in the predicted probabilities (sorted in descending order), classify every sample whose score is greater than or equal to that threshold as positive, count the true positives and false positives, and compute precision and recall from those counts.
Edge Case Handling:
If y_true contains no positive labels, the recall denominator (TP + FN) is zero; a common convention is to report recall as 0.0 in that case. Precision is safe here: because each threshold equals one of the predicted scores, at least one sample is always predicted positive, so TP + FP ≥ 1 at every threshold.
Your Task: Write a Python function that computes the precision-recall curve given true binary labels and predicted probability scores. The function should return precision values, recall values, and the corresponding thresholds used for evaluation.
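One possible implementation following the procedure above (the function name, rounding to 4 decimal places, and list return format are choices made to match the examples below, not requirements stated by the task):

```python
def precision_recall_curve(y_true, y_scores):
    """Compute precision, recall, and thresholds for a binary classifier.

    Thresholds are the unique predicted scores in descending order; at each
    threshold, samples with score >= threshold are predicted positive.
    """
    thresholds = sorted(set(y_scores), reverse=True)
    total_positives = sum(y_true)

    precisions, recalls = [], []
    for t in thresholds:
        # Count true and false positives among samples predicted positive.
        tp = sum(1 for label, score in zip(y_true, y_scores)
                 if score >= t and label == 1)
        fp = sum(1 for label, score in zip(y_true, y_scores)
                 if score >= t and label == 0)
        precisions.append(round(tp / (tp + fp), 4))
        recalls.append(round(tp / total_positives, 4) if total_positives else 0.0)
    return precisions, recalls, thresholds
```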
Input:
y_true = [1, 0, 1, 0]
y_scores = [0.9, 0.7, 0.4, 0.2]

Output:
{"precisions": [1.0, 0.5, 0.6667, 0.5], "recalls": [0.5, 0.5, 1.0, 1.0], "thresholds": [0.9, 0.7, 0.4, 0.2]}

Reasoning:
The thresholds are the unique scores sorted in descending order: [0.9, 0.7, 0.4, 0.2].
At threshold 0.9:
• Predicted positive: sample 0 (score 0.9 ≥ 0.9) → TP=1, FP=0
• Precision = 1/(1+0) = 1.0, Recall = 1/2 = 0.5

At threshold 0.7:
• Predicted positive: samples 0, 1 (scores ≥ 0.7) → TP=1, FP=1
• Precision = 1/(1+1) = 0.5, Recall = 1/2 = 0.5

At threshold 0.4:
• Predicted positive: samples 0, 1, 2 (scores ≥ 0.4) → TP=2, FP=1
• Precision = 2/(2+1) ≈ 0.6667, Recall = 2/2 = 1.0

At threshold 0.2:
• Predicted positive: all samples (scores ≥ 0.2) → TP=2, FP=2
• Precision = 2/(2+2) = 0.5, Recall = 2/2 = 1.0
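The per-threshold counting above can be reproduced with a short standalone script (variable names are illustrative):

```python
y_true = [1, 0, 1, 0]
y_scores = [0.9, 0.7, 0.4, 0.2]
total_pos = sum(y_true)  # 2 actual positives

rows = []
for t in sorted(set(y_scores), reverse=True):
    # Samples with score >= t are predicted positive at this threshold.
    tp = sum(1 for label, s in zip(y_true, y_scores) if s >= t and label == 1)
    fp = sum(1 for label, s in zip(y_true, y_scores) if s >= t and label == 0)
    rows.append((t, tp, fp))
    print(f"threshold {t}: TP={tp}, FP={fp}, "
          f"precision={tp / (tp + fp):.4f}, recall={tp / total_pos:.4f}")
```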
Input:
y_true = [1, 1, 0, 0]
y_scores = [0.8, 0.6, 0.4, 0.2]

Output:
{"precisions": [1.0, 1.0, 0.6667, 0.5], "recalls": [0.5, 1.0, 1.0, 1.0], "thresholds": [0.8, 0.6, 0.4, 0.2]}

Reasoning:
This example shows a well-separated case where positive samples have higher scores.
At threshold 0.8:
• Only sample 0 is predicted positive → TP=1, FP=0
• Precision = 1.0, Recall = 0.5 (found 1 of 2 positives)

At threshold 0.6:
• Samples 0, 1 predicted positive → TP=2, FP=0
• Precision = 1.0, Recall = 1.0 (perfect at this threshold!)

At threshold 0.4:
• Samples 0, 1, 2 predicted positive → TP=2, FP=1
• Precision = 0.6667, Recall = 1.0

At threshold 0.2:
• All samples predicted positive → TP=2, FP=2
• Precision = 0.5, Recall = 1.0
Input:
y_true = [1, 1, 1, 0, 0, 0]
y_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

Output:
{"precisions": [1.0, 1.0, 1.0, 0.75, 0.6, 0.5], "recalls": [0.3333, 0.6667, 1.0, 1.0, 1.0, 1.0], "thresholds": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]}

Reasoning:
With 3 positive and 3 negative samples, this shows the classic precision-recall trade-off.
At threshold 0.9: 1 positive predicted → Precision=1.0, Recall=0.3333
At threshold 0.8: 2 positives predicted → Precision=1.0, Recall=0.6667
At threshold 0.7: 3 positives predicted → Precision=1.0, Recall=1.0 (optimal threshold!)
At threshold 0.3: 4 predicted positive → Precision=0.75, Recall=1.0
At threshold 0.2: 5 predicted positive → Precision=0.6, Recall=1.0
At threshold 0.1: All predicted positive → Precision=0.5, Recall=1.0
Notice how lowering the threshold maintains recall at 1.0 but degrades precision.
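That trade-off can be confirmed with a standalone check on this example (nothing here is part of the required solution):

```python
y_true = [1, 1, 1, 0, 0, 0]
y_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
total_pos = sum(y_true)

curve = []
for t in sorted(set(y_scores), reverse=True):
    tp = sum(1 for label, s in zip(y_true, y_scores) if s >= t and label == 1)
    fp = sum(1 for label, s in zip(y_true, y_scores) if s >= t and label == 0)
    curve.append((t, tp / (tp + fp), tp / total_pos))

# Once the threshold reaches 0.7, recall is pinned at 1.0 while
# precision only degrades as the threshold drops further.
for t, precision, recall in curve:
    if t <= 0.7:
        assert recall == 1.0
```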
Constraints: