The Area Under the ROC Curve (AUC) is a beloved metric—threshold-independent, interpretable as a ranking probability, and robust to class imbalance. But AUC has a subtle flaw: it treats all parts of the ROC curve equally, even regions that are operationally irrelevant.
Consider a medical screening test: false positives send healthy patients to anxiety-inducing, costly follow-up procedures, so only the low-FPR portion of the ROC curve matters in practice.
Or consider fraud detection: analysts can manually investigate only a small fraction of flagged transactions, so performance at high FPR is operationally irrelevant.
Partial AUC (pAUC) addresses this by computing the area under only the relevant portion of the ROC curve—typically the region where FPR is below some threshold. This focuses evaluation on the regime where the model will actually be deployed.
By the end of this page, you will understand pAUC from first principles—its definition, computation, normalization variants, relationship to full AUC, and appropriate use cases. You will be able to compute, normalize, and interpret pAUC for practical model evaluation in constrained operating regimes.
Before defining pAUC, let's establish the ROC curve foundation.
ROC Curve Definition:
For a binary classifier with continuous scores, the ROC curve plots the true positive rate, $\text{TPR} = \frac{TP}{TP + FN}$, on the y-axis against the false positive rate, $\text{FPR} = \frac{FP}{FP + TN}$, on the x-axis, as the classification threshold varies from $+\infty$ (predict all negative) to $-\infty$ (predict all positive).
Key Properties:
- The curve runs from (0, 0) (predict all negative) to (1, 1) (predict all positive).
- TPR is non-decreasing as FPR increases (i.e., as the threshold is lowered).
- A random classifier traces the diagonal TPR = FPR; better-than-chance models lie above it.
Full AUC:
$$\text{AUC} = \int_0^1 \text{TPR}(\text{FPR}) \, d(\text{FPR})$$
Equivalently, AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example:
$$\text{AUC} = P(s^+ > s^-)$$
where $s^+$ and $s^-$ are scores for random positive and negative examples.
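To make this ranking interpretation concrete, the short sketch below checks it numerically. The synthetic scores are illustrative assumptions, not data from this page, and ties are counted as one half:

```python
# Minimal sketch: AUC equals the empirical probability that a random positive
# outranks a random negative (ties counted as 1/2). Synthetic scores only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
scores_pos = rng.normal(0.7, 0.2, 100)    # scores for positive examples
scores_neg = rng.normal(0.3, 0.2, 900)    # scores for negative examples

y_true = np.concatenate([np.ones(100), np.zeros(900)])
y_scores = np.concatenate([scores_pos, scores_neg])

diff = scores_pos[:, None] - scores_neg[None, :]          # all (pos, neg) pairs
pairwise = (diff > 0).mean() + 0.5 * (diff == 0).mean()   # estimate of P(s+ > s-)

print(f"Pairwise P(s+ > s-): {pairwise:.4f}")
print(f"roc_auc_score:       {roc_auc_score(y_true, y_scores):.4f}")
```

The two numbers agree, which is exactly the pairwise-ranking reading of AUC.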
```python
import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

def generate_example_roc():
    """Generate example ROC curve data for illustration."""
    np.random.seed(42)
    n_pos = 100
    n_neg = 900

    # Positive scores: higher (good model)
    scores_pos = np.random.normal(0.7, 0.2, n_pos)
    # Negative scores: lower
    scores_neg = np.random.normal(0.3, 0.2, n_neg)

    y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    y_scores = np.concatenate([scores_pos, scores_neg])

    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)

    return fpr, tpr, roc_auc

fpr, tpr, roc_auc = generate_example_roc()

print("ROC Curve Analysis")
print("=" * 50)
print(f"Full AUC: {roc_auc:.4f}")

# Show key points on the curve
print("\nKey points on ROC curve:")
print(f"{'FPR':<10} {'TPR':<10} {'Description'}")
print("-" * 40)

key_fprs = [0.01, 0.05, 0.10, 0.20, 0.50]
for target_fpr in key_fprs:
    # Find closest actual FPR
    idx = np.argmin(np.abs(fpr - target_fpr))
    print(f"{fpr[idx]:<10.4f} {tpr[idx]:<10.4f} FPR ≈ {target_fpr:.0%}")

# Illustrate the problem with full AUC
print("\nThe Problem with Full AUC:")
print("-" * 50)
print("If we only care about FPR < 10%, full AUC includes:")
print("  - Relevant regime   (FPR < 0.10): ~10% of the FPR range")
print("  - Irrelevant regime (FPR > 0.10): ~90% of the FPR range")
print("Full AUC is dominated by the operationally irrelevant region!")
```

Partial AUC (pAUC) computes the area under the ROC curve over a restricted FPR range.
Definition:
Given an FPR range $[\alpha, \beta]$ where $0 \leq \alpha < \beta \leq 1$:
$$\text{pAUC}(\alpha, \beta) = \int_{\alpha}^{\beta} \text{TPR}(\text{FPR}) \, d(\text{FPR})$$
Typically, we're interested in low FPR regimes, so $\alpha = 0$ and we specify only $\beta$:
$$\text{pAUC}(0, \beta) = \int_{0}^{\beta} \text{TPR}(\text{FPR}) \, d(\text{FPR})$$
Common choices for $\beta$ (the domain table later on this page gives typical values per application):
- $\beta = 0.01$ to $0.05$ for screening, spam filtering, and security settings where false positives are very costly
- $\beta = 0.10$ to $0.20$ where a moderate false positive rate is operationally acceptable
Range of pAUC:
For the interval $[0, \beta]$, pAUC lies between 0 (TPR = 0 throughout) and $\beta$ (TPR = 1 throughout); a random classifier scores $\beta^2/2$, the area under the diagonal within the interval.
This unnormalized range makes direct interpretation difficult, motivating normalization.
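As a quick worked example with illustrative numbers: for $\beta = 0.1$,

$$\text{pAUC}_{\max} = \beta = 0.1, \qquad \text{pAUC}_{\text{random}} = \frac{\beta^2}{2} = 0.005,$$

so a raw pAUC of 0.07 sits near the top of its possible range even though the number looks small in isolation.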
```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def partial_auc(y_true, y_scores, max_fpr=0.1):
    """
    Compute partial AUC up to a specified FPR threshold.

    Args:
        y_true: Binary labels
        y_scores: Predicted scores (higher = more likely positive)
        max_fpr: Maximum FPR to consider (beta)

    Returns:
        Dict with raw pAUC and reference values (bounds, random baseline)
    """
    fpr, tpr, _ = roc_curve(y_true, y_scores)

    # Find the portion of the curve where FPR <= max_fpr
    stop_idx = np.searchsorted(fpr, max_fpr, side='right')

    # Interpolate to exactly max_fpr if needed
    if stop_idx < len(fpr) and fpr[stop_idx] > max_fpr:
        # Linear interpolation
        if stop_idx > 0:
            t = (max_fpr - fpr[stop_idx - 1]) / (fpr[stop_idx] - fpr[stop_idx - 1])
            tpr_at_max = tpr[stop_idx - 1] + t * (tpr[stop_idx] - tpr[stop_idx - 1])
        else:
            tpr_at_max = tpr[0]
        fpr_partial = np.append(fpr[:stop_idx], max_fpr)
        tpr_partial = np.append(tpr[:stop_idx], tpr_at_max)
    else:
        fpr_partial = fpr[:stop_idx]
        tpr_partial = tpr[:stop_idx]

    # Compute raw pAUC
    if len(fpr_partial) < 2:
        raw_pauc = 0.0
    else:
        raw_pauc = auc(fpr_partial, tpr_partial)

    # Theoretical bounds
    min_pauc = 0                      # TPR = 0 everywhere
    max_pauc = max_fpr                # TPR = 1 everywhere
    random_pauc = max_fpr ** 2 / 2    # Area under diagonal

    return {
        'raw_pauc': raw_pauc,
        'max_fpr': max_fpr,
        'min_possible': min_pauc,
        'max_possible': max_pauc,
        'random_baseline': random_pauc,
        'fpr_partial': fpr_partial,
        'tpr_partial': tpr_partial
    }

# Example
np.random.seed(42)
n_pos, n_neg = 100, 900

# Good model
scores_pos = np.random.normal(0.7, 0.2, n_pos)
scores_neg = np.random.normal(0.3, 0.2, n_neg)
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
y_scores = np.concatenate([scores_pos, scores_neg])

print("Partial AUC Analysis")
print("=" * 60)

for max_fpr in [0.05, 0.10, 0.20, 0.50, 1.0]:
    result = partial_auc(y_true, y_scores, max_fpr)
    print(f"\npAUC(0, {max_fpr:.2f}):")
    print(f"  Raw pAUC:        {result['raw_pauc']:.6f}")
    print(f"  Random baseline: {result['random_baseline']:.6f}")
    print(f"  Max possible:    {result['max_possible']:.6f}")
    print(f"  Ratio vs random: {result['raw_pauc'] / result['random_baseline']:.2f}x")
```

Raw pAUC values depend heavily on the chosen β. pAUC(0, 0.05) ≈ 0.04 might be excellent, while pAUC(0, 0.5) = 0.4 might be mediocre. Always report β and consider normalization for comparability.
Raw pAUC is difficult to interpret because its range depends on $\beta$. Several normalization schemes address this.
1. McClish Normalization (Standardized pAUC):
The most common normalization, introduced by McClish (1989), rescales pAUC so that a non-discriminating (chance-level) classifier scores 0.5 and a perfect classifier scores 1.0:
$$\text{pAUC}_{\text{McClish}} = \frac{1}{2} \left( 1 + \frac{\text{pAUC} - \text{pAUC}_{\text{min}}}{\text{pAUC}_{\text{max}} - \text{pAUC}_{\text{min}}} \right)$$
For the range $[0, \beta]$, the bounds are $\text{pAUC}_{\text{min}} = \beta^2/2$ (the area under the chance diagonal) and $\text{pAUC}_{\text{max}} = \beta$ (TPR = 1 throughout). Substituting:

$$\text{pAUC}_{\text{McClish}} = \frac{1}{2} \left( 1 + \frac{\text{pAUC} - \beta^2/2}{\beta - \beta^2/2} \right)$$

A chance-level classifier scores exactly 0.5, a perfect classifier scores 1.0, and values below 0.5 indicate worse-than-chance ranking within the region.
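For reference, scikit-learn exposes this standardization directly: passing `max_fpr` to `roc_auc_score` returns the McClish-standardized pAUC over [0, max_fpr]. A minimal sketch (with illustrative synthetic data, not data from this page) that can be used to cross-check the formula above:

```python
# Sketch: scikit-learn's roc_auc_score returns the McClish-standardized pAUC
# when max_fpr is supplied. Synthetic data for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = np.concatenate([np.ones(100), np.zeros(900)])
y_scores = np.concatenate([rng.normal(0.7, 0.2, 100),
                           rng.normal(0.3, 0.2, 900)])

for beta in [0.05, 0.10]:
    std_pauc = roc_auc_score(y_true, y_scores, max_fpr=beta)
    print(f"McClish-standardized pAUC(0, {beta:.2f}): {std_pauc:.4f}")
```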
2. Simple Normalization:
Divide by the maximum possible pAUC:
$$\text{pAUC}_{\text{simple}} = \frac{\text{pAUC}}{\beta}$$
This has range [0, 1], directly interpretable as the fraction of maximum pAUC achieved.
3. Normalized-by-Random:
Compare to the random classifier:
$$\text{pAUC}_{\text{ratio}} = \frac{\text{pAUC}}{\beta^2/2}$$
This shows how many times better than random the model is in this region.
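For illustration, with $\beta = 0.05$ a random classifier attains $\beta^2/2 = 0.00125$, so a model with pAUC $= 0.03$ has a ratio of $0.03 / 0.00125 = 24$, out of a maximum possible $2/\beta = 40$ for a perfect classifier.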
```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def pauc_with_normalizations(y_true, y_scores, max_fpr=0.1):
    """
    Compute pAUC with various normalization schemes.

    Args:
        y_true: Binary labels
        y_scores: Predicted scores
        max_fpr: Maximum FPR (beta)

    Returns:
        Dict with raw and normalized pAUC values
    """
    fpr, tpr, _ = roc_curve(y_true, y_scores)

    # Interpolate to exact max_fpr
    stop_idx = np.searchsorted(fpr, max_fpr, side='right')
    if stop_idx > 0 and fpr[stop_idx - 1] < max_fpr:
        if stop_idx < len(fpr):
            t = (max_fpr - fpr[stop_idx - 1]) / (fpr[stop_idx] - fpr[stop_idx - 1])
            tpr_interp = tpr[stop_idx - 1] + t * (tpr[stop_idx] - tpr[stop_idx - 1])
            fpr_partial = np.append(fpr[:stop_idx], max_fpr)
            tpr_partial = np.append(tpr[:stop_idx], tpr_interp)
        else:
            fpr_partial = fpr[:stop_idx]
            tpr_partial = tpr[:stop_idx]
    else:
        fpr_partial = fpr[:stop_idx]
        tpr_partial = tpr[:stop_idx]

    raw_pauc = auc(fpr_partial, tpr_partial) if len(fpr_partial) >= 2 else 0.0
    beta = max_fpr

    # Area under the chance diagonal within [0, beta]
    random_pauc = beta ** 2 / 2

    # Normalization schemes
    # 1. Simple: fraction of maximum possible area, range [0, 1]
    simple = raw_pauc / beta

    # 2. McClish standardization: chance = 0.5, perfect = 1.0
    mcclish = 0.5 * (1 + (raw_pauc - random_pauc) / (beta - random_pauc))

    # 3. Ratio to random
    ratio_to_random = raw_pauc / random_pauc if random_pauc > 0 else 0

    # 4. Above-random normalized (similar to Gini)
    #    pAUC - random, normalized by max - random
    above_random = (raw_pauc - random_pauc) / (beta - random_pauc)

    return {
        'raw': raw_pauc,
        'simple_normalized': simple,
        'mcclish_normalized': mcclish,
        'ratio_to_random': ratio_to_random,
        'above_random_normalized': above_random,
        'max_fpr': beta
    }

# Compare normalizations across different FPR ranges
np.random.seed(42)
n_pos, n_neg = 100, 900

scores_pos = np.random.normal(0.7, 0.2, n_pos)
scores_neg = np.random.normal(0.3, 0.2, n_neg)
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
y_scores = np.concatenate([scores_pos, scores_neg])

print("pAUC Normalization Comparison")
print("=" * 80)
print(f"{'Max FPR':<10} {'Raw':<10} {'Simple':<10} {'McClish':<10} "
      f"{'×Random':<10} {'Above-Rand':<12}")
print("-" * 80)

for max_fpr in [0.01, 0.05, 0.10, 0.20, 0.50]:
    result = pauc_with_normalizations(y_true, y_scores, max_fpr)
    print(f"{max_fpr:<10.2f} {result['raw']:<10.6f} "
          f"{result['simple_normalized']:<10.4f} "
          f"{result['mcclish_normalized']:<10.4f} "
          f"{result['ratio_to_random']:<10.2f} "
          f"{result['above_random_normalized']:<12.4f}")

print("\nInterpretation:")
print("-" * 80)
print("Raw:        Absolute area under the curve over [0, beta]")
print("Simple:     Fraction of maximum pAUC (range [0, 1])")
print("McClish:    Standardized so chance = 0.5 and perfect = 1.0")
print("×Random:    Multiple of the random classifier's pAUC")
print("Above-Rand: Normalized improvement over random (like Gini for partial range)")
```

| Method | Range | Random Value | Use Case |
|---|---|---|---|
| Raw pAUC | [0, β] | β²/2 | Precise computation, research |
| Simple (pAUC/β) | [0, 1] | β/2 | Intuitive interpretation |
| McClish | [(1−β)/(2−β), 1] | 0.5 | Comparison across β values |
| Ratio to Random | [0, 2/β] | 1.0 | Improvement over baseline |
| Above-Random | [−β/(2−β), 1] | 0 | Like Gini for partial range |
Sometimes we need to constrain both FPR and TPR. Two-Way pAUC (or restricted pAUC) evaluates the ROC curve within a rectangular region.
Definition:
Given FPR range $[\alpha_1, \beta_1]$ and TPR range $[\alpha_2, \beta_2]$:
$$\text{pAUC}_{\text{2way}} = \int_{\substack{\text{FPR} \in [\alpha_1, \beta_1] \\ \text{TPR} \in [\alpha_2, \beta_2]}} \text{TPR}(\text{FPR}) \, d(\text{FPR})$$
Use Cases:
TPR constraint: In fraud detection, we might require TPR ≥ 0.9 (catch 90% of frauds). We only care about the ROC region where this is satisfied.
FPR + TPR constraints: In medical screening, we might need the FPR low enough to limit unnecessary follow-up procedures and the TPR high enough to catch the large majority of true cases.
Only the rectangular region satisfying both is relevant.
Computation:
Two-way pAUC requires identifying the portion of the ROC curve that falls within both constraints, which may be empty if the constraints are unsatisfiable by the model.
```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def two_way_pauc(y_true, y_scores, fpr_range=(0.0, 0.1), tpr_range=(0.8, 1.0)):
    """
    Compute Two-Way Partial AUC.

    Computes area under ROC curve within specified FPR and TPR ranges.

    Args:
        y_true: Binary labels
        y_scores: Predicted scores
        fpr_range: (min_fpr, max_fpr) tuple
        tpr_range: (min_tpr, max_tpr) tuple

    Returns:
        Dict with two-way pAUC and diagnostics
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)

    min_fpr, max_fpr = fpr_range
    min_tpr, max_tpr = tpr_range

    # Find points within both constraints
    mask = (fpr >= min_fpr) & (fpr <= max_fpr) & (tpr >= min_tpr) & (tpr <= max_tpr)

    fpr_filtered = fpr[mask]
    tpr_filtered = tpr[mask]

    if len(fpr_filtered) < 2:
        # Constraints not satisfiable or barely touched
        return {
            'two_way_pauc': 0.0,
            'fpr_range': fpr_range,
            'tpr_range': tpr_range,
            'points_in_region': len(fpr_filtered),
            'feasible': len(fpr_filtered) >= 2
        }

    # Compute area (simple trapezoidal for filtered region)
    raw_area = auc(fpr_filtered, tpr_filtered)

    # Subtract the lower TPR baseline within the FPR range
    # This accounts for the TPR floor
    fpr_width = fpr_filtered[-1] - fpr_filtered[0]
    baseline_area = min_tpr * fpr_width

    # The effective two-way pAUC measures area above the TPR floor
    effective_area = raw_area - baseline_area

    # Maximum possible (TPR = max_tpr throughout the FPR range)
    max_possible = (max_tpr - min_tpr) * (max_fpr - min_fpr)

    normalized = effective_area / max_possible if max_possible > 0 else 0.0

    return {
        'two_way_pauc': raw_area,
        'effective_area': effective_area,
        'normalized': normalized,
        'fpr_range': fpr_range,
        'tpr_range': tpr_range,
        'points_in_region': len(fpr_filtered),
        'feasible': True
    }

# Example: Fraud detection with constraints
np.random.seed(42)
n_fraud = 100
n_legit = 9900

# Model scores
scores_fraud = np.random.normal(0.75, 0.15, n_fraud)
scores_legit = np.random.normal(0.25, 0.2, n_legit)

y_true = np.concatenate([np.ones(n_fraud), np.zeros(n_legit)])
y_scores = np.concatenate([scores_fraud, scores_legit])

print("Two-Way pAUC Analysis")
print("=" * 60)
print("Scenario: Fraud detection")
print("  - Constraint 1: FPR ≤ 5% (can't investigate too many)")
print("  - Constraint 2: TPR ≥ 80% (must catch most fraud)")
print()

# Standard pAUC (one-way)
from sklearn.metrics import roc_auc_score
full_auc = roc_auc_score(y_true, y_scores)
print(f"Full AUC: {full_auc:.4f}")

# One-way pAUC (FPR constraint only)
fpr, tpr, _ = roc_curve(y_true, y_scores)
idx = np.where(fpr <= 0.05)[0]
one_way_pauc = auc(fpr[idx], tpr[idx]) if len(idx) >= 2 else 0
print(f"One-way pAUC (FPR ≤ 5%): {one_way_pauc:.6f}")

# Two-way pAUC
result = two_way_pauc(y_true, y_scores, fpr_range=(0.0, 0.05), tpr_range=(0.8, 1.0))

print(f"\nTwo-way pAUC (FPR ≤ 5%, TPR ≥ 80%):")
print(f"  Raw area in region:   {result['two_way_pauc']:.6f}")
print(f"  Effective area:       {result['effective_area']:.6f}")
print(f"  Normalized:           {result['normalized']:.4f}")
print(f"  Points in region:     {result['points_in_region']}")
print(f"  Constraints feasible: {result['feasible']}")
```

Understanding how pAUC relates to other metrics helps guide metric selection.
pAUC vs. Full AUC:
| Aspect | Full AUC | pAUC |
|---|---|---|
| FPR range | [0, 1] | [α, β] (typically [0, β]) |
| Interpretation | Overall discrimination | Discrimination in target regime |
| Sensitivity to | Entire ROC curve | Only specified region |
| Operational relevance | May include irrelevant regions | Focuses on deployable region |
Key Insight:
Full AUC and pAUC can disagree on model rankings. Model A might have higher full AUC but lower pAUC in the target region, meaning Model B is better where it matters.
pAUC vs. TPR@FPR (Sensitivity at Specificity):
TPR@FPR is a point metric—the TPR at a single FPR threshold. pAUC is an area metric covering a range.
| Metric | What it measures | Pros | Cons |
|---|---|---|---|
| TPR@FPR=0.05 | Single point on ROC | Simple, specific | Single point = high variance |
| pAUC(0, 0.05) | Area under FPR ≤ 5% | Averages over range | Less specific |
When Models Cross:
If two ROC curves cross within the target region, pAUC gives an average view. Point metrics (TPR@FPR) might prefer one model at some thresholds and another at different thresholds.
```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

def compare_models_pauc_vs_auc(models, y_true, max_fpr=0.05):
    """
    Compare models on full AUC vs pAUC.

    Demonstrates cases where rankings differ.
    """
    results = []

    for name, y_scores in models.items():
        fpr, tpr, _ = roc_curve(y_true, y_scores)
        full_auc = roc_auc_score(y_true, y_scores)

        # pAUC
        idx = np.where(fpr <= max_fpr)[0]
        if len(idx) >= 2:
            pauc = auc(fpr[idx], tpr[idx])
        else:
            pauc = 0.0

        # TPR at target FPR
        idx_at_fpr = np.searchsorted(fpr, max_fpr)
        if idx_at_fpr > 0:
            tpr_at_fpr = tpr[idx_at_fpr - 1]
        else:
            tpr_at_fpr = 0.0

        results.append({
            'name': name,
            'full_auc': full_auc,
            'pauc': pauc,
            'tpr_at_fpr': tpr_at_fpr
        })

    return results

# Create models with different characteristics
np.random.seed(42)
n_pos, n_neg = 200, 1800
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

# Model A: Good overall, weaker at low FPR
scores_a_pos = np.random.normal(0.65, 0.2, n_pos)
scores_a_neg = np.random.normal(0.35, 0.2, n_neg)
y_scores_a = np.concatenate([scores_a_pos, scores_a_neg])

# Model B: Slightly lower overall AUC, but stronger at low FPR
# Has a "fat tail" of very high-confidence positives
scores_b_pos = np.concatenate([
    np.random.normal(0.9, 0.05, int(n_pos * 0.2)),   # 20% very high confidence
    np.random.normal(0.5, 0.2, int(n_pos * 0.8))     # 80% moderate
])
scores_b_neg = np.random.normal(0.4, 0.15, n_neg)
y_scores_b = np.concatenate([scores_b_pos, scores_b_neg])

models = {
    'Model A (balanced)': y_scores_a,
    'Model B (high-conf tail)': y_scores_b
}

print("Comparing Models: Full AUC vs pAUC")
print("=" * 70)
print()

for max_fpr in [0.01, 0.05, 0.10]:
    results = compare_models_pauc_vs_auc(models, y_true, max_fpr)

    print(f"Max FPR = {max_fpr:.0%}")
    print("-" * 60)
    print(f"{'Model':<25} {'Full AUC':<12} {'pAUC':<12} {'TPR@FPR':<12}")
    print("-" * 60)

    for r in results:
        print(f"{r['name']:<25} {r['full_auc']:<12.4f} "
              f"{r['pauc']:<12.6f} {r['tpr_at_fpr']:<12.4f}")

    # Determine winner for each metric
    full_auc_winner = max(results, key=lambda x: x['full_auc'])['name']
    pauc_winner = max(results, key=lambda x: x['pauc'])['name']

    print()
    print(f"Full AUC winner: {full_auc_winner.split()[0]} {full_auc_winner.split()[1]}")
    print(f"pAUC winner:     {pauc_winner.split()[0]} {pauc_winner.split()[1]}")
    if full_auc_winner != pauc_winner:
        print("  >>> DIFFERENT WINNERS! pAUC reveals different operational performance.")
    print()
```

Be wary of models marketed on full AUC when you'll operate in a constrained FPR regime. A model with 'state-of-the-art AUC = 0.98' might have pAUC(0, 0.05) worse than a simpler model with full AUC = 0.94. Always evaluate on the metric that reflects your operational constraints.
| Domain | Typical β (max FPR) | Rationale |
|---|---|---|
| Cancer Screening | 0.01 - 0.05 | False alarms cause anxiety, unnecessary biopsies |
| Fraud Detection | 0.01 - 0.10 | Can't manually investigate more than ~1-10% of transactions |
| Security Screening (Airport) | 0.001 - 0.01 | Massive passenger volume; even 1% FPR is millions of delays |
| Spam Filtering | 0.001 - 0.01 | False positives lose important emails |
| Disease Diagnosis | 0.05 - 0.20 | Cost of false positives depends on follow-up procedures |
| Credit Scoring | 0.05 - 0.15 | Regulatory and business constraints on rejection rates |
Case Study: Mammography Screening
In breast cancer screening, almost all screened women are healthy, and a false positive means anxiety, recall imaging, and possibly biopsy. Models are therefore compared on pAUC at very low FPR (roughly β = 0.01 to 0.05), where full AUC can be misleading.
Case Study: Credit Card Fraud
In real-time fraud detection, investigation capacity caps the fraction of transactions that can be flagged, so models compete on the TPR they deliver at FPR of roughly 1% to 10%; pAUC over that range reflects the operating constraint directly.
Case Study: Rare Disease Screening

With extreme class imbalance, even a tiny FPR produces far more false alarms than true cases, so evaluation concentrates on the very lowest FPR portion of the ROC curve.
pAUC has unique statistical properties that affect confidence intervals and significance testing.
Variance of pAUC:
The variance of pAUC depends on:
- The number of positive and negative examples (only negatives scored within the region's FPR range pin down the restricted curve)
- The width of the FPR interval ($\beta - \alpha$)
- The shape of the ROC curve within the interval
Intuition: narrower FPR ranges are estimated from fewer negative examples, so the restricted curve is noisier; on a comparable (normalized) scale, pAUC(0, 0.01) has higher variance than pAUC(0, 0.10).
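The sketch below illustrates this with synthetic data; the helper `simple_pauc` and all numbers are illustrative assumptions. It compares bootstrap standard deviations of simple-normalized pAUC for a narrow and a wider β:

```python
# Sketch: bootstrap spread of simple-normalized pAUC for a narrow vs. wide beta.
# Synthetic data; numbers are illustrative, not from this page.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(7)
n_pos, n_neg = 100, 900
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
y_scores = np.concatenate([rng.normal(0.7, 0.2, n_pos),
                           rng.normal(0.3, 0.2, n_neg)])

def simple_pauc(y_t, y_s, beta):
    """Raw pAUC over FPR <= beta, divided by beta (simple normalization)."""
    fpr, tpr, _ = roc_curve(y_t, y_s)
    idx = fpr <= beta
    return auc(fpr[idx], tpr[idx]) / beta if idx.sum() >= 2 else 0.0

for beta in [0.01, 0.10]:
    vals = []
    for _ in range(500):
        b = rng.choice(len(y_true), size=len(y_true), replace=True)
        if len(np.unique(y_true[b])) < 2:
            continue
        vals.append(simple_pauc(y_true[b], y_scores[b], beta))
    print(f"beta = {beta:.2f}: bootstrap std of simple pAUC = {np.std(vals):.4f}")
```

On this kind of setup the narrower range typically shows a noticeably larger bootstrap spread, matching the intuition above.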
Confidence Intervals:
Bootstrap is the most common approach:
1. Resample the evaluation set with replacement, keeping labels and scores paired.
2. Recompute pAUC on each bootstrap sample.
3. Take empirical percentiles of the bootstrap distribution (e.g., 2.5th and 97.5th) as the confidence interval.
Significance Testing:
To compare two models on pAUC, use a paired bootstrap: resample the same indices for both models, compute the pAUC difference on each resample, and check whether the resulting confidence interval for the difference excludes zero.
```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def pauc_with_bootstrap_ci(y_true, y_scores, max_fpr=0.1,
                           n_bootstrap=2000, confidence=0.95):
    """
    Compute pAUC with bootstrap confidence interval.

    Args:
        y_true: Binary labels
        y_scores: Predicted scores
        max_fpr: Maximum FPR for partial AUC
        n_bootstrap: Number of bootstrap iterations
        confidence: Confidence level for CI

    Returns:
        Dict with pAUC estimate and confidence interval
    """
    def compute_pauc(y_t, y_s, max_f):
        fpr, tpr, _ = roc_curve(y_t, y_s)
        idx = np.where(fpr <= max_f)[0]
        return auc(fpr[idx], tpr[idx]) if len(idx) >= 2 else 0.0

    # Point estimate
    pauc_estimate = compute_pauc(y_true, y_scores, max_fpr)

    # Bootstrap
    rng = np.random.default_rng(42)
    n = len(y_true)
    bootstrap_paucs = []

    for _ in range(n_bootstrap):
        idx = rng.choice(n, size=n, replace=True)
        y_t_boot = y_true[idx]
        y_s_boot = y_scores[idx]

        # Ensure both classes present
        if len(np.unique(y_t_boot)) < 2:
            continue

        pauc_boot = compute_pauc(y_t_boot, y_s_boot, max_fpr)
        bootstrap_paucs.append(pauc_boot)

    bootstrap_paucs = np.array(bootstrap_paucs)

    alpha = 1 - confidence
    ci_lower = np.percentile(bootstrap_paucs, 100 * alpha / 2)
    ci_upper = np.percentile(bootstrap_paucs, 100 * (1 - alpha / 2))

    return {
        'pauc': pauc_estimate,
        'ci_lower': ci_lower,
        'ci_upper': ci_upper,
        'bootstrap_std': np.std(bootstrap_paucs),
        'max_fpr': max_fpr,
        'n_bootstrap_valid': len(bootstrap_paucs)
    }

def compare_models_pauc_significance(y_true, y_scores_a, y_scores_b,
                                     max_fpr=0.1, n_bootstrap=5000):
    """
    Compare two models' pAUC with significance testing.
    """
    def compute_pauc(y_t, y_s, max_f):
        fpr, tpr, _ = roc_curve(y_t, y_s)
        idx = np.where(fpr <= max_f)[0]
        return auc(fpr[idx], tpr[idx]) if len(idx) >= 2 else 0.0

    pauc_a = compute_pauc(y_true, y_scores_a, max_fpr)
    pauc_b = compute_pauc(y_true, y_scores_b, max_fpr)
    observed_diff = pauc_a - pauc_b

    # Paired bootstrap
    rng = np.random.default_rng(42)
    n = len(y_true)
    boot_diffs = []

    for _ in range(n_bootstrap):
        idx = rng.choice(n, size=n, replace=True)
        y_t_boot = y_true[idx]

        if len(np.unique(y_t_boot)) < 2:
            continue

        pauc_a_boot = compute_pauc(y_t_boot, y_scores_a[idx], max_fpr)
        pauc_b_boot = compute_pauc(y_t_boot, y_scores_b[idx], max_fpr)
        boot_diffs.append(pauc_a_boot - pauc_b_boot)

    boot_diffs = np.array(boot_diffs)
    ci_lower = np.percentile(boot_diffs, 2.5)
    ci_upper = np.percentile(boot_diffs, 97.5)
    significant = ci_lower > 0 or ci_upper < 0

    return {
        'pauc_a': pauc_a,
        'pauc_b': pauc_b,
        'difference': observed_diff,
        'ci_lower': ci_lower,
        'ci_upper': ci_upper,
        'significant_95': significant
    }

# Example
np.random.seed(42)
n_pos, n_neg = 100, 900

scores_pos_a = np.random.normal(0.7, 0.2, n_pos)
scores_neg_a = np.random.normal(0.3, 0.2, n_neg)
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
y_scores_a = np.concatenate([scores_pos_a, scores_neg_a])

# Slightly better model
scores_pos_b = np.random.normal(0.72, 0.18, n_pos)
scores_neg_b = np.random.normal(0.28, 0.18, n_neg)
y_scores_b = np.concatenate([scores_pos_b, scores_neg_b])

print("pAUC with Bootstrap Confidence Intervals")
print("=" * 60)

for max_fpr in [0.05, 0.10]:
    result = pauc_with_bootstrap_ci(y_true, y_scores_a, max_fpr)
    print(f"\nMax FPR = {max_fpr:.0%}:")
    print(f"  pAUC = {result['pauc']:.6f}")
    print(f"  95% CI: [{result['ci_lower']:.6f}, {result['ci_upper']:.6f}]")
    print(f"  Bootstrap std: {result['bootstrap_std']:.6f}")

print("\nModel Comparison:")
comparison = compare_models_pauc_significance(y_true, y_scores_a, y_scores_b, 0.1)
print(f"  Model A pAUC: {comparison['pauc_a']:.6f}")
print(f"  Model B pAUC: {comparison['pauc_b']:.6f}")
print(f"  Difference (A - B): {comparison['difference']:.6f}")
print(f"  95% CI: [{comparison['ci_lower']:.6f}, {comparison['ci_upper']:.6f}]")
print(f"  Significant: {comparison['significant_95']}")
```

We have established pAUC as the essential metric for evaluating classifiers in constrained operating regimes. Let's consolidate our understanding:
Mathematical Summary:
$$\text{pAUC}(0, \beta) = \int_{0}^{\beta} \text{TPR}(\text{FPR}) \, d(\text{FPR})$$
Normalizations:
- Simple: pAUC / β (fraction of the maximum possible area)
- McClish: ½ (1 + (pAUC − β²/2) / (β − β²/2)), so chance scores 0.5 and a perfect classifier scores 1.0
- Ratio to random: pAUC / (β²/2), the multiple of the random baseline
Module Complete:
With pAUC, we conclude our exploration of ranking metrics. You now have a comprehensive toolkit for evaluating ranked systems:
| Metric | Best For |
|---|---|
| P@k, R@k | Simple cutoff evaluation |
| MAP | Binary relevance, all positions |
| NDCG | Graded relevance |
| MRR | Single-answer retrieval |
| pAUC | Constrained FPR operation |
You have completed Module 5: Ranking Metrics. You now possess a comprehensive understanding of how to evaluate ranked retrieval, recommendation, and classification systems—from simple Precision@k to sophisticated partial AUC. Choose metrics that align with your operational constraints and always report with statistical rigor.