Post-processing methods occupy a unique position in the fairness intervention landscape: they act on a trained model's outputs without modifying the model itself. This seemingly simple constraint has profound practical implications.
When you cannot retrain a model—perhaps due to computational cost, proprietary constraints, or regulatory requirements to preserve an audited model—post-processing is your only option. When fairness requirements differ across deployment contexts, post-processing enables context-specific adjustments without maintaining multiple models. And when you need to understand the cost of fairness separate from model quality, post-processing makes this analysis transparent.
By the end of this page, you will be able to:
1. Apply threshold optimization to satisfy demographic parity or equalized odds
2. Implement calibration-based post-processing methods
3. Use reject option classification to handle uncertain predictions fairly
4. Apply the Hardt et al. (2016) equalized odds post-processing algorithm
5. Understand the theoretical foundations and limitations of post-processing approaches
The Post-processing Setup:
Given:
- A trained model $h$ that outputs scores $h(X)$
- A protected attribute $A$
- A fairness criterion $\mathcal{F}$ (e.g., demographic parity or equalized odds)
- Held-out labeled data for estimating group-specific quantities

Find a transformation $T$ such that the post-processed classifier $\hat{Y} = T(h(X), A)$ satisfies $\mathcal{F}$ while minimizing accuracy loss.
Key Insight: Unlike in-processing, post-processing explicitly uses the protected attribute $A$ at prediction time. This is both its power (can directly condition on group membership) and its challenge (may not be legally or practically feasible to access $A$ at deployment).
The simplest post-processing approach uses group-specific classification thresholds. Rather than applying a single threshold $\tau$ to convert probabilities to predictions, we use different thresholds for different groups:
$$\hat{Y}(x, a) = \mathbb{1}[h(x) \geq \tau_a]$$
Mathematical Foundation:
For demographic parity, we want: $$P(\hat{Y} = 1 | A = 0) = P(\hat{Y} = 1 | A = 1)$$
For a given model $h$, this translates to finding thresholds $\tau_0, \tau_1$ such that: $$P(h(X) \geq \tau_0 | A = 0) = P(h(X) \geq \tau_1 | A = 1)$$
These are the $(1-p)$-quantiles of the group-conditional score distributions, where $p$ is the target positive rate.
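For a concrete (hypothetical) illustration of this quantile rule before the full implementation below:

```python
import numpy as np

# Hypothetical scores for one group; with target positive rate p, the group's
# threshold is the (1-p)-quantile of its score distribution.
scores_a = np.array([0.2, 0.4, 0.55, 0.7, 0.9])
p = 0.4
tau_a = np.quantile(scores_a, 1 - p)
print(tau_a, np.mean(scores_a >= tau_a))  # roughly 40% of the group clears tau_a
```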
Algorithm: Threshold Search for Demographic Parity
1. Pick a target positive rate $p$.
2. For each group $a$, set $\tau_a$ to the $(1-p)$-quantile of the scores $h(X)$ within group $a$.
3. Predict $\hat{Y}(x, a) = \mathbb{1}[h(x) \geq \tau_a]$.

The choice of $p$ controls the overall positive rate. Common choices:
- The positive rate of the unadjusted model at its default threshold (the default in the code below)
- The base rate $P(Y = 1)$ in the data
- A value tuned on held-out data to maximize accuracy subject to the parity constraint (see `find_optimal_target_rate` below)
```python
import numpy as np
from typing import Dict, Tuple, Optional
from scipy.optimize import minimize_scalar


class ThresholdOptimizer:
    """
    Post-processing fairness via group-specific thresholds.

    Supports demographic parity and equal opportunity constraints.
    """

    def __init__(self, fairness_criterion: str = 'demographic_parity'):
        """
        Args:
            fairness_criterion: 'demographic_parity' or 'equal_opportunity'
        """
        self.fairness_criterion = fairness_criterion
        self.thresholds_ = {}

    def fit(self, scores: np.ndarray, protected: np.ndarray,
            labels: Optional[np.ndarray] = None,
            target_rate: Optional[float] = None) -> 'ThresholdOptimizer':
        """
        Learn group-specific thresholds to satisfy the fairness criterion.

        Args:
            scores: Model probability outputs (n_samples,)
            protected: Protected attribute values (n_samples,)
            labels: True labels (required for equal_opportunity)
            target_rate: Target positive rate (if None, uses overall rate)
        """
        groups = np.unique(protected)

        if self.fairness_criterion == 'demographic_parity':
            # Find thresholds that equalize positive prediction rates
            if target_rate is None:
                # Use overall rate at default threshold 0.5
                target_rate = np.mean(scores >= 0.5)

            for group in groups:
                group_mask = protected == group
                group_scores = scores[group_mask]
                # Find threshold giving target_rate positive predictions
                self.thresholds_[group] = np.quantile(group_scores, 1 - target_rate)

        elif self.fairness_criterion == 'equal_opportunity':
            assert labels is not None, "Labels required for equal opportunity"
            # Find thresholds that equalize TPR among positive examples
            if target_rate is None:
                # Use overall TPR at threshold 0.5
                pos_mask = labels == 1
                target_rate = np.mean(scores[pos_mask] >= 0.5)

            for group in groups:
                group_mask = (protected == group) & (labels == 1)
                if group_mask.sum() == 0:
                    self.thresholds_[group] = 0.5
                    continue
                group_scores = scores[group_mask]
                self.thresholds_[group] = np.quantile(group_scores, 1 - target_rate)

        return self

    def predict(self, scores: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """
        Apply group-specific thresholds to get fair predictions.

        Args:
            scores: Model probability outputs (n_samples,)
            protected: Protected attribute values (n_samples,)

        Returns:
            Binary predictions (n_samples,)
        """
        predictions = np.zeros(len(scores), dtype=int)
        for group, threshold in self.thresholds_.items():
            mask = protected == group
            predictions[mask] = (scores[mask] >= threshold).astype(int)
        return predictions

    def fit_predict(self, scores: np.ndarray, protected: np.ndarray,
                    labels: Optional[np.ndarray] = None,
                    target_rate: Optional[float] = None) -> np.ndarray:
        """Fit and predict in one step."""
        return self.fit(scores, protected, labels, target_rate).predict(scores, protected)


def find_optimal_target_rate(scores: np.ndarray, protected: np.ndarray,
                             labels: np.ndarray,
                             fairness_criterion: str = 'demographic_parity') -> Tuple[float, float]:
    """
    Find the target rate that maximizes accuracy while satisfying fairness.

    Returns:
        (optimal_rate, best_accuracy)
    """
    def neg_accuracy(rate):
        optimizer = ThresholdOptimizer(fairness_criterion)
        preds = optimizer.fit_predict(scores, protected, labels, target_rate=rate)
        return -np.mean(preds == labels)

    result = minimize_scalar(neg_accuracy, bounds=(0.01, 0.99), method='bounded')
    return result.x, -result.fun


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 3000

    # Biased model scores
    protected = np.random.binomial(1, 0.4, n)
    labels = np.random.binomial(1, 0.5, n)

    # Simulate biased model: higher scores for protected=0
    base_scores = 0.3 + 0.4 * labels + np.random.normal(0, 0.2, n)
    scores = base_scores - 0.2 * (protected == 1)  # Bias against group 1
    scores = np.clip(scores, 0, 1)

    print("Original (threshold=0.5):")
    orig_preds = (scores >= 0.5).astype(int)
    print(f"  P(Ŷ=1|A=0) = {np.mean(orig_preds[protected==0]):.3f}")
    print(f"  P(Ŷ=1|A=1) = {np.mean(orig_preds[protected==1]):.3f}")
    print(f"  Accuracy = {np.mean(orig_preds == labels):.3f}")

    # Apply fair post-processing
    optimizer = ThresholdOptimizer('demographic_parity')
    fair_preds = optimizer.fit_predict(scores, protected, labels)

    print("\nAfter threshold optimization:")
    print(f"  P(Ŷ=1|A=0) = {np.mean(fair_preds[protected==0]):.3f}")
    print(f"  P(Ŷ=1|A=1) = {np.mean(fair_preds[protected==1]):.3f}")
    print(f"  Accuracy = {np.mean(fair_preds == labels):.3f}")
    print(f"  Thresholds: {optimizer.thresholds_}")
```

Group-specific thresholds can only achieve fairness if the score distributions for different groups overlap. If one group's scores are uniformly higher, no threshold choice can equalize rates without harming accuracy severely. This occurs when the underlying model is severely biased—post-processing cannot fix fundamentally broken predictions.
The seminal work by Hardt, Price, and Srebro (2016) introduced a rigorous post-processing method for achieving equalized odds: equal true positive rates (TPR) and false positive rates (FPR) across groups.
The Key Insight:
Instead of directly modifying predictions, we learn a randomized prediction rule that mixes the original prediction with controlled randomization:
$$\hat{Y}_{fair}(x, a) = \begin{cases} \hat{Y}(x) & \text{with probability } p_{a,\hat{Y}(x)} \\ 1 - \hat{Y}(x) & \text{with probability } 1 - p_{a,\hat{Y}(x)} \end{cases}$$
The probabilities $p$ are chosen to satisfy equalized odds constraints.
Mathematical Formulation:
We parametrize the post-processing by one probability for each $(a, \hat{y})$ combination. Let:
- $p_{a,\hat{y}}$ = probability of keeping the original prediction $\hat{y}$ for group $a$
- $\text{TPR}_a = P(\hat{Y} = 1 \mid Y = 1, A = a)$ and $\text{FPR}_a = P(\hat{Y} = 1 \mid Y = 0, A = a)$, the base model's group-conditional rates estimated on held-out labeled data

The post-processed rates are then $\widetilde{\text{TPR}}_a = p_{a,1}\,\text{TPR}_a + (1 - p_{a,0})(1 - \text{TPR}_a)$ and $\widetilde{\text{FPR}}_a = p_{a,1}\,\text{FPR}_a + (1 - p_{a,0})(1 - \text{FPR}_a)$.

Constraints:
- Equal TPR: $\widetilde{\text{TPR}}_0 = \widetilde{\text{TPR}}_1$
- Equal FPR: $\widetilde{\text{FPR}}_0 = \widetilde{\text{FPR}}_1$
- Valid probabilities: $0 \leq p_{a,\hat{y}} \leq 1$
Objective: Minimize classification loss subject to constraints: $$\min_p E[L(\hat{Y}_{fair}, Y)] \quad \text{s.t. equalized odds constraints}$$
This is a linear program! The constraints are linear in $p$, and error rate objectives are linear too.
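As a sketch of how that linear program can be set up (an illustration, not the implementation that follows; the helper name `solve_equalized_odds_lp`, the variable ordering, and the example rates are assumptions of this sketch), the group-conditional rates can be fed to `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def solve_equalized_odds_lp(tpr: dict, fpr: dict, w_pos: dict, w_neg: dict) -> dict:
    """
    tpr, fpr:     base model's group-conditional rates, e.g. {0: TPR_0, 1: TPR_1}
    w_pos, w_neg: empirical weights P(A=a, Y=1) and P(A=a, Y=0)
    Returns {(a, yhat): probability of keeping the base prediction yhat}.
    Variables are ordered [p_{0,0}, p_{0,1}, p_{1,0}, p_{1,1}].
    """
    # Linear objective: expected error of the derived predictor (constants dropped)
    c = []
    for a in (0, 1):
        c.append(w_pos[a] * (1 - tpr[a]) - w_neg[a] * (1 - fpr[a]))  # coeff of p_{a,0}
        c.append(-w_pos[a] * tpr[a] + w_neg[a] * fpr[a])             # coeff of p_{a,1}

    # Equalized odds: derived TPRs equal and derived FPRs equal across groups, where
    # derived TPR_a = p_{a,1} * TPR_a + (1 - p_{a,0}) * (1 - TPR_a), and similarly for FPR.
    def equality_row(rate):
        r0, r1 = rate[0], rate[1]
        return [-(1 - r0), r0, (1 - r1), -r1], r0 - r1

    A_eq, b_eq = [], []
    for rate in (tpr, fpr):
        row, rhs = equality_row(rate)
        A_eq.append(row)
        b_eq.append(rhs)

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4, method="highs")
    p = res.x
    return {(0, 0): p[0], (0, 1): p[1], (1, 0): p[2], (1, 1): p[3]}

# Hypothetical base-model rates and group weights
print(solve_equalized_odds_lp(
    tpr={0: 0.85, 1: 0.70}, fpr={0: 0.25, 1: 0.10},
    w_pos={0: 0.30, 1: 0.20}, w_neg={0: 0.30, 1: 0.20},
))
```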
The implementation below follows this recipe in simplified form:

```python
import numpy as np
from typing import Dict, Optional


class EqualizedOddsPostProcessor:
    """
    Post-processing for equalized odds following Hardt et al. (2016).

    Learns a randomized prediction rule that satisfies equalized odds
    (equal TPR and FPR across groups) while minimizing error.

    Note: this implementation is a simplified heuristic. The exact method
    solves the linear program described above and applies the resulting
    per-(group, prediction) mixing probabilities.
    """

    def __init__(self):
        self.mixing_rates_ = {}

    def fit(self, predictions: np.ndarray, protected: np.ndarray,
            labels: np.ndarray) -> 'EqualizedOddsPostProcessor':
        """
        Learn mixing rates to achieve equalized odds.

        Args:
            predictions: Binary predictions from original model (0 or 1)
            protected: Binary protected attribute
            labels: True binary labels
        """
        groups = np.unique(protected)
        assert len(groups) == 2, "Binary protected attribute required"

        # Compute confusion-matrix rates for each group
        rates = {}
        for a in groups:
            mask = protected == a
            for y in [0, 1]:
                y_mask = mask & (labels == y)
                if y_mask.sum() == 0:
                    rates[(a, y, 0)] = 0
                    rates[(a, y, 1)] = 0
                else:
                    rates[(a, y, 0)] = np.mean(predictions[y_mask] == 0)  # P(Ŷ=0|A=a,Y=y)
                    rates[(a, y, 1)] = np.mean(predictions[y_mask] == 1)  # P(Ŷ=1|A=a,Y=y)

        # For simplicity, rates are equalized heuristically toward group averages;
        # the full LP solution would find the optimal mixing probabilities.

        # Target rates: average across groups
        target_tpr = np.mean([rates[(a, 1, 1)] for a in groups])
        target_fpr = np.mean([rates[(a, 0, 1)] for a in groups])

        # For each group, record current and target rates
        for a in groups:
            current_tpr = rates[(a, 1, 1)]  # P(Ŷ=1|Y=1,A=a)
            current_fpr = rates[(a, 0, 1)]  # P(Ŷ=1|Y=0,A=a)

            self.mixing_rates_[a] = {
                'tpr_current': current_tpr,
                'tpr_target': target_tpr,
                'fpr_current': current_fpr,
                'fpr_target': target_fpr
            }

        return self

    def predict(self, predictions: np.ndarray, protected: np.ndarray,
                labels: Optional[np.ndarray] = None) -> np.ndarray:
        """
        Apply randomized post-processing toward equalized odds.

        Note: this is a heuristic that nudges each group's positive rate
        toward a common target. The exact Hardt et al. rule flips predictions
        with the (group, prediction)-specific probabilities obtained from the
        linear program; true labels are needed only in fit().
        """
        fair_predictions = predictions.copy()

        for a in np.unique(protected):
            mask = protected == a
            rates = self.mixing_rates_.get(a, None)
            if rates is None:
                continue

            group_preds = predictions[mask]

            # Adjust positive predictions: if the group over-predicts positive
            # relative to the target, randomly flip some to negative
            pos_mask = mask & (predictions == 1)
            if pos_mask.sum() > 0:
                current_rate = np.mean(group_preds == 1)
                target_rate = (rates['tpr_target'] + rates['fpr_target']) / 2
                if current_rate > target_rate:
                    flip_prob = (current_rate - target_rate) / current_rate
                    flip = np.random.rand(pos_mask.sum()) < flip_prob
                    fair_predictions[np.where(pos_mask)[0][flip]] = 0

            # Adjust negative predictions symmetrically
            neg_mask = mask & (predictions == 0)
            if neg_mask.sum() > 0:
                current_rate = np.mean(group_preds == 0)
                target_neg_rate = 1 - (rates['tpr_target'] + rates['fpr_target']) / 2
                if current_rate > target_neg_rate:
                    flip_prob = (current_rate - target_neg_rate) / current_rate
                    flip = np.random.rand(neg_mask.sum()) < flip_prob
                    fair_predictions[np.where(neg_mask)[0][flip]] = 1

        return fair_predictions


def compute_equalized_odds_violation(predictions: np.ndarray, protected: np.ndarray,
                                     labels: np.ndarray) -> Dict[str, float]:
    """Compute equalized odds metrics."""
    groups = np.unique(protected)
    metrics = {}

    for a in groups:
        mask = protected == a
        pos_mask = mask & (labels == 1)
        neg_mask = mask & (labels == 0)
        tpr = np.mean(predictions[pos_mask]) if pos_mask.sum() > 0 else 0
        fpr = np.mean(predictions[neg_mask]) if neg_mask.sum() > 0 else 0
        metrics[f'TPR_group_{a}'] = tpr
        metrics[f'FPR_group_{a}'] = fpr

    metrics['TPR_gap'] = abs(metrics['TPR_group_0'] - metrics['TPR_group_1'])
    metrics['FPR_gap'] = abs(metrics['FPR_group_0'] - metrics['FPR_group_1'])
    return metrics


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 3000

    protected = np.random.binomial(1, 0.5, n)
    labels = np.random.binomial(1, 0.5, n)

    # Biased predictions: higher TPR for group 0, higher FPR for group 1
    predictions = labels.copy()
    # Add noise
    flip = np.random.rand(n) < 0.2
    predictions[flip] = 1 - predictions[flip]
    # Add group bias
    predictions[(protected == 1) & (labels == 0)] = (
        np.random.rand(((protected == 1) & (labels == 0)).sum()) < 0.4
    ).astype(int)

    print("Original predictions:")
    metrics = compute_equalized_odds_violation(predictions, protected, labels)
    for key, val in metrics.items():
        print(f"  {key}: {val:.3f}")

    # Apply post-processing
    processor = EqualizedOddsPostProcessor()
    processor.fit(predictions, protected, labels)
    fair_predictions = processor.predict(predictions, protected)

    print("\nAfter equalized odds post-processing:")
    metrics = compute_equalized_odds_violation(fair_predictions, protected, labels)
    for key, val in metrics.items():
        print(f"  {key}: {val:.3f}")
```

In Hardt et al.'s formulation, the true label $Y$ is needed only during fitting, to estimate each group's TPR and FPR on held-out labeled data; the learned randomized rule depends only on the original prediction and the group, so it can be applied at deployment without $Y$. The implementation above is a simplified heuristic that nudges each group's rates toward common targets; the exact method solves the linear program sketched earlier and applies the resulting per-(group, prediction) mixing probabilities.
Calibration ensures that predicted probabilities reflect true outcome frequencies: when a model predicts 0.8, the actual positive rate should be 80%. Group-specific calibration extends this to require calibration within each protected group.
Why Calibration Matters for Fairness:
A model can achieve demographic parity in predictions while being poorly calibrated for minority groups—predicting 0.7 when the true rate is 0.3. This leads to unfair downstream decisions even when aggregate statistics look fair.
Formal Definition:
A classifier $h$ is perfectly calibrated if: $$P(Y = 1 | h(X) = p) = p \quad \text{for all } p \in [0, 1]$$
A classifier has group-specific calibration if: $$P(Y = 1 | h(X) = p, A = a) = p \quad \text{for all } p, a$$
Pleiss et al. (2017) - Calibration for Fairness:

This work studies the tension between group calibration and error-rate constraints such as equalized odds, showing that they generally cannot be satisfied together when base rates differ. A practical response, implemented below, is to learn a separate calibration function for each group so that a given score carries the same meaning regardless of group. The process:
1. Split held-out labeled data by protected group.
2. Fit a calibration map (e.g., Platt scaling or isotonic regression) on each group's scores and labels.
3. At prediction time, apply the map corresponding to the individual's group.
Common Calibration Methods:
Platt Scaling: Logistic regression from scores to labels $$c(s) = \sigma(w \cdot s + b)$$
Isotonic Regression: Non-parametric, monotonic calibration $$c(s) = \text{isotonic\_regression}(s, y)$$
Beta Calibration: More flexible parametric mapping $$c(s) = \frac{1}{1 + e^{a \log(s) + b \log(1-s) + c}}$$
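Beta calibration is not included in the calibrator class below, but it can be fitted by logistic regression on the features $\log(s)$ and $\log(1-s)$. A minimal sketch under that assumption (helper names are illustrative, and the original method's monotonicity constraints on the coefficients are omitted for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_beta_calibrator(scores: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """Fit the beta-calibration map as a logistic regression on
    [log s, log(1-s)]; scores are clipped away from 0 and 1."""
    s = np.clip(scores, 1e-6, 1 - 1e-6)
    X = np.column_stack([np.log(s), np.log(1 - s)])
    return LogisticRegression().fit(X, labels)

def apply_beta_calibrator(model: LogisticRegression, scores: np.ndarray) -> np.ndarray:
    """Map raw scores to calibrated probabilities with a fitted model."""
    s = np.clip(scores, 1e-6, 1 - 1e-6)
    X = np.column_stack([np.log(s), np.log(1 - s)])
    return model.predict_proba(X)[:, 1]
```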
```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from typing import Dict


class GroupSpecificCalibrator:
    """
    Post-processing that applies group-specific calibration.

    Ensures predicted probabilities are well-calibrated within each group,
    which is a form of fairness (equal meaning of probability across groups).
    """

    def __init__(self, method: str = 'isotonic'):
        """
        Args:
            method: 'isotonic' for non-parametric, 'platt' for logistic scaling
        """
        self.method = method
        self.calibrators_ = {}

    def fit(self, scores: np.ndarray, labels: np.ndarray,
            protected: np.ndarray) -> 'GroupSpecificCalibrator':
        """
        Fit group-specific calibration functions.

        Args:
            scores: Model probability outputs (n_samples,)
            labels: True binary labels (n_samples,)
            protected: Protected attribute values (n_samples,)
        """
        for group in np.unique(protected):
            mask = protected == group
            group_scores = scores[mask]
            group_labels = labels[mask]

            if self.method == 'isotonic':
                calibrator = IsotonicRegression(
                    y_min=0.0, y_max=1.0, out_of_bounds='clip'
                )
                calibrator.fit(group_scores, group_labels)
            elif self.method == 'platt':
                calibrator = LogisticRegression()
                calibrator.fit(group_scores.reshape(-1, 1), group_labels)

            self.calibrators_[group] = calibrator

        return self

    def calibrate(self, scores: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """
        Apply group-specific calibration to scores.

        Returns:
            Calibrated probabilities (n_samples,)
        """
        calibrated = np.zeros_like(scores)

        for group, calibrator in self.calibrators_.items():
            mask = protected == group
            group_scores = scores[mask]

            if self.method == 'isotonic':
                calibrated[mask] = calibrator.predict(group_scores)
            elif self.method == 'platt':
                calibrated[mask] = calibrator.predict_proba(
                    group_scores.reshape(-1, 1)
                )[:, 1]

        return calibrated

    def fit_calibrate(self, scores: np.ndarray, labels: np.ndarray,
                      protected: np.ndarray) -> np.ndarray:
        """Fit and calibrate in one step."""
        return self.fit(scores, labels, protected).calibrate(scores, protected)


def expected_calibration_error(scores: np.ndarray, labels: np.ndarray,
                               n_bins: int = 10) -> float:
    """
    Compute Expected Calibration Error (ECE).

    Lower is better. 0 means perfect calibration.
    """
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0.0

    for i in range(n_bins):
        bin_mask = (scores >= bin_boundaries[i]) & (scores < bin_boundaries[i + 1])
        if bin_mask.sum() == 0:
            continue
        bin_accuracy = np.mean(labels[bin_mask])
        bin_confidence = np.mean(scores[bin_mask])
        bin_weight = bin_mask.sum() / len(scores)
        ece += bin_weight * abs(bin_accuracy - bin_confidence)

    return ece


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 3000

    protected = np.random.binomial(1, 0.5, n)
    labels = np.random.binomial(1, 0.5, n)

    # Generate miscalibrated scores with group-specific miscalibration
    base_scores = 0.2 + 0.6 * labels + np.random.normal(0, 0.15, n)
    # Group 0: overconfident, Group 1: underconfident
    scores = base_scores.copy()
    scores[protected == 0] = np.clip(base_scores[protected == 0] * 1.3, 0, 1)
    scores[protected == 1] = np.clip(base_scores[protected == 1] * 0.7, 0, 1)

    print("Before calibration:")
    for a in [0, 1]:
        mask = protected == a
        ece = expected_calibration_error(scores[mask], labels[mask])
        print(f"  Group {a} ECE: {ece:.4f}")

    # Apply group-specific calibration
    calibrator = GroupSpecificCalibrator(method='isotonic')
    calibrated = calibrator.fit_calibrate(scores, labels, protected)

    print("\nAfter group-specific calibration:")
    for a in [0, 1]:
        mask = protected == a
        ece = expected_calibration_error(calibrated[mask], labels[mask])
        print(f"  Group {a} ECE: {ece:.4f}")
```

Group-specific calibration is sometimes called 'predictive parity' or 'sufficiency.' Due to impossibility results, it often conflicts with demographic parity or equalized odds when base rates differ. Calibration focuses on what predictions mean rather than how they're distributed—important for decision-makers interpreting probabilities.
Reject option classification allows the model to abstain from making predictions when confidence is low—and uses this abstention strategically to improve fairness. Predictions near the decision boundary (where discrimination is most likely) can be handled differently.
The Core Idea (Kamiran et al., 2012):

Predictions that fall in a 'critical region' around the decision boundary are reassigned: instances from the underprivileged group receive the favorable outcome, while instances from the privileged group receive the unfavorable one. This focuses fairness interventions on uncertain predictions where the model's decision is essentially arbitrary anyway.
Algorithm:
For a classifier with scores $s(x) \in [0, 1]$ and threshold $\tau = 0.5$:
1. Define the critical region: $[\tau - \Delta, \tau + \Delta]$
2. For each sample $x$ with group $a$:
   - If $s(x)$ is outside the critical region: use the normal prediction
   - If $s(x)$ is inside the critical region:
     - If group $a$ is underprivileged: predict positive (promote)
     - If group $a$ is privileged: predict negative (demote)
The Philosophy:
When the model is uncertain (score near 0.5), it's essentially random whether we predict positive or negative. By tilting these uncertain predictions toward the disadvantaged group, we improve fairness without overriding confident predictions.
Choosing Δ:
- A larger Δ widens the critical region, producing a stronger fairness correction at a larger accuracy cost; a smaller Δ is more conservative.
- In practice, Δ is tuned on held-out data to meet the fairness target while monitoring accuracy (the demonstration below uses Δ = 0.15).
```python
import numpy as np
from typing import Tuple


class RejectOptionClassifier:
    """
    Fairness-aware classification using reject option.

    Modifies predictions in the critical region (near the decision boundary)
    to favor the underprivileged group, improving demographic parity.
    """

    def __init__(self, critical_width: float = 0.1, threshold: float = 0.5):
        """
        Args:
            critical_width: Width of critical region (Δ)
            threshold: Classification threshold (τ)
        """
        self.critical_width = critical_width
        self.threshold = threshold
        self.underprivileged_group_ = None

    def fit(self, scores: np.ndarray, protected: np.ndarray,
            labels: np.ndarray = None) -> 'RejectOptionClassifier':
        """
        Identify the underprivileged group based on positive prediction rates.

        Args:
            scores: Model probability outputs
            protected: Protected attribute values (binary)
            labels: Optional, not used but kept for API consistency
        """
        groups = np.unique(protected)

        # Compute positive prediction rate for each group
        rates = {}
        for g in groups:
            mask = protected == g
            rates[g] = np.mean(scores[mask] >= self.threshold)

        # Underprivileged group has the lower positive rate
        self.underprivileged_group_ = min(rates.keys(), key=lambda g: rates[g])
        return self

    def predict(self, scores: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """
        Make fairness-adjusted predictions using the reject option.

        Returns:
            Binary predictions with reject option applied
        """
        predictions = (scores >= self.threshold).astype(int)

        # Identify critical region
        low_bound = self.threshold - self.critical_width
        high_bound = self.threshold + self.critical_width
        in_critical = (scores >= low_bound) & (scores <= high_bound)

        # For underprivileged group in critical region: favor positive
        underprivileged_critical = in_critical & (protected == self.underprivileged_group_)
        predictions[underprivileged_critical] = 1

        # For privileged group in critical region: favor negative
        privileged_critical = in_critical & (protected != self.underprivileged_group_)
        predictions[privileged_critical] = 0

        return predictions

    def fit_predict(self, scores: np.ndarray, protected: np.ndarray,
                    labels: np.ndarray = None) -> np.ndarray:
        """Fit and predict in one step."""
        return self.fit(scores, protected, labels).predict(scores, protected)


def analyze_critical_region(scores: np.ndarray, protected: np.ndarray,
                            labels: np.ndarray, threshold: float = 0.5,
                            width: float = 0.1) -> dict:
    """
    Analyze the composition and properties of the critical region.
    """
    low = threshold - width
    high = threshold + width
    in_critical = (scores >= low) & (scores <= high)

    results = {
        'critical_fraction': np.mean(in_critical),
        'critical_count': in_critical.sum(),
    }

    for a in np.unique(protected):
        mask = (protected == a) & in_critical
        results[f'group_{a}_in_critical'] = mask.sum()
        if mask.sum() > 0:
            results[f'group_{a}_critical_positive_rate'] = np.mean(labels[mask])

    return results


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 3000

    protected = np.random.binomial(1, 0.5, n)
    labels = np.random.binomial(1, 0.5, n)

    # Biased scores
    base = 0.3 + 0.4 * labels + np.random.normal(0, 0.15, n)
    scores = np.clip(base - 0.15 * protected, 0, 1)  # Bias against group 1

    print("Original predictions (threshold=0.5):")
    orig = (scores >= 0.5).astype(int)
    print(f"  P(Ŷ=1|A=0) = {np.mean(orig[protected==0]):.3f}")
    print(f"  P(Ŷ=1|A=1) = {np.mean(orig[protected==1]):.3f}")
    print(f"  Accuracy = {np.mean(orig == labels):.3f}")

    # Analyze critical region
    print("\nCritical region analysis:")
    analysis = analyze_critical_region(scores, protected, labels, width=0.15)
    for key, val in analysis.items():
        print(f"  {key}: {val}")

    # Apply reject option
    roc = RejectOptionClassifier(critical_width=0.15)
    fair_preds = roc.fit_predict(scores, protected, labels)

    print("\nAfter reject option classification:")
    print(f"  P(Ŷ=1|A=0) = {np.mean(fair_preds[protected==0]):.3f}")
    print(f"  P(Ŷ=1|A=1) = {np.mean(fair_preds[protected==1]):.3f}")
    print(f"  Accuracy = {np.mean(fair_preds == labels):.3f}")
```

Reject option classification is philosophically appealing because it concentrates fairness interventions on cases where the model is genuinely uncertain. When a model gives 0.95 confidence, overriding that prediction feels like ignoring valuable information. When it gives 0.51, the model is essentially coin-flipping anyway—so we might as well flip toward fairness.
Many ML applications produce rankings rather than binary classifications—search results, recommendations, job candidate lists. Post-processing for fair ranking focuses on ensuring fair representation and exposure across ranked positions.
Key Fairness Concepts in Ranking:
- Representation: each group's share of the top-$k$ positions (e.g., proportional to its share of the candidate pool)
- Exposure: the attention-weighted visibility a group receives, which depends heavily on position
Position Bias in Ranking:
Users pay more attention to top positions. If Group B is systematically ranked lower, they receive less exposure even with similar qualifications. The exposure of an item at position $k$ is typically modeled as:
$$\text{exposure}(k) = \frac{1}{\log_2(k + 1)}$$
Singh & Joachims (2018) - Fairness of Exposure in Rankings:
This influential work proposes fairness constraints for rankings:
Demographic Parity in Exposure: $$\frac{\sum_{i \in G_0} \text{exposure}(\text{rank}(i))}{|G_0|} = \frac{\sum_{i \in G_1} \text{exposure}(\text{rank}(i))}{|G_1|}$$
Both groups should receive equal average exposure.
Merit-Weighted Exposure: $$\frac{\text{Exposure}(G_a)}{\text{Merit}(G_a)} = \frac{\text{Exposure}(G_b)}{\text{Merit}(G_b)}$$
Exposure should be proportional to merit (relevance scores).
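The selector implemented below reasons about counts rather than exposure; as a complement, here is a minimal sketch of how the two exposure criteria above could be measured for a given ranking (function names are illustrative; it assumes the logarithmic position-bias model and a full ranking expressed as item indices ordered best-first):

```python
import numpy as np

def group_exposure(ranking: np.ndarray, protected: np.ndarray) -> dict:
    """Average exposure per item for each group, given a full ranking
    (item indices ordered from position 1 to n)."""
    positions = np.arange(1, len(ranking) + 1)
    exposure = 1.0 / np.log2(positions + 1)   # position-bias model from above
    ranked_groups = protected[ranking]        # group of the item at each position
    return {g: exposure[ranked_groups == g].mean() for g in np.unique(protected)}

def exposure_to_merit_ratio(ranking: np.ndarray, protected: np.ndarray,
                            relevance: np.ndarray) -> dict:
    """Exposure divided by merit (average relevance) per group; equal values
    across groups correspond to the merit-weighted criterion above."""
    exp = group_exposure(ranking, protected)
    return {g: exp[g] / relevance[protected == g].mean() for g in np.unique(protected)}

# Example: ranking = np.argsort(-scores). Near-equal values across groups indicate
# demographic parity of exposure (first function) or merit-weighted exposure (second).
```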
Algorithmic Approach:

The code below implements a simple greedy top-$k$ selector: it computes a per-group quota (equal, or proportional to population share) and fills the quotas in score order within each group.
```python
import numpy as np
from typing import List, Tuple


class FairTopKSelector:
    """
    Select top-k items while ensuring fair representation of groups.

    Implements a simple greedy algorithm that alternates between groups
    to achieve proportional representation in the selected set.
    """

    def __init__(self, k: int, representation: str = 'proportional'):
        """
        Args:
            k: Number of items to select
            representation: 'proportional' (match population) or 'equal'
        """
        self.k = k
        self.representation = representation

    def select(self, scores: np.ndarray, protected: np.ndarray,
               population_weights: dict = None) -> np.ndarray:
        """
        Select top-k items with fair representation.

        Args:
            scores: Item scores (higher = better)
            protected: Group membership for each item
            population_weights: Optional target proportions per group

        Returns:
            Indices of selected items
        """
        groups = np.unique(protected)

        # Compute target counts per group
        if self.representation == 'equal':
            targets = {g: self.k // len(groups) for g in groups}
            # Distribute remainder
            remainder = self.k % len(groups)
            for i, g in enumerate(groups):
                if i < remainder:
                    targets[g] += 1
        else:  # proportional
            if population_weights is None:
                population_weights = {g: np.mean(protected == g) for g in groups}
            targets = {
                g: int(np.round(self.k * population_weights[g])) for g in groups
            }
            # Adjust to sum to k
            diff = self.k - sum(targets.values())
            if diff > 0:
                # Add to largest group
                largest = max(targets, key=lambda g: targets[g])
                targets[largest] += diff
            elif diff < 0:
                # Remove from largest group
                largest = max(targets, key=lambda g: targets[g])
                targets[largest] += diff

        # Sort each group by score
        sorted_by_group = {g: np.argsort(-scores[protected == g]) for g in groups}
        group_indices = {g: np.where(protected == g)[0] for g in groups}

        # Greedy selection: pick top from each group until targets are met
        selected = []
        counts = {g: 0 for g in groups}
        pointers = {g: 0 for g in groups}

        while len(selected) < self.k:
            # Find groups that still have items available
            remaining = {
                g: targets[g] - counts[g]
                for g in groups
                if pointers[g] < len(sorted_by_group[g])
            }
            if not remaining:
                break
            # Pick the next item from the group with the most slots to fill
            best_group = max(remaining, key=lambda g: remaining[g])
            item_local_idx = sorted_by_group[best_group][pointers[best_group]]
            item_global_idx = group_indices[best_group][item_local_idx]

            selected.append(item_global_idx)
            counts[best_group] += 1
            pointers[best_group] += 1

        return np.array(selected)


def compute_ranking_fairness(selected_indices: np.ndarray, scores: np.ndarray,
                             protected: np.ndarray) -> dict:
    """Compute fairness metrics for top-k selection."""
    groups = np.unique(protected)
    metrics = {}

    # Representation in selection vs population
    for g in groups:
        pop_rate = np.mean(protected == g)
        sel_rate = np.mean(protected[selected_indices] == g)
        metrics[f'group_{g}_population_rate'] = pop_rate
        metrics[f'group_{g}_selection_rate'] = sel_rate
        metrics[f'group_{g}_representation_ratio'] = sel_rate / pop_rate if pop_rate > 0 else 0

    # Average score of selected items by group
    for g in groups:
        sel_group = selected_indices[protected[selected_indices] == g]
        if len(sel_group) > 0:
            metrics[f'group_{g}_avg_score_selected'] = np.mean(scores[sel_group])

    return metrics


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 500
    k = 50

    # Candidates with biased scores
    protected = np.random.binomial(1, 0.4, n)  # 40% in group 1
    # True merit (unbiased)
    true_merit = np.random.normal(5, 1, n)
    # Observed scores (biased against group 1)
    scores = true_merit - 0.5 * protected + np.random.normal(0, 0.3, n)

    print("Naive top-k selection:")
    naive_top_k = np.argsort(-scores)[:k]
    naive_metrics = compute_ranking_fairness(naive_top_k, scores, protected)
    for key, val in naive_metrics.items():
        print(f"  {key}: {val:.3f}")

    print("\nFair top-k selection (proportional):")
    selector = FairTopKSelector(k=k, representation='proportional')
    fair_top_k = selector.select(scores, protected)
    fair_metrics = compute_ranking_fairness(fair_top_k, scores, protected)
    for key, val in fair_metrics.items():
        print(f"  {key}: {val:.3f}")
```

Fair ranking is particularly challenging because fairness constraints may conflict with relevance optimization. A perfectly fair ranking may place less relevant items higher than more relevant ones. Additionally, the position bias model (how much users attend to each position) is an assumption that may not hold universally.
| Method | Fairness Target | Requires Y at Prediction | Complexity | Best For |
|---|---|---|---|---|
| Threshold Optimization | Demographic Parity | No | Low | Simple adjustments, different group base rates |
| Equalized Odds (Hardt) | Equal TPR/FPR across groups | For fitting only | Medium | When error rate equality matters |
| Group Calibration | Calibration within groups | For fitting only | Low | When probability interpretation matters |
| Reject Option | Demographic Parity | No | Low | When model is uncertain (scores near 0.5) |
| Fair Ranking | Exposure fairness | No | Medium-High | Search, recommendations, selection |
Selection Guidelines:
Use Threshold Optimization when:
- You need demographic parity (or equal opportunity) and can access $A$ at prediction time
- Group base rates differ and a simple, transparent adjustment is acceptable

Use Equalized Odds when:
- Equality of error rates (TPR/FPR) across groups is the primary concern
- Labeled held-out data is available to estimate group-conditional rates

Use Calibration when:
- Downstream decision-makers interpret the scores as probabilities
- A given score must mean the same thing for every group

Use Reject Option when:
- You want to confine interventions to uncertain predictions near the decision boundary
- Overriding confident predictions is undesirable

Use Fair Ranking when:
- The output is a ranking or top-k selection (search, recommendations, candidate shortlists)
- Exposure or representation across positions is the fairness concern
Post-processing methods can be chained. For example: (1) Calibrate each group separately, (2) Apply threshold optimization on calibrated scores, (3) Use reject option for borderline cases. Experiment with combinations on held-out data to find what works for your specific application.
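A rough sketch of such a chain, assuming the classes defined earlier on this page and held-out arrays `scores`, `labels`, and `protected`:

```python
import numpy as np

# Step 1: calibrate each group separately (GroupSpecificCalibrator from above)
calibrated = GroupSpecificCalibrator(method='isotonic').fit_calibrate(scores, labels, protected)

# Step 2: group-specific thresholds on the calibrated scores (ThresholdOptimizer from above)
preds = ThresholdOptimizer('demographic_parity').fit_predict(calibrated, protected)

# Step 3: let the reject option decide only the borderline cases near 0.5
roc_preds = RejectOptionClassifier(critical_width=0.05).fit_predict(calibrated, protected)
borderline = np.abs(calibrated - 0.5) <= 0.05
preds[borderline] = roc_preds[borderline]

# Compare accuracy and group positive rates of `preds` against the unchained
# alternatives on held-out data before deploying the combined pipeline.
```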
Post-processing is powerful but has fundamental limitations that practitioners must understand:
- It cannot repair a fundamentally biased model: if group score distributions barely overlap, no threshold or mixing choice equalizes rates without severe accuracy loss.
- It typically requires the protected attribute at prediction time, which may be legally or operationally infeasible.
- Randomized rules (as in equalized odds post-processing) treat similar-looking individuals differently, which can be hard to justify to affected people.
- Because the model itself is untouched, post-processing generally achieves a worse fairness-accuracy tradeoff than retraining with an in-processing method.
Post-processing is most valuable when: (1) Models are fixed or expensive to retrain, (2) Fairness requirements vary by context and need runtime adjustment, (3) Auditing and transparency require separable fairness interventions, (4) Quick deployment of fairness improvements is needed. It's often the fastest path to fairer predictions, even if not the most fundamental.
What's Next:
The final page of this module examines fairness-accuracy tradeoffs—the fundamental tension between predictive performance and fairness, impossibility results that limit what can be achieved, and practical strategies for navigating these tradeoffs in real-world applications.
You now understand the major post-processing approaches to bias mitigation—from simple threshold optimization to sophisticated fair ranking algorithms. These techniques provide essential tools for deploying fairer ML systems, especially when model retraining is not an option.