Pre-processing methods for bias mitigation operate on a simple but powerful premise: if the data is the source of bias, fix the data. By modifying training data before it reaches the learning algorithm, we can remove or neutralize discriminatory patterns without requiring any changes to the model itself.
This approach has profound implications. It is model-agnostic: any learning algorithm can be trained on the fair, pre-processed data. It enables reuse: debiased data can be shared and used across multiple modeling tasks. And it provides transparency: the transformations applied to the data are explicit and auditable.
By the end of this page, you will be able to: (1) Explain the philosophy and tradeoffs of pre-processing approaches to fairness, (2) Implement re-sampling techniques for balancing training data, (3) Apply re-weighting methods to adjust sample importance, (4) Understand data transformation methods including Disparate Impact Remover and Fair Representation Learning, (5) Evaluate when pre-processing is appropriate versus other approaches.
The Pre-processing Landscape:
Pre-processing methods fall into three broad categories: re-sampling, which changes which examples appear in the training set and how often; re-weighting, which adjusts the importance assigned to each example; and data transformation, which modifies feature values or labels (including Disparate Impact Remover, fair representation learning, and massaging/relabeling).
Each approach has distinct advantages and limitations. Re-sampling and re-weighting are simple to implement but may not address subtle correlations. Data transformation can achieve stronger fairness guarantees but may lose useful information. The right choice depends on your specific context, data, and fairness requirements.
Before examining specific techniques, let's establish the mathematical foundation that unifies pre-processing approaches.
The Data Distribution Transformation Problem:
Let $D_{original} = \{(x_i, a_i, y_i)\}_{i=1}^n$ be the original training dataset, where $x_i \in \mathcal{X}$ is the feature vector, $a_i \in \mathcal{A}$ is the protected attribute, and $y_i \in \mathcal{Y}$ is the target label.
Our goal is to construct a transformed dataset $D_{fair}$ such that any model trained on $D_{fair}$ satisfies a given fairness constraint while maintaining predictive utility.
Formal Objective:
$$D_{fair} = T(D_{original})$$
where the transformation $T$ satisfies two requirements: any model trained on $D_{fair}$ meets the chosen fairness constraint (for example, demographic parity), and the predictive utility of the data is preserved as much as possible.
Pre-processing inherently involves a fairness-utility tradeoff. The more aggressively we remove discriminatory information, the more predictive power we may sacrifice. This tradeoff is not a bug but a fundamental property of fairness in prediction.
Why Pre-processing Works:
The key insight is that statistical dependence between protected attributes and outcomes can be decomposed into two components: legitimate dependence, which flows through factors that genuinely justify different outcomes, and illegitimate dependence, which reflects historical or measurement bias.
Pre-processing attempts to remove or reduce illegitimate dependence. The challenge is distinguishing between these two types, which is fundamentally a causal inference problem.
Statistical Independence Goal:
Many pre-processing methods target statistical independence between features and protected attributes:
$$P_{D_{fair}}(X | A) = P_{D_{fair}}(X)$$
This ensures features carry no information about protected attributes, preventing the model from using protected information (even implicitly) for prediction.
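One practical way to audit this condition, offered here as a sketch rather than as part of any particular method, is to try to predict the protected attribute from the (transformed) features: if a classifier does no better than guessing the majority class, the features carry little usable information about $A$. The function name `protected_attribute_leakage` and the choice of scikit-learn's `LogisticRegression` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def protected_attribute_leakage(X: np.ndarray, protected: np.ndarray) -> float:
    """Estimate how much information the features carry about the protected attribute.

    Returns the cross-validated accuracy of predicting A from X; values close to the
    majority-class rate suggest approximate independence between X and A.
    """
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, protected, cv=5, scoring="accuracy")
    return float(scores.mean())

# Example: compare leakage before and after a pre-processing transformation
# leakage_before = protected_attribute_leakage(X, protected)
# leakage_after = protected_attribute_leakage(X_fair, protected)
```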
Re-sampling modifies the training set composition by changing which examples appear and how often. This is the simplest class of pre-processing methods, requiring no feature transformation, only sample selection.
Core Intuition: If bias arises from imbalanced representation—more examples of one group with certain outcomes—we can rebalance by duplicating underrepresented combinations or removing overrepresented ones.
Undersampling for Fairness:
Undersampling removes examples to balance the joint distribution of $(A, Y)$. The goal is to create a dataset where:
$$P_{sampled}(A = a, Y = y) = \frac{1}{|\mathcal{A}| \cdot |\mathcal{Y}|}$$
Algorithm:
1. Compute counts N(a,y) for each (protected_attribute, label) combination
2. Find minimum count: N_min = min_{a,y} N(a,y)
3. For each combination (a,y):
- Randomly sample N_min examples from the N(a,y) available
4. Combine all samples into the balanced dataset
Advantages: simple to implement, works with any learning algorithm, and leaves feature values untouched.
Disadvantages: discards data (reducing effective sample size and potentially hurting accuracy), and does not address correlations between the features $X$ and the protected attribute $A$.
```python
import numpy as np
from collections import Counter
from typing import Tuple


def fair_undersample(X: np.ndarray, protected: np.ndarray,
                     labels: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Balance dataset by undersampling overrepresented (group, label) combinations.

    Args:
        X: Feature matrix (n_samples, n_features)
        protected: Protected attribute values (n_samples,)
        labels: Target labels (n_samples,)

    Returns:
        Balanced (X, protected, labels) tuple
    """
    # Create (group, label) tuples
    combinations = list(zip(protected, labels))
    counts = Counter(combinations)

    # Find minimum count across all combinations
    min_count = min(counts.values())

    # Collect indices for balanced sampling
    balanced_indices = []
    for combo in counts.keys():
        combo_indices = [i for i, c in enumerate(combinations) if c == combo]
        sampled = np.random.choice(combo_indices, size=min_count, replace=False)
        balanced_indices.extend(sampled)

    # Shuffle to prevent ordering effects
    np.random.shuffle(balanced_indices)

    return X[balanced_indices], protected[balanced_indices], labels[balanced_indices]


def fair_oversample(X: np.ndarray, protected: np.ndarray,
                    labels: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Balance dataset by oversampling underrepresented (group, label) combinations.

    Args:
        X: Feature matrix (n_samples, n_features)
        protected: Protected attribute values (n_samples,)
        labels: Target labels (n_samples,)

    Returns:
        Balanced (X, protected, labels) tuple
    """
    combinations = list(zip(protected, labels))
    counts = Counter(combinations)

    # Find maximum count across all combinations
    max_count = max(counts.values())

    # Collect indices with oversampling
    balanced_indices = []
    for combo in counts.keys():
        combo_indices = [i for i, c in enumerate(combinations) if c == combo]
        current_count = len(combo_indices)

        if current_count < max_count:
            # Sample with replacement to reach max_count
            oversampled = np.random.choice(combo_indices, size=max_count, replace=True)
            balanced_indices.extend(oversampled)
        else:
            balanced_indices.extend(combo_indices)

    np.random.shuffle(balanced_indices)

    return X[balanced_indices], protected[balanced_indices], labels[balanced_indices]
```

Re-sampling balances the marginal distribution of $(A, Y)$ but doesn't address correlations between the features $X$ and the protected attribute $A$. If features encode protected information (e.g., a name encoding gender, a zip code encoding race), re-sampling alone won't prevent discrimination.
Re-weighting assigns different importance weights to training samples, modifying the effective training distribution without changing the samples themselves. This preserves all data while adjusting the optimization objective.
The Re-weighting Objective:
Standard empirical risk minimization: $$\hat{h} = \arg\min_h \frac{1}{n} \sum_{i=1}^n L(h(x_i), y_i)$$
Weighted empirical risk minimization: $$\hat{h}_{fair} = \arg\min_h \frac{1}{\sum_i w_i} \sum_{i=1}^n w_i \cdot L(h(x_i), y_i)$$
where weights $w_i$ are chosen to counteract bias.
Calders-Verwer Reweighting:
The seminal reweighting approach by Calders and Verwer (2010) assigns weights to achieve demographic parity. The key insight is that discrimination arises from different base rates of positive outcomes across groups.
Weight Computation:
For binary protected attribute $A \in \{0, 1\}$ and binary label $Y \in \{0, 1\}$:
$$w_{a,y} = \frac{P(A = a) \cdot P(Y = y)}{P(A = a, Y = y)}$$
This reweights to simulate independence between $A$ and $Y$.
Derivation:
In the original data: $P(Y=y | A=a) \neq P(Y=y)$ (groups have different positive rates)
After reweighting: $P_{weighted}(Y=y | A=a) = P(Y=y)$ (groups have equal positive rates)
The weight transforms the dependent distribution into an independent one: $$P_{weighted}(A, Y) = P(A) \cdot P(Y)$$
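A quick worked example with hypothetical numbers makes the effect concrete. Suppose $P(A{=}1) = 0.4$, $P(Y{=}1) = 0.42$, and the observed joint probability is $P(A{=}1, Y{=}1) = 0.12$, i.e., the protected group receives positive outcomes less often than independence would imply. Then:

$$w_{1,1} = \frac{P(A{=}1) \cdot P(Y{=}1)}{P(A{=}1, Y{=}1)} = \frac{0.4 \times 0.42}{0.12} = 1.4$$

so positive examples from the protected group are up-weighted, while the overrepresented $(a, y)$ combinations receive weights below 1.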
```python
import numpy as np


def compute_calders_verwer_weights(protected: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """
    Compute sample weights following Calders-Verwer reweighting for demographic parity.

    The weights transform the joint distribution P(A, Y) to independence: P(A) * P(Y).

    Args:
        protected: Binary protected attribute (n_samples,)
        labels: Binary target labels (n_samples,)

    Returns:
        Sample weights (n_samples,)
    """
    n = len(protected)

    # Compute marginal probabilities
    p_a_1 = np.mean(protected)   # P(A = 1)
    p_a_0 = 1 - p_a_1            # P(A = 0)
    p_y_1 = np.mean(labels)      # P(Y = 1)
    p_y_0 = 1 - p_y_1            # P(Y = 0)

    # Compute joint probabilities
    p_a0_y0 = np.mean((protected == 0) & (labels == 0))
    p_a0_y1 = np.mean((protected == 0) & (labels == 1))
    p_a1_y0 = np.mean((protected == 1) & (labels == 0))
    p_a1_y1 = np.mean((protected == 1) & (labels == 1))

    # Compute weights: w(a,y) = P(A=a) * P(Y=y) / P(A=a, Y=y)
    weight_map = {
        (0, 0): (p_a_0 * p_y_0) / p_a0_y0 if p_a0_y0 > 0 else 1.0,
        (0, 1): (p_a_0 * p_y_1) / p_a0_y1 if p_a0_y1 > 0 else 1.0,
        (1, 0): (p_a_1 * p_y_0) / p_a1_y0 if p_a1_y0 > 0 else 1.0,
        (1, 1): (p_a_1 * p_y_1) / p_a1_y1 if p_a1_y1 > 0 else 1.0,
    }

    # Assign weights to each sample
    weights = np.array([
        weight_map[(int(a), int(y))] for a, y in zip(protected, labels)
    ])

    # Normalize to sum to n (preserves effective sample size interpretation)
    weights = weights * (n / np.sum(weights))

    return weights


def train_weighted_model(X: np.ndarray, y: np.ndarray, weights: np.ndarray,
                         model_class, **model_kwargs):
    """
    Train a model using sample weights for fairness.
    Most sklearn classifiers accept a sample_weight parameter.
    """
    model = model_class(**model_kwargs)
    model.fit(X, y, sample_weight=weights)
    return model


# Example usage
if __name__ == "__main__":
    from sklearn.linear_model import LogisticRegression

    # Generate synthetic biased data
    np.random.seed(42)
    n_samples = 10000

    protected = np.random.binomial(1, 0.4, n_samples)  # 40% in protected group

    # Biased labels: protected group has lower positive rate
    base_rate = 0.3 + 0.3 * (1 - protected)  # 60% for A=0, 30% for A=1
    labels = np.random.binomial(1, base_rate, n_samples)

    X = np.column_stack([
        np.random.randn(n_samples, 3),  # Random features
        protected * 0.5                 # Feature slightly correlated with A
    ])

    # Compute fairness weights
    weights = compute_calders_verwer_weights(protected, labels)

    print("Original distribution:")
    print(f"  P(Y=1|A=0) = {np.mean(labels[protected == 0]):.3f}")
    print(f"  P(Y=1|A=1) = {np.mean(labels[protected == 1]):.3f}")
    print("\nWeight statistics:")
    print(f"  Weight range: [{weights.min():.3f}, {weights.max():.3f}]")
    print(f"  Mean weight: {weights.mean():.3f}")
```

Re-weighting is generally preferred over re-sampling when: (1) you can't afford to discard any data, (2) your learning algorithm supports sample weights, (3) you want more continuous control over the fairness-utility tradeoff. Re-sampling is preferred when: (1) sample weights are not supported, (2) you want a simpler, more transparent transformation.
Extensions to Multi-Class and Multi-Group Settings:
The Calders-Verwer approach extends naturally to multiple groups and labels:
$$w_{a,y} = \frac{P(A = a) \cdot P(Y = y)}{P(A = a, Y = y)}$$
For $|\mathcal{A}| = k$ groups and $|\mathcal{Y}| = m$ classes, we compute $k \times m$ weights.
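The binary implementation above extends directly. The following is a minimal sketch of the general case, applying the same formula over arbitrary group and class values; the function name `compute_general_fairness_weights` is illustrative and not from any library.

```python
import numpy as np

def compute_general_fairness_weights(protected: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Calders-Verwer-style weights for any number of groups and classes.

    Uses w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y), estimated empirically.
    """
    n = len(labels)
    weights = np.ones(n, dtype=float)
    for a in np.unique(protected):
        for y in np.unique(labels):
            mask = (protected == a) & (labels == y)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (protected == a).mean() * (labels == y).mean() / p_joint
    # Normalize so the weights sum to n
    return weights * (n / weights.sum())
```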
Handling Intersectionality:
When multiple protected attributes exist, we can: (1) define intersectional groups as the cross-product of the attributes and reweight (or re-sample) those groups directly, (2) compute weights per attribute and multiply them, or (3) target multi-way independence between all protected attributes and the label.
Each approach has tradeoffs—intersectional groups can become very small, multiplicative weights may over-correct, and multi-way independence may be too restrictive.
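As a rough illustration of the first two options (the attribute arrays below are hypothetical, and assumed binary):

```python
import numpy as np

# Hypothetical binary attribute codes for illustration
sex = np.array([0, 1, 1, 0, 1, 0])
race = np.array([1, 1, 0, 0, 1, 0])

# Option 1: treat each (sex, race) combination as its own intersectional group,
# then apply any group-based method (re-sampling, re-weighting) to these codes.
intersectional = sex * 2 + race  # one code per combination: values in {0, 1, 2, 3}
print(np.unique(intersectional, return_counts=True))

# Option 2 (multiplicative weights) would compute one weight vector per attribute
# and multiply them element-wise; as noted above, this may over-correct.
```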
Disparate Impact Remover (DIR), introduced by Feldman et al. (2015), takes a fundamentally different approach: instead of changing which samples are used or how they're weighted, it transforms the feature values themselves to remove information about protected attributes.
Core Idea: Modify each feature's distribution so that the conditional distributions $P(X_j | A = a)$ become identical across groups. After transformation, knowing someone's group membership provides no information about their feature values—and vice versa.
Mathematical Foundation:
For each feature $X_j$, we apply a rank-preserving transformation that maps the group-specific distributions to a common target distribution. The transformation uses the quantile function (inverse CDF).
Algorithm:
For each feature $X_j$ and group $a$: (1) estimate the within-group empirical CDF $F_{a,j}$; (2) construct a target quantile function $F^{-1}_{target,j}$ (for example, the median of the group quantile functions); (3) replace each value $x$ with $F^{-1}_{target,j}(F_{a,j}(x))$.
This maps each value through its within-group quantile to the corresponding quantile in the target distribution.
Repair Level Parameter:
DIR includes a repair level $\lambda \in [0, 1]$ controlling the transformation strength:
$$X'_{j,i} = (1 - \lambda) \cdot X_{j,i} + \lambda \cdot \text{Repaired}(X_{j,i})$$
```python
import numpy as np
from scipy.interpolate import interp1d


class DisparateImpactRemover:
    """
    Remove disparate impact by transforming features to be independent of the
    protected attribute through quantile-based repair.

    Based on: Feldman et al., "Certifying and Removing Disparate Impact" (2015)
    """

    def __init__(self, repair_level: float = 1.0):
        """
        Args:
            repair_level: Float in [0, 1] controlling transformation strength.
                          0 = no change, 1 = full repair
        """
        assert 0 <= repair_level <= 1, "repair_level must be in [0, 1]"
        self.repair_level = repair_level
        self.cdfs_ = {}            # Group CDFs for each feature
        self.inv_cdfs_ = {}        # Inverse CDFs
        self.target_inv_cdf_ = {}  # Target inverse CDFs

    def fit(self, X: np.ndarray, protected: np.ndarray) -> 'DisparateImpactRemover':
        """Learn the quantile transformations from training data."""
        n_samples, n_features = X.shape
        groups = np.unique(protected)

        for j in range(n_features):
            self.cdfs_[j] = {}
            self.inv_cdfs_[j] = {}

            # Compute CDF and inverse CDF for each group
            for a in groups:
                mask = protected == a
                values = np.sort(X[mask, j])
                n_group = len(values)

                # Empirical CDF: F(x) = proportion of values <= x
                percentiles = np.linspace(0, 1, n_group)

                # CDF: value -> percentile
                self.cdfs_[j][a] = interp1d(
                    values, percentiles, bounds_error=False, fill_value=(0, 1)
                )
                # Inverse CDF: percentile -> value
                self.inv_cdfs_[j][a] = interp1d(
                    percentiles, values, bounds_error=False,
                    fill_value=(values[0], values[-1])
                )

            # Target distribution: median of the group distributions.
            # For each percentile, take the median value across groups.
            percentiles = np.linspace(0, 1, 1000)
            target_values = np.median([
                self.inv_cdfs_[j][a](percentiles) for a in groups
            ], axis=0)
            self.target_inv_cdf_[j] = interp1d(
                percentiles, target_values, bounds_error=False,
                fill_value=(target_values[0], target_values[-1])
            )

        return self

    def transform(self, X: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """Apply the repair transformation to features."""
        n_samples, n_features = X.shape
        X_repaired = X.copy().astype(float)

        for j in range(n_features):
            for a in np.unique(protected):
                mask = protected == a
                if a in self.cdfs_[j]:
                    # Map value -> percentile -> target value
                    percentiles = self.cdfs_[j][a](X[mask, j])
                    repaired = self.target_inv_cdf_[j](percentiles)
                    # Apply repair level blending
                    X_repaired[mask, j] = (
                        (1 - self.repair_level) * X[mask, j]
                        + self.repair_level * repaired
                    )

        return X_repaired

    def fit_transform(self, X: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """Fit and transform in one step."""
        return self.fit(X, protected).transform(X, protected)


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 2000

    # Create data where features differ by group
    protected = np.random.binomial(1, 0.5, n)
    X = np.column_stack([
        np.random.normal(5, 2, n) + 3 * protected,  # Feature differs by group
        np.random.normal(0, 1, n) + 1 * protected,  # Another biased feature
        np.random.normal(10, 3, n)                  # Unbiased feature
    ])

    print("Before repair:")
    print(f"  Feature 0: Group 0 mean={X[protected==0, 0].mean():.2f}, "
          f"Group 1 mean={X[protected==1, 0].mean():.2f}")

    # Apply repair
    remover = DisparateImpactRemover(repair_level=1.0)
    X_repaired = remover.fit_transform(X, protected)

    print("\nAfter full repair:")
    print(f"  Feature 0: Group 0 mean={X_repaired[protected==0, 0].mean():.2f}, "
          f"Group 1 mean={X_repaired[protected==1, 0].mean():.2f}")
```

Disparate Impact Remover preserves within-group rankings: if person A had a higher feature value than person B within their group, this remains true after transformation. This is a desirable property when relative comparisons within groups are meaningful.
Fair Representation Learning learns a new feature representation that encodes useful information for prediction while being uninformative about protected attributes. Unlike Disparate Impact Remover which modifies individual features, representation learning methods create entirely new feature spaces using neural networks or other learnable mappings.
The Core Objective:
Learn an encoder $E: \mathcal{X} \rightarrow \mathcal{Z}$ such that: (1) the representation $Z = E(X)$ retains enough information to predict $Y$ accurately, (2) $Z$ is (approximately) statistically independent of the protected attribute $A$, and (3) where reconstruction is required, $X$ can be recovered from $Z$ with low error.
This is formulated as a multi-objective optimization problem.
Zemel et al. (2013) - Learning Fair Representations:
The original fair representation learning paper proposes mapping data to a multinomial distribution over $K$ prototypes, where the mapping satisfies three properties: group membership does not affect the probability of being assigned to any prototype (statistical parity), the prototype representation retains as much information about $X$ as possible, and the representation supports accurate prediction of $Y$.
Objective Function: $$L = L_{reconstruct} + \alpha \cdot L_{accuracy} + \beta \cdot L_{fairness}$$
where $L_{reconstruct}$ penalizes reconstruction error of $X$ from the prototype representation, $L_{accuracy}$ penalizes prediction error on $Y$, and $L_{fairness}$ penalizes differences in the average prototype-assignment probabilities between groups; $\alpha$ and $\beta$ trade off accuracy against fairness.
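A compact sketch of how these three terms can be computed for a prototype-based mapping is shown below. This is not the authors' reference implementation; the function names and the softmax-over-distances parameterization are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def prototype_probabilities(X: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Soft assignment of each sample to K prototypes via softmax over negative squared distances."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)  # (n, K)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def zemel_style_losses(X, y, protected, prototypes, proto_labels):
    """Compute the three terms of L = L_reconstruct + alpha * L_accuracy + beta * L_fairness."""
    M = prototype_probabilities(X, prototypes)        # (n, K) soft assignments
    X_hat = M @ prototypes                            # reconstruction from prototypes
    y_hat = np.clip(M @ proto_labels, 1e-6, 1 - 1e-6) # per-sample P(Y=1)

    L_reconstruct = np.mean((X - X_hat) ** 2)
    L_accuracy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Statistical parity on assignments: both groups should use each prototype equally often
    L_fairness = np.abs(M[protected == 0].mean(axis=0) - M[protected == 1].mean(axis=0)).sum()
    return L_reconstruct, L_accuracy, L_fairness
```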
Variational Fair Autoencoders (VFAE):
Louizos et al. (2016) extend this with variational autoencoders. The model treats the protected attribute as an observed variable, learns a latent code $z$ intended to be independent of $a$, and adds a Maximum Mean Discrepancy (MMD) penalty that pushes the group-conditional latent distributions together.
Loss Function: $$L = L_{ELBO} + \gamma \cdot \text{MMD}(p(z|a=0), p(z|a=1)) + \lambda \cdot L_{classify}$$
The MMD term penalizes differences between the latent distributions for different groups.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Tuple


class FairAutoencoder(nn.Module):
    """
    A fair autoencoder that learns representations independent of protected attributes.
    Uses adversarial training to remove protected information from the latent space.
    """

    def __init__(self, input_dim: int, latent_dim: int = 32, hidden_dim: int = 64):
        super().__init__()

        # Encoder: X -> Z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim)
        )
        # Decoder: Z -> X (reconstruction)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim)
        )
        # Target predictor: Z -> Y
        self.predictor = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1), nn.Sigmoid()
        )
        # Adversary: Z -> A (tries to predict protected attribute)
        self.adversary = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1), nn.Sigmoid()
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        """Map input to fair latent representation."""
        return self.encoder(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        """Reconstruct input from latent representation."""
        return self.decoder(z)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, ...]:
        z = self.encode(x)
        x_reconstructed = self.decode(z)
        y_pred = self.predictor(z)
        a_pred = self.adversary(z)
        return z, x_reconstructed, y_pred, a_pred


def compute_mmd(x: torch.Tensor, y: torch.Tensor, kernel: str = 'rbf') -> torch.Tensor:
    """
    Compute Maximum Mean Discrepancy between two distributions.
    MMD = 0 iff the distributions are identical.
    """
    def rbf_kernel(x, y, sigma=1.0):
        dist = torch.cdist(x, y, p=2)
        return torch.exp(-dist**2 / (2 * sigma**2))

    xx = rbf_kernel(x, x).mean()
    yy = rbf_kernel(y, y).mean()
    xy = rbf_kernel(x, y).mean()
    return xx + yy - 2 * xy


def train_fair_autoencoder(model: FairAutoencoder, X: torch.Tensor,
                           protected: torch.Tensor, labels: torch.Tensor,
                           epochs: int = 100, fairness_weight: float = 1.0,
                           adversary_weight: float = 1.0):
    """
    Train the fair autoencoder with adversarial debiasing.

    The adversary tries to predict the protected attribute from Z.
    The encoder tries to fool the adversary (minimize the adversary's accuracy).
    """
    # Separate optimizers for encoder/decoder/predictor and adversary
    main_params = (list(model.encoder.parameters())
                   + list(model.decoder.parameters())
                   + list(model.predictor.parameters()))
    main_optimizer = torch.optim.Adam(main_params, lr=0.001)
    adversary_optimizer = torch.optim.Adam(model.adversary.parameters(), lr=0.001)

    for epoch in range(epochs):
        model.train()

        # Forward pass
        z, x_recon, y_pred, a_pred = model(X)

        # Losses
        recon_loss = F.mse_loss(x_recon, X)
        pred_loss = F.binary_cross_entropy(y_pred.squeeze(), labels.float())
        adversary_loss = F.binary_cross_entropy(a_pred.squeeze(), protected.float())

        # MMD fairness loss between group-conditional latent distributions
        z_group0 = z[protected == 0]
        z_group1 = z[protected == 1]
        if len(z_group0) > 0 and len(z_group1) > 0:
            mmd_loss = compute_mmd(z_group0, z_group1)
        else:
            mmd_loss = torch.tensor(0.0)

        # Train adversary to predict the protected attribute from Z
        adversary_optimizer.zero_grad()
        adversary_loss.backward(retain_graph=True)
        adversary_optimizer.step()

        # Recompute the adversary's output with its updated weights so the main
        # backward pass does not reuse tensors modified in-place by the step above
        adversary_loss = F.binary_cross_entropy(
            model.adversary(z).squeeze(), protected.float()
        )

        # Train main model: minimize reconstruction + prediction + MMD,
        # and maximize the adversary's loss (fool the adversary)
        main_optimizer.zero_grad()
        main_loss = (recon_loss + pred_loss
                     + fairness_weight * mmd_loss
                     - adversary_weight * adversary_loss)
        main_loss.backward()
        main_optimizer.step()

        if (epoch + 1) % 20 == 0:
            print(f"Epoch {epoch+1}: Recon={recon_loss:.4f}, "
                  f"Pred={pred_loss:.4f}, MMD={mmd_loss:.4f}")
```

Fair representation learning is particularly powerful when: (1) the relationship between X and A is complex and nonlinear, (2) you want a single representation usable for multiple downstream tasks, (3) you're building a data pipeline where model architectures may change but fairness requirements persist.
Massaging and relabeling approaches modify the target labels rather than features. The intuition is that historical bias manifests in labels—if past decisions were discriminatory, the labels reflecting those decisions are biased.
Massaging (Kamiran & Calders, 2012):
The method identifies samples near the decision boundary where label changes would most reduce discrimination while least affecting classification accuracy.
Mathematical Formulation:
Let $D = \text{P}(Y=1|A=1) - \text{P}(Y=1|A=0)$ be the discrimination in positive rates.
To achieve $D' = 0$, we need to change approximately: $$M = \frac{|D| \cdot n_0 \cdot n_1}{n}$$
labels, where $n_a$ is the count of group $a$.
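For example, with hypothetical counts $n_0 = 6000$, $n_1 = 4000$, $n = 10000$ and discrimination $D = 0.3$:

$$M = \frac{0.3 \times 6000 \times 4000}{10000} = 720$$

so roughly 360 labels would be promoted in group $A=0$ and 360 demoted in group $A=1$.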
Promotion Candidates: In group $A=0$, samples with current label $Y=0$ whose predicted probability of a positive outcome is highest (those sitting just below the decision boundary).
Demotion Candidates: In group $A=1$, samples with current label $Y=1$ whose predicted probability of a positive outcome is lowest.
Thresholds are set to change exactly $M/2$ labels in each direction.
```python
import numpy as np
from sklearn.base import clone
from typing import Tuple


class LabelMassaging:
    """
    Pre-processing technique that modifies labels to reduce discrimination.
    Changes labels for samples near the decision boundary to equalize positive rates.
    """

    def __init__(self, base_estimator, target_discrimination: float = 0.0):
        """
        Args:
            base_estimator: Classifier used to rank samples by prediction confidence
            target_discrimination: Target difference in positive rates (default 0)
        """
        self.base_estimator = base_estimator
        self.target_discrimination = target_discrimination

    def fit_transform(self, X: np.ndarray, y: np.ndarray,
                      protected: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Massage labels to reduce discrimination.

        Returns:
            Tuple of (X, y_massaged) with modified labels
        """
        # Train preliminary classifier to rank samples
        temp_clf = clone(self.base_estimator)
        temp_clf.fit(X, y)
        probs = temp_clf.predict_proba(X)[:, 1]  # P(Y=1|X)

        # Compute current discrimination
        rate_0 = np.mean(y[protected == 0])  # P(Y=1|A=0)
        rate_1 = np.mean(y[protected == 1])  # P(Y=1|A=1)
        current_disc = rate_1 - rate_0

        n_0 = np.sum(protected == 0)
        n_1 = np.sum(protected == 1)
        n = len(y)

        # Number of labels to change
        target_change = abs(current_disc - self.target_discrimination) * n_0 * n_1 / n
        m = int(np.ceil(target_change / 2))  # Changes per direction

        y_massaged = y.copy()

        if current_disc > self.target_discrimination:
            # Promote group 0 (raise its positive rate) and demote group 1

            # Promotion candidates: group 0, y=0, highest predicted probability
            promo_mask = (protected == 0) & (y == 0)
            promo_indices = np.where(promo_mask)[0]
            promo_sorted = promo_indices[np.argsort(-probs[promo_mask])]  # Highest first

            # Demotion candidates: group 1, y=1, lowest predicted probability
            demo_mask = (protected == 1) & (y == 1)
            demo_indices = np.where(demo_mask)[0]
            demo_sorted = demo_indices[np.argsort(probs[demo_mask])]  # Lowest first
        else:
            # Opposite direction: demote group 0, promote group 1

            # Demotion candidates: group 0, y=1, lowest predicted probability
            demo_mask = (protected == 0) & (y == 1)
            demo_indices = np.where(demo_mask)[0]
            demo_sorted = demo_indices[np.argsort(probs[demo_mask])]

            # Promotion candidates: group 1, y=0, highest predicted probability
            promo_mask = (protected == 1) & (y == 0)
            promo_indices = np.where(promo_mask)[0]
            promo_sorted = promo_indices[np.argsort(-probs[promo_mask])]

        # Apply changes
        n_promo = min(m, len(promo_sorted))
        n_demo = min(m, len(demo_sorted))
        y_massaged[promo_sorted[:n_promo]] = 1  # Promote: 0 -> 1
        y_massaged[demo_sorted[:n_demo]] = 0    # Demote: 1 -> 0

        # Report changes
        changes = np.sum(y != y_massaged)
        print(f"Massaged {changes} labels ({100*changes/n:.1f}%)")
        print(f"Original: P(Y=1|A=0)={rate_0:.3f}, P(Y=1|A=1)={rate_1:.3f}")
        new_rate_0 = np.mean(y_massaged[protected == 0])
        new_rate_1 = np.mean(y_massaged[protected == 1])
        print(f"Massaged: P(Y=1|A=0)={new_rate_0:.3f}, P(Y=1|A=1)={new_rate_1:.3f}")

        return X, y_massaged
```

Modifying labels is philosophically controversial. It assumes we know the labels are wrong and can correct them. In practice, this may be justified when labels clearly reflect historical discrimination (e.g., biased hiring decisions). However, it requires careful judgment about when labels are 'biased' versus reflecting legitimate differences.
Each pre-processing method offers distinct tradeoffs. The optimal choice depends on your specific constraints, data characteristics, and fairness requirements.
| Method | Strengths | Limitations | Best When |
|---|---|---|---|
| Re-sampling | Simple, interpretable, preserves feature values | May discard data, doesn't address X-A correlation | Imbalanced groups with sufficient data |
| Re-weighting | Preserves all data, continuous control | Requires algorithm support, variance increase | Algorithm supports weights, all data valuable |
| Disparate Impact Remover | Feature-level independence, rank-preserving | May distort features, single-attribute focus | Features encode protected info clearly |
| Fair Representations | Powerful, handles complex correlations | Requires training, less interpretable | Complex X-A relationships, reusable embeddings |
| Massaging/Relabeling | Directly addresses label bias | Philosophically controversial, needs ranker | Labels clearly reflect historical bias |
Decision Framework:
Is your main concern imbalanced representation? → Start with re-sampling or re-weighting
Do your features encode protected information? → Consider Disparate Impact Remover or representation learning
Do you have labels from potentially biased historical decisions? → Consider massaging or relabeling approaches
Do you need reusable debiased data/representations? → Fair representation learning provides transferable embeddings
Is interpretability paramount? → Re-sampling and re-weighting are most transparent
Combining Methods:
Pre-processing methods can be combined. For example, you might apply the Disparate Impact Remover to strip protected information from features and then train with Calders-Verwer weights to equalize the label distribution, or follow re-sampling with fair representation learning when residual feature-level correlations remain; a sketch of the first pipeline appears after the next paragraph.
Experimentation is essential—measure both fairness and accuracy impacts of each intervention.
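A minimal sketch of such a combined pipeline, assuming the `DisparateImpactRemover` class and `compute_calders_verwer_weights` function defined earlier on this page are importable (the module name `preprocessing_fairness` is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical module collecting the code shown earlier on this page
from preprocessing_fairness import DisparateImpactRemover, compute_calders_verwer_weights

def train_combined_pipeline(X: np.ndarray, y: np.ndarray, protected: np.ndarray,
                            repair_level: float = 0.8):
    """Combine feature repair (DIR) with Calders-Verwer re-weighting."""
    # Step 1: repair features so they carry less information about A
    remover = DisparateImpactRemover(repair_level=repair_level)
    X_repaired = remover.fit_transform(X, protected)

    # Step 2: re-weight samples so the (A, Y) joint approximates independence
    weights = compute_calders_verwer_weights(protected, y)

    # Step 3: train any weight-aware classifier on the transformed data
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_repaired, y, sample_weight=weights)
    return remover, clf
```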
Pre-processing is rarely sufficient alone for achieving fairness. It works best as part of a comprehensive approach that includes careful problem formulation, appropriate fairness metrics, in-processing constraints when needed, and post-deployment monitoring. The following pages cover in-processing and post-processing methods that complement pre-processing.
What's Next:
The next page explores in-processing methods—techniques that modify the training algorithm itself to incorporate fairness constraints or objectives directly into the optimization process. While pre-processing acts on data, in-processing integrates fairness into learning.
You now understand the major pre-processing approaches to bias mitigation—from simple re-sampling to sophisticated representation learning. These techniques provide a powerful first line of defense against discriminatory patterns in training data.