Pre-processing methods for bias mitigation operate on a simple but powerful premise: if the data is the source of bias, fix the data. By modifying training data before it reaches the learning algorithm, we can remove or neutralize discriminatory patterns without requiring any changes to the model itself.
This approach has profound implications. It is model-agnostic: any learning algorithm can be trained on the fair, pre-processed data. It enables reuse: debiased data can be shared and used across multiple modeling tasks. And it provides transparency: the transformations applied to the data are explicit and auditable.
By the end of this page, you will be able to: (1) Explain the philosophy and tradeoffs of pre-processing approaches to fairness, (2) Implement re-sampling techniques for balancing training data, (3) Apply re-weighting methods to adjust sample importance, (4) Understand data transformation methods including Disparate Impact Remover and Fair Representation Learning, (5) Evaluate when pre-processing is appropriate versus other approaches.
The Pre-processing Landscape:
Pre-processing methods fall into three broad categories: re-sampling, which changes which examples appear in the training set and how often; re-weighting, which adjusts the importance assigned to each example; and data transformation, which modifies feature values or labels (including Disparate Impact Remover, fair representation learning, and massaging/relabeling).
Each approach has distinct advantages and limitations. Re-sampling and re-weighting are simple to implement but may not address subtle correlations. Data transformation can achieve stronger fairness guarantees but may lose useful information. The right choice depends on your specific context, data, and fairness requirements.
Before examining specific techniques, let's establish the mathematical foundation that unifies pre-processing approaches.
The Data Distribution Transformation Problem:
Let $D_{original} = \{(x_i, a_i, y_i)\}_{i=1}^n$ be the original training dataset, where $x_i \in \mathcal{X}$ is the feature vector, $a_i \in \mathcal{A}$ is the protected attribute, and $y_i \in \mathcal{Y}$ is the target label.
Our goal is to construct a transformed dataset $D_{fair}$ such that any model trained on $D_{fair}$ satisfies a given fairness constraint while maintaining predictive utility.
Formal Objective:
$$D_{fair} = T(D_{original})$$
where the transformation $T$ satisfies two requirements: any model trained on $D_{fair}$ meets the chosen fairness constraint (for example, demographic parity), and the predictive utility of the data is preserved as much as possible.
Pre-processing inherently involves a fairness-utility tradeoff. The more aggressively we remove discriminatory information, the more predictive power we may sacrifice. This tradeoff is not a bug but a fundamental property of fairness in prediction.
Why Pre-processing Works:
The key insight is that statistical dependence between protected attributes and outcomes can be decomposed into two components: legitimate dependence, which flows through factors that genuinely justify different outcomes, and illegitimate dependence, which reflects historical or measurement bias.
Pre-processing attempts to remove or reduce illegitimate dependence. The challenge is distinguishing between these two types, which is fundamentally a causal inference problem.
Statistical Independence Goal:
Many pre-processing methods target statistical independence between features and protected attributes:
$$P_{D_{fair}}(X | A) = P_{D_{fair}}(X)$$
This ensures features carry no information about protected attributes, preventing the model from using protected information (even implicitly) for prediction.
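One practical way to audit this condition, offered here as a sketch rather than as part of any particular method, is to try to predict the protected attribute from the (transformed) features: if a classifier does no better than guessing the majority class, the features carry little usable information about $A$. The function name `protected_attribute_leakage` and the choice of scikit-learn's `LogisticRegression` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def protected_attribute_leakage(X: np.ndarray, protected: np.ndarray) -> float:
    """Estimate how much information the features carry about the protected attribute.

    Returns the cross-validated accuracy of predicting A from X; values close to the
    majority-class rate suggest approximate independence between X and A.
    """
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, protected, cv=5, scoring="accuracy")
    return float(scores.mean())

# Example: compare leakage before and after a pre-processing transformation
# leakage_before = protected_attribute_leakage(X, protected)
# leakage_after = protected_attribute_leakage(X_fair, protected)
```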
Re-sampling modifies the training set composition by changing which examples appear and how often. This is the simplest class of pre-processing methods, requiring no feature transformation, only sample selection.
Core Intuition: If bias arises from imbalanced representation—more examples of one group with certain outcomes—we can rebalance by duplicating underrepresented combinations or removing overrepresented ones.
Undersampling for Fairness:
Undersampling removes examples to balance the joint distribution of $(A, Y)$. The goal is to create a dataset where:
$$P_{sampled}(A = a, Y = y) = \frac{1}{|\mathcal{A}| \cdot |\mathcal{Y}|}$$
Algorithm:
1. Compute counts N(a,y) for each (protected_attribute, label) combination
2. Find minimum count: N_min = min_{a,y} N(a,y)
3. For each combination (a,y):
- Randomly sample N_min examples from the N(a,y) available
4. Combine all samples into the balanced dataset
Advantages: simple to implement, works with any learning algorithm, and leaves feature values untouched.
Disadvantages: discards data (reducing effective sample size and potentially hurting accuracy), and does not address correlations between the features $X$ and the protected attribute $A$.
```python
import numpy as np
from collections import Counter
from typing import Tuple


def fair_undersample(X: np.ndarray, protected: np.ndarray,
                     labels: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Balance dataset by undersampling overrepresented (group, label) combinations.

    Args:
        X: Feature matrix (n_samples, n_features)
        protected: Protected attribute values (n_samples,)
        labels: Target labels (n_samples,)

    Returns:
        Balanced (X, protected, labels) tuple
    """
    # Create (group, label) tuples
    combinations = list(zip(protected, labels))
    counts = Counter(combinations)

    # Find minimum count across all combinations
    min_count = min(counts.values())

    # Collect indices for balanced sampling
    balanced_indices = []
    for combo in counts.keys():
        combo_indices = [i for i, c in enumerate(combinations) if c == combo]
        sampled = np.random.choice(combo_indices, size=min_count, replace=False)
        balanced_indices.extend(sampled)

    # Shuffle to prevent ordering effects
    np.random.shuffle(balanced_indices)

    return X[balanced_indices], protected[balanced_indices], labels[balanced_indices]


def fair_oversample(X: np.ndarray, protected: np.ndarray,
                    labels: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Balance dataset by oversampling underrepresented (group, label) combinations.

    Args:
        X: Feature matrix (n_samples, n_features)
        protected: Protected attribute values (n_samples,)
        labels: Target labels (n_samples,)

    Returns:
        Balanced (X, protected, labels) tuple
    """
    combinations = list(zip(protected, labels))
    counts = Counter(combinations)

    # Find maximum count across all combinations
    max_count = max(counts.values())

    # Collect indices with oversampling
    balanced_indices = []
    for combo in counts.keys():
        combo_indices = [i for i, c in enumerate(combinations) if c == combo]
        current_count = len(combo_indices)

        if current_count < max_count:
            # Sample with replacement to reach max_count
            oversampled = np.random.choice(combo_indices, size=max_count, replace=True)
            balanced_indices.extend(oversampled)
        else:
            balanced_indices.extend(combo_indices)

    np.random.shuffle(balanced_indices)

    return X[balanced_indices], protected[balanced_indices], labels[balanced_indices]
```

Re-sampling balances the marginal distribution of $(A, Y)$ but doesn't address correlations between the features $X$ and the protected attribute $A$. If features encode protected information (e.g., a name encoding gender, a zip code encoding race), re-sampling alone won't prevent discrimination.
Re-weighting assigns different importance weights to training samples, modifying the effective training distribution without changing the samples themselves. This preserves all data while adjusting the optimization objective.
The Re-weighting Objective:
Standard empirical risk minimization: $$\hat{h} = \arg\min_h \frac{1}{n} \sum_{i=1}^n L(h(x_i), y_i)$$
Weighted empirical risk minimization: $$\hat{h}_{fair} = \arg\min_h \frac{1}{\sum_i w_i} \sum_{i=1}^n w_i \cdot L(h(x_i), y_i)$$
where weights $w_i$ are chosen to counteract bias.
Calders-Verwer Reweighting:
The seminal reweighting approach by Calders and Verwer (2010) assigns weights to achieve demographic parity. The key insight is that discrimination arises from different base rates of positive outcomes across groups.
Weight Computation:
For binary protected attribute $A \in \{0, 1\}$ and binary label $Y \in \{0, 1\}$:
$$w_{a,y} = \frac{P(A = a) \cdot P(Y = y)}{P(A = a, Y = y)}$$
This reweights to simulate independence between $A$ and $Y$.
Derivation:
In the original data: $P(Y=y | A=a) \neq P(Y=y)$ (groups have different positive rates)
After reweighting: $P_{weighted}(Y=y | A=a) = P(Y=y)$ (groups have equal positive rates)
The weight transforms the dependent distribution into an independent one: $$P_{weighted}(A, Y) = P(A) \cdot P(Y)$$
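A quick worked example with hypothetical numbers makes the effect concrete. Suppose $P(A{=}1) = 0.4$, $P(Y{=}1) = 0.42$, and the observed joint probability is $P(A{=}1, Y{=}1) = 0.12$, i.e., the protected group receives positive outcomes less often than independence would imply. Then:

$$w_{1,1} = \frac{P(A{=}1) \cdot P(Y{=}1)}{P(A{=}1, Y{=}1)} = \frac{0.4 \times 0.42}{0.12} = 1.4$$

so positive examples from the protected group are up-weighted, while the overrepresented $(a, y)$ combinations receive weights below 1.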
```python
import numpy as np


def compute_calders_verwer_weights(protected: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """
    Compute sample weights following Calders-Verwer reweighting for demographic parity.

    The weights transform the joint distribution P(A, Y) to independence: P(A) * P(Y).

    Args:
        protected: Binary protected attribute (n_samples,)
        labels: Binary target labels (n_samples,)

    Returns:
        Sample weights (n_samples,)
    """
    n = len(protected)

    # Compute marginal probabilities
    p_a_1 = np.mean(protected)   # P(A = 1)
    p_a_0 = 1 - p_a_1            # P(A = 0)
    p_y_1 = np.mean(labels)      # P(Y = 1)
    p_y_0 = 1 - p_y_1            # P(Y = 0)

    # Compute joint probabilities
    p_a0_y0 = np.mean((protected == 0) & (labels == 0))
    p_a0_y1 = np.mean((protected == 0) & (labels == 1))
    p_a1_y0 = np.mean((protected == 1) & (labels == 0))
    p_a1_y1 = np.mean((protected == 1) & (labels == 1))

    # Compute weights: w(a,y) = P(A=a) * P(Y=y) / P(A=a, Y=y)
    weight_map = {
        (0, 0): (p_a_0 * p_y_0) / p_a0_y0 if p_a0_y0 > 0 else 1.0,
        (0, 1): (p_a_0 * p_y_1) / p_a0_y1 if p_a0_y1 > 0 else 1.0,
        (1, 0): (p_a_1 * p_y_0) / p_a1_y0 if p_a1_y0 > 0 else 1.0,
        (1, 1): (p_a_1 * p_y_1) / p_a1_y1 if p_a1_y1 > 0 else 1.0,
    }

    # Assign weights to each sample
    weights = np.array([
        weight_map[(int(a), int(y))] for a, y in zip(protected, labels)
    ])

    # Normalize to sum to n (preserves effective sample size interpretation)
    weights = weights * (n / np.sum(weights))

    return weights


def train_weighted_model(X: np.ndarray, y: np.ndarray, weights: np.ndarray,
                         model_class, **model_kwargs):
    """
    Train a model using sample weights for fairness.
    Most sklearn classifiers accept a sample_weight parameter.
    """
    model = model_class(**model_kwargs)
    model.fit(X, y, sample_weight=weights)
    return model


# Example usage
if __name__ == "__main__":
    from sklearn.linear_model import LogisticRegression

    # Generate synthetic biased data
    np.random.seed(42)
    n_samples = 10000

    protected = np.random.binomial(1, 0.4, n_samples)  # 40% in protected group

    # Biased labels: protected group has lower positive rate
    base_rate = 0.3 + 0.3 * (1 - protected)  # 60% for A=0, 30% for A=1
    labels = np.random.binomial(1, base_rate, n_samples)

    X = np.column_stack([
        np.random.randn(n_samples, 3),  # Random features
        protected * 0.5                 # Feature slightly correlated with A
    ])

    # Compute fairness weights
    weights = compute_calders_verwer_weights(protected, labels)

    print("Original distribution:")
    print(f"  P(Y=1|A=0) = {np.mean(labels[protected == 0]):.3f}")
    print(f"  P(Y=1|A=1) = {np.mean(labels[protected == 1]):.3f}")
    print("\nWeight statistics:")
    print(f"  Weight range: [{weights.min():.3f}, {weights.max():.3f}]")
    print(f"  Mean weight: {weights.mean():.3f}")
```

Re-weighting is generally preferred over re-sampling when: (1) you can't afford to discard any data, (2) your learning algorithm supports sample weights, (3) you want more continuous control over the fairness-utility tradeoff. Re-sampling is preferred when: (1) sample weights are not supported, (2) you want a simpler, more transparent transformation.
Extensions to Multi-Class and Multi-Group Settings:
The Calders-Verwer approach extends naturally to multiple groups and labels:
$$w_{a,y} = \frac{P(A = a) \cdot P(Y = y)}{P(A = a, Y = y)}$$
For $|\mathcal{A}| = k$ groups and $|\mathcal{Y}| = m$ classes, we compute $k \times m$ weights.
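The binary implementation above extends directly. The following is a minimal sketch of the general case, applying the same formula over arbitrary group and class values; the function name `compute_general_fairness_weights` is illustrative and not from any library.

```python
import numpy as np

def compute_general_fairness_weights(protected: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Calders-Verwer-style weights for any number of groups and classes.

    Uses w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y), estimated empirically.
    """
    n = len(labels)
    weights = np.ones(n, dtype=float)
    for a in np.unique(protected):
        for y in np.unique(labels):
            mask = (protected == a) & (labels == y)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (protected == a).mean() * (labels == y).mean() / p_joint
    # Normalize so the weights sum to n
    return weights * (n / weights.sum())
```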
Handling Intersectionality:
When multiple protected attributes exist, we can: (1) define intersectional groups as the cross-product of the attributes and reweight (or re-sample) those groups directly, (2) compute weights per attribute and multiply them, or (3) target multi-way independence between all protected attributes and the label.
Each approach has tradeoffs—intersectional groups can become very small, multiplicative weights may over-correct, and multi-way independence may be too restrictive.
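As a rough illustration of the first two options (the attribute arrays below are hypothetical, and assumed binary):

```python
import numpy as np

# Hypothetical binary attribute codes for illustration
sex = np.array([0, 1, 1, 0, 1, 0])
race = np.array([1, 1, 0, 0, 1, 0])

# Option 1: treat each (sex, race) combination as its own intersectional group,
# then apply any group-based method (re-sampling, re-weighting) to these codes.
intersectional = sex * 2 + race  # one code per combination: values in {0, 1, 2, 3}
print(np.unique(intersectional, return_counts=True))

# Option 2 (multiplicative weights) would compute one weight vector per attribute
# and multiply them element-wise; as noted above, this may over-correct.
```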
Disparate Impact Remover (DIR), introduced by Feldman et al. (2015), takes a fundamentally different approach: instead of changing which samples are used or how they're weighted, it transforms the feature values themselves to remove information about protected attributes.
Core Idea: Modify each feature's distribution so that the conditional distributions $P(X_j | A = a)$ become identical across groups. After transformation, knowing someone's group membership provides no information about their feature values—and vice versa.
Mathematical Foundation:
For each feature $X_j$, we apply a rank-preserving transformation that maps the group-specific distributions to a common target distribution. The transformation uses the quantile function (inverse CDF).
Algorithm:
For each feature $X_j$ and group $a$: (1) estimate the within-group empirical CDF $F_{a,j}$; (2) construct a target quantile function $F^{-1}_{target,j}$ (for example, the median of the group quantile functions); (3) replace each value $x$ with $F^{-1}_{target,j}(F_{a,j}(x))$.
This maps each value through its within-group quantile to the corresponding quantile in the target distribution.
Repair Level Parameter:
DIR includes a repair level $\lambda \in [0, 1]$ controlling the transformation strength:
$$X'_{j,i} = (1 - \lambda) \cdot X_{j,i} + \lambda \cdot \text{Repaired}(X_{j,i})$$
```python
import numpy as np
from scipy.interpolate import interp1d


class DisparateImpactRemover:
    """
    Remove disparate impact by transforming features to be independent of the
    protected attribute through quantile-based repair.

    Based on: Feldman et al., "Certifying and Removing Disparate Impact" (2015)
    """

    def __init__(self, repair_level: float = 1.0):
        """
        Args:
            repair_level: Float in [0, 1] controlling transformation strength.
                          0 = no change, 1 = full repair
        """
        assert 0 <= repair_level <= 1, "repair_level must be in [0, 1]"
        self.repair_level = repair_level
        self.cdfs_ = {}            # Group CDFs for each feature
        self.inv_cdfs_ = {}        # Inverse CDFs
        self.target_inv_cdf_ = {}  # Target inverse CDFs

    def fit(self, X: np.ndarray, protected: np.ndarray) -> 'DisparateImpactRemover':
        """Learn the quantile transformations from training data."""
        n_samples, n_features = X.shape
        groups = np.unique(protected)

        for j in range(n_features):
            self.cdfs_[j] = {}
            self.inv_cdfs_[j] = {}

            # Compute CDF and inverse CDF for each group
            for a in groups:
                mask = protected == a
                values = np.sort(X[mask, j])
                n_group = len(values)

                # Empirical CDF: F(x) = proportion of values <= x
                percentiles = np.linspace(0, 1, n_group)

                # CDF: value -> percentile
                self.cdfs_[j][a] = interp1d(
                    values, percentiles, bounds_error=False, fill_value=(0, 1)
                )
                # Inverse CDF: percentile -> value
                self.inv_cdfs_[j][a] = interp1d(
                    percentiles, values, bounds_error=False,
                    fill_value=(values[0], values[-1])
                )

            # Target distribution: median of the group distributions.
            # For each percentile, take the median value across groups.
            percentiles = np.linspace(0, 1, 1000)
            target_values = np.median([
                self.inv_cdfs_[j][a](percentiles) for a in groups
            ], axis=0)
            self.target_inv_cdf_[j] = interp1d(
                percentiles, target_values, bounds_error=False,
                fill_value=(target_values[0], target_values[-1])
            )

        return self

    def transform(self, X: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """Apply the repair transformation to features."""
        n_samples, n_features = X.shape
        X_repaired = X.copy().astype(float)

        for j in range(n_features):
            for a in np.unique(protected):
                mask = protected == a
                if a in self.cdfs_[j]:
                    # Map value -> percentile -> target value
                    percentiles = self.cdfs_[j][a](X[mask, j])
                    repaired = self.target_inv_cdf_[j](percentiles)
                    # Apply repair level blending
                    X_repaired[mask, j] = (
                        (1 - self.repair_level) * X[mask, j]
                        + self.repair_level * repaired
                    )

        return X_repaired

    def fit_transform(self, X: np.ndarray, protected: np.ndarray) -> np.ndarray:
        """Fit and transform in one step."""
        return self.fit(X, protected).transform(X, protected)


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 2000

    # Create data where features differ by group
    protected = np.random.binomial(1, 0.5, n)
    X = np.column_stack([
        np.random.normal(5, 2, n) + 3 * protected,  # Feature differs by group
        np.random.normal(0, 1, n) + 1 * protected,  # Another biased feature
        np.random.normal(10, 3, n)                  # Unbiased feature
    ])

    print("Before repair:")
    print(f"  Feature 0: Group 0 mean={X[protected==0, 0].mean():.2f}, "
          f"Group 1 mean={X[protected==1, 0].mean():.2f}")

    # Apply repair
    remover = DisparateImpactRemover(repair_level=1.0)
    X_repaired = remover.fit_transform(X, protected)

    print("\nAfter full repair:")
    print(f"  Feature 0: Group 0 mean={X_repaired[protected==0, 0].mean():.2f}, "
          f"Group 1 mean={X_repaired[protected==1, 0].mean():.2f}")
```

Disparate Impact Remover preserves within-group rankings: if person A had a higher feature value than person B within their group, this remains true after transformation. This is a desirable property when relative comparisons within groups are meaningful.
Fair Representation Learning learns a new feature representation that encodes useful information for prediction while being uninformative about protected attributes. Unlike Disparate Impact Remover which modifies individual features, representation learning methods create entirely new feature spaces using neural networks or other learnable mappings.
The Core Objective:
Learn an encoder $E: \mathcal{X} \rightarrow \mathcal{Z}$ such that: (1) the representation $Z = E(X)$ retains enough information to predict $Y$ accurately, (2) $Z$ is (approximately) statistically independent of the protected attribute $A$, and (3) where reconstruction is required, $X$ can be recovered from $Z$ with low error.
This is formulated as a multi-objective optimization problem.
Zemel et al. (2013) - Learning Fair Representations:
The original fair representation learning paper proposes mapping data to a multinomial distribution over $K$ prototypes, where the mapping satisfies three properties: group membership does not affect the probability of being assigned to any prototype (statistical parity), the prototype representation retains as much information about $X$ as possible, and the representation supports accurate prediction of $Y$.
Objective Function: $$L = L_{reconstruct} + \alpha \cdot L_{accuracy} + \beta \cdot L_{fairness}$$
where $L_{reconstruct}$ penalizes reconstruction error of $X$ from the prototype representation, $L_{accuracy}$ penalizes prediction error on $Y$, and $L_{fairness}$ penalizes differences in the average prototype-assignment probabilities between groups; $\alpha$ and $\beta$ trade off accuracy against fairness.
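A compact sketch of how these three terms can be computed for a prototype-based mapping is shown below. This is not the authors' reference implementation; the function names and the softmax-over-distances parameterization are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def prototype_probabilities(X: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Soft assignment of each sample to K prototypes via softmax over negative squared distances."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)  # (n, K)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def zemel_style_losses(X, y, protected, prototypes, proto_labels):
    """Compute the three terms of L = L_reconstruct + alpha * L_accuracy + beta * L_fairness."""
    M = prototype_probabilities(X, prototypes)        # (n, K) soft assignments
    X_hat = M @ prototypes                            # reconstruction from prototypes
    y_hat = np.clip(M @ proto_labels, 1e-6, 1 - 1e-6) # per-sample P(Y=1)

    L_reconstruct = np.mean((X - X_hat) ** 2)
    L_accuracy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Statistical parity on assignments: both groups should use each prototype equally often
    L_fairness = np.abs(M[protected == 0].mean(axis=0) - M[protected == 1].mean(axis=0)).sum()
    return L_reconstruct, L_accuracy, L_fairness
```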
Variational Fair Autoencoders (VFAE):
Louizos et al. (2016) extend this with variational autoencoders. The model treats the protected attribute as an observed variable, learns a latent code $z$ intended to be independent of $a$, and adds a Maximum Mean Discrepancy (MMD) penalty that pushes the group-conditional latent distributions together.
Loss Function: $$L = L_{ELBO} + \gamma \cdot \text{MMD}(p(z|a=0), p(z|a=1)) + \lambda \cdot L_{classify}$$
The MMD term penalizes differences between the latent distributions for different groups.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Tuple


class FairAutoencoder(nn.Module):
    """
    A fair autoencoder that learns representations independent of protected attributes.
    Uses adversarial training to remove protected information from the latent space.
    """

    def __init__(self, input_dim: int, latent_dim: int = 32, hidden_dim: int = 64):
        super().__init__()

        # Encoder: X -> Z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim)
        )
        # Decoder: Z -> X (reconstruction)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim)
        )
        # Target predictor: Z -> Y
        self.predictor = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1), nn.Sigmoid()
        )
        # Adversary: Z -> A (tries to predict protected attribute)
        self.adversary = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1), nn.Sigmoid()
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        """Map input to fair latent representation."""
        return self.encoder(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        """Reconstruct input from latent representation."""
        return self.decoder(z)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, ...]:
        z = self.encode(x)
        x_reconstructed = self.decode(z)
        y_pred = self.predictor(z)
        a_pred = self.adversary(z)
        return z, x_reconstructed, y_pred, a_pred


def compute_mmd(x: torch.Tensor, y: torch.Tensor, kernel: str = 'rbf') -> torch.Tensor:
    """
    Compute Maximum Mean Discrepancy between two distributions.
    MMD = 0 iff the distributions are identical.
    """
    def rbf_kernel(x, y, sigma=1.0):
        dist = torch.cdist(x, y, p=2)
        return torch.exp(-dist**2 / (2 * sigma**2))

    xx = rbf_kernel(x, x).mean()
    yy = rbf_kernel(y, y).mean()
    xy = rbf_kernel(x, y).mean()
    return xx + yy - 2 * xy


def train_fair_autoencoder(model: FairAutoencoder, X: torch.Tensor,
                           protected: torch.Tensor, labels: torch.Tensor,
                           epochs: int = 100, fairness_weight: float = 1.0,
                           adversary_weight: float = 1.0):
    """
    Train the fair autoencoder with adversarial debiasing.

    The adversary tries to predict the protected attribute from Z.
    The encoder tries to fool the adversary (minimize the adversary's accuracy).
    """
    # Separate optimizers for encoder/decoder/predictor and adversary
    main_params = (list(model.encoder.parameters())
                   + list(model.decoder.parameters())
                   + list(model.predictor.parameters()))
    main_optimizer = torch.optim.Adam(main_params, lr=0.001)
    adversary_optimizer = torch.optim.Adam(model.adversary.parameters(), lr=0.001)

    for epoch in range(epochs):
        model.train()

        # Forward pass
        z, x_recon, y_pred, a_pred = model(X)

        # Losses
        recon_loss = F.mse_loss(x_recon, X)
        pred_loss = F.binary_cross_entropy(y_pred.squeeze(), labels.float())
        adversary_loss = F.binary_cross_entropy(a_pred.squeeze(), protected.float())

        # MMD fairness loss between group-conditional latent distributions
        z_group0 = z[protected == 0]
        z_group1 = z[protected == 1]
        if len(z_group0) > 0 and len(z_group1) > 0:
            mmd_loss = compute_mmd(z_group0, z_group1)
        else:
            mmd_loss = torch.tensor(0.0)

        # Train adversary to predict the protected attribute from Z
        adversary_optimizer.zero_grad()
        adversary_loss.backward(retain_graph=True)
        adversary_optimizer.step()

        # Recompute the adversary's output with its updated weights so the main
        # backward pass does not reuse tensors modified in-place by the step above
        adversary_loss = F.binary_cross_entropy(
            model.adversary(z).squeeze(), protected.float()
        )

        # Train main model: minimize reconstruction + prediction + MMD,
        # and maximize the adversary's loss (fool the adversary)
        main_optimizer.zero_grad()
        main_loss = (recon_loss + pred_loss
                     + fairness_weight * mmd_loss
                     - adversary_weight * adversary_loss)
        main_loss.backward()
        main_optimizer.step()

        if (epoch + 1) % 20 == 0:
            print(f"Epoch {epoch+1}: Recon={recon_loss:.4f}, "
                  f"Pred={pred_loss:.4f}, MMD={mmd_loss:.4f}")
```

Fair representation learning is particularly powerful when: (1) the relationship between X and A is complex and nonlinear, (2) you want a single representation usable for multiple downstream tasks, (3) you're building a data pipeline where model architectures may change but fairness requirements persist.
Massaging and relabeling approaches modify the target labels rather than features. The intuition is that historical bias manifests in labels—if past decisions were discriminatory, the labels reflecting those decisions are biased.
Massaging (Kamiran & Calders, 2012):
The method identifies samples near the decision boundary where label changes would most reduce discrimination while least affecting classification accuracy.
Mathematical Formulation:
Let $D = \text{P}(Y=1|A=1) - \text{P}(Y=1|A=0)$ be the discrimination in positive rates.
To achieve $D' = 0$, we need to change approximately: $$M = \frac{|D| \cdot n_0 \cdot n_1}{n}$$
labels, where $n_a$ is the count of group $a$.
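For example, with hypothetical counts $n_0 = 6000$, $n_1 = 4000$, $n = 10000$ and discrimination $D = 0.3$:

$$M = \frac{0.3 \times 6000 \times 4000}{10000} = 720$$

so roughly 360 labels would be promoted in group $A=0$ and 360 demoted in group $A=1$.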
Promotion Candidates: In group $A=0$, samples with current label $Y=0$ whose predicted probability of a positive outcome is highest (those sitting just below the decision boundary).
Demotion Candidates: In group $A=1$, samples with current label $Y=1$ whose predicted probability of a positive outcome is lowest.
Thresholds are set to change exactly $M/2$ labels in each direction.
```python
import numpy as np
from sklearn.base import clone
from typing import Tuple


class LabelMassaging:
    """
    Pre-processing technique that modifies labels to reduce discrimination.
    Changes labels for samples near the decision boundary to equalize positive rates.
    """

    def __init__(self, base_estimator, target_discrimination: float = 0.0):
        """
        Args:
            base_estimator: Classifier used to rank samples by prediction confidence
            target_discrimination: Target difference in positive rates (default 0)
        """
        self.base_estimator = base_estimator
        self.target_discrimination = target_discrimination

    def fit_transform(self, X: np.ndarray, y: np.ndarray,
                      protected: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Massage labels to reduce discrimination.

        Returns:
            Tuple of (X, y_massaged) with modified labels
        """
        # Train preliminary classifier to rank samples
        temp_clf = clone(self.base_estimator)
        temp_clf.fit(X, y)
        probs = temp_clf.predict_proba(X)[:, 1]  # P(Y=1|X)

        # Compute current discrimination
        rate_0 = np.mean(y[protected == 0])  # P(Y=1|A=0)
        rate_1 = np.mean(y[protected == 1])  # P(Y=1|A=1)
        current_disc = rate_1 - rate_0

        n_0 = np.sum(protected == 0)
        n_1 = np.sum(protected == 1)
        n = len(y)

        # Number of labels to change
        target_change = abs(current_disc - self.target_discrimination) * n_0 * n_1 / n
        m = int(np.ceil(target_change / 2))  # Changes per direction

        y_massaged = y.copy()

        if current_disc > self.target_discrimination:
            # Promote group 0 (raise its positive rate) and demote group 1

            # Promotion candidates: group 0, y=0, highest predicted probability
            promo_mask = (protected == 0) & (y == 0)
            promo_indices = np.where(promo_mask)[0]
            promo_sorted = promo_indices[np.argsort(-probs[promo_mask])]  # Highest first

            # Demotion candidates: group 1, y=1, lowest predicted probability
            demo_mask = (protected == 1) & (y == 1)
            demo_indices = np.where(demo_mask)[0]
            demo_sorted = demo_indices[np.argsort(probs[demo_mask])]  # Lowest first
        else:
            # Opposite direction: demote group 0, promote group 1

            # Demotion candidates: group 0, y=1, lowest predicted probability
            demo_mask = (protected == 0) & (y == 1)
            demo_indices = np.where(demo_mask)[0]
            demo_sorted = demo_indices[np.argsort(probs[demo_mask])]

            # Promotion candidates: group 1, y=0, highest predicted probability
            promo_mask = (protected == 1) & (y == 0)
            promo_indices = np.where(promo_mask)[0]
            promo_sorted = promo_indices[np.argsort(-probs[promo_mask])]

        # Apply changes
        n_promo = min(m, len(promo_sorted))
        n_demo = min(m, len(demo_sorted))
        y_massaged[promo_sorted[:n_promo]] = 1  # Promote: 0 -> 1
        y_massaged[demo_sorted[:n_demo]] = 0    # Demote: 1 -> 0

        # Report changes
        changes = np.sum(y != y_massaged)
        print(f"Massaged {changes} labels ({100*changes/n:.1f}%)")
        print(f"Original: P(Y=1|A=0)={rate_0:.3f}, P(Y=1|A=1)={rate_1:.3f}")
        new_rate_0 = np.mean(y_massaged[protected == 0])
        new_rate_1 = np.mean(y_massaged[protected == 1])
        print(f"Massaged: P(Y=1|A=0)={new_rate_0:.3f}, P(Y=1|A=1)={new_rate_1:.3f}")

        return X, y_massaged
```

Modifying labels is philosophically controversial. It assumes we know the labels are wrong and can correct them. In practice, this may be justified when labels clearly reflect historical discrimination (e.g., biased hiring decisions). However, it requires careful judgment about when labels are 'biased' versus reflecting legitimate differences.
Each pre-processing method offers distinct tradeoffs. The optimal choice depends on your specific constraints, data characteristics, and fairness requirements.
| Method | Strengths | Limitations | Best When |
|---|---|---|---|
| Re-sampling | Simple, interpretable, preserves feature values | May discard data, doesn't address X-A correlation | Imbalanced groups with sufficient data |
| Re-weighting | Preserves all data, continuous control | Requires algorithm support, variance increase | Algorithm supports weights, all data valuable |
| Disparate Impact Remover | Feature-level independence, rank-preserving | May distort features, single-attribute focus | Features encode protected info clearly |
| Fair Representations | Powerful, handles complex correlations | Requires training, less interpretable | Complex X-A relationships, reusable embeddings |
| Massaging/Relabeling | Directly addresses label bias | Philosophically controversial, needs ranker | Labels clearly reflect historical bias |
Decision Framework:
Is your main concern imbalanced representation? → Start with re-sampling or re-weighting
Do your features encode protected information? → Consider Disparate Impact Remover or representation learning
Do you have labels from potentially biased historical decisions? → Consider massaging or relabeling approaches
Do you need reusable debiased data/representations? → Fair representation learning provides transferable embeddings
Is interpretability paramount? → Re-sampling and re-weighting are most transparent
Combining Methods:
Pre-processing methods can be combined. For example, you might apply the Disparate Impact Remover to strip protected information from features and then train with Calders-Verwer weights to equalize the label distribution, or follow re-sampling with fair representation learning when residual feature-level correlations remain; a sketch of the first pipeline appears after the next paragraph.
Experimentation is essential—measure both fairness and accuracy impacts of each intervention.
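A minimal sketch of such a combined pipeline, assuming the `DisparateImpactRemover` class and `compute_calders_verwer_weights` function defined earlier on this page are importable (the module name `preprocessing_fairness` is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical module collecting the code shown earlier on this page
from preprocessing_fairness import DisparateImpactRemover, compute_calders_verwer_weights

def train_combined_pipeline(X: np.ndarray, y: np.ndarray, protected: np.ndarray,
                            repair_level: float = 0.8):
    """Combine feature repair (DIR) with Calders-Verwer re-weighting."""
    # Step 1: repair features so they carry less information about A
    remover = DisparateImpactRemover(repair_level=repair_level)
    X_repaired = remover.fit_transform(X, protected)

    # Step 2: re-weight samples so the (A, Y) joint approximates independence
    weights = compute_calders_verwer_weights(protected, y)

    # Step 3: train any weight-aware classifier on the transformed data
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_repaired, y, sample_weight=weights)
    return remover, clf
```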
Pre-processing is rarely sufficient alone for achieving fairness. It works best as part of a comprehensive approach that includes careful problem formulation, appropriate fairness metrics, in-processing constraints when needed, and post-deployment monitoring. The following pages cover in-processing and post-processing methods that complement pre-processing.
What's Next:
The next page explores in-processing methods—techniques that modify the training algorithm itself to incorporate fairness constraints or objectives directly into the optimization process. While pre-processing acts on data, in-processing integrates fairness into learning.
You now understand the major pre-processing approaches to bias mitigation—from simple re-sampling to sophisticated representation learning. These techniques provide a powerful first line of defense against discriminatory patterns in training data.