In 2016, ProPublica published a groundbreaking investigation revealing that COMPAS, a widely used algorithmic risk assessment tool in criminal justice, falsely labeled Black defendants as high-risk at nearly twice the rate of white defendants. This wasn't a bug in the traditional sense—the algorithm was optimized for accuracy. But accuracy alone doesn't capture what we intuitively understand as fairness.
This revelation sparked a profound reckoning in the machine learning community. How do we define fairness mathematically? Can algorithms be both accurate and fair? And when different notions of fairness conflict—as they inevitably do—how do we choose?
These aren't merely technical questions. They represent one of the most significant challenges facing modern AI: translating human values into mathematical constraints.
By the end of this page, you will understand the philosophical foundations of fairness, master the formal mathematical definitions used in ML systems, recognize the different categories of fairness criteria, and appreciate why no single definition can satisfy all fairness requirements simultaneously.
Machine learning models are increasingly deployed in high-stakes decision-making contexts that directly impact human lives: criminal justice risk assessment, credit and lending, hiring, and university admissions, among others.
Unlike traditional software bugs that cause system crashes, fairness failures cause harm to people—often to those already marginalized by systemic inequities. When a credit algorithm discriminates based on race (even implicitly), real families are denied housing. When a hiring algorithm penalizes women, career opportunities evaporate.
The stakes demand rigorous understanding of what fairness means and how to achieve it.
ML systems don't just reflect bias—they amplify it. A biased hiring algorithm rejects qualified candidates from underrepresented groups. Those groups then have fewer employees in the training data. The next model iteration becomes more biased. This feedback loop can entrench discrimination at scale.
The Fundamental Challenge:
Fairness is not a single, universally-agreed concept. Different stakeholders, contexts, and ethical frameworks lead to different definitions. A definition that seems fair from one perspective may be deeply unfair from another.
Consider a university admissions algorithm. Is it fair if it admits applicants from every demographic group at the same rate? If any two applicants with identical qualifications receive the same decision? If its error rates are the same across groups? If a given predicted score implies the same chance of success regardless of group?
These definitions can't all be satisfied simultaneously. Understanding this landscape of competing definitions is essential for any ML practitioner.
Before diving into mathematical definitions, we must understand the philosophical traditions that inform different fairness concepts. These aren't just academic distinctions—they fundamentally shape what we optimize for.
The Three Major Frameworks:
John Rawls and the Veil of Ignorance:
One influential framework comes from philosopher John Rawls. He proposed a thought experiment: imagine designing a society without knowing your position in it (behind a 'veil of ignorance'). What rules would you choose?
Rawls argued we'd choose two principles: equal basic liberties for everyone, and the 'difference principle', under which social and economic inequalities are acceptable only if they benefit the least advantaged.
In ML terms, Rawls might suggest we should optimize for the worst-off group, not average performance. This leads to minimax fairness approaches that ensure no group is left behind.
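To make the Rawlsian idea concrete, here is a minimal sketch (the helper function, toy labels, and the two candidate models are hypothetical, not taken from any library) that compares two models by their worst-group accuracy instead of their overall accuracy:

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, protected_attr):
    """Accuracy of the worst-performing group (the Rawlsian/minimax criterion)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    group_accuracies = [
        (y_true[protected_attr == g] == y_pred[protected_attr == g]).mean()
        for g in np.unique(protected_attr)
    ]
    return min(group_accuracies)


y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
group   = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # protected attribute
model_a = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # higher overall accuracy, but group 1 fares poorly
model_b = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]  # lower overall accuracy, but no group is left behind

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    overall = (np.asarray(y_true) == np.asarray(preds)).mean()
    worst = worst_group_accuracy(y_true, preds, group)
    print(f"{name}: overall accuracy = {overall:.2f}, worst-group accuracy = {worst:.2f}")
```

A purely utilitarian comparison would prefer model_a for its higher overall accuracy; the Rawlsian minimax criterion prefers model_b, whose worst-off group does substantially better.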
When choosing a fairness definition, you're implicitly choosing a philosophical framework. Be explicit about this choice. Stakeholder discussions should begin with: 'What do we mean by fair?' before any modeling begins.
| Framework | Core Principle | ML Fairness Concept | Example Metric |
|---|---|---|---|
| Egalitarianism | Equal treatment/outcomes | Demographic Parity | P(Ŷ=1\|A=0) = P(Ŷ=1\|A=1) |
| Libertarianism | Individual merit | Individual Fairness | Similar individuals → similar outcomes |
| Utilitarianism | Maximize total welfare | Overall Accuracy | Maximize correct predictions |
| Rawlsian | Protect worst-off | Minimax Fairness | Maximize accuracy of worst-performing group |
Group fairness definitions compare outcomes across different demographic groups. These are the most commonly used fairness criteria in practice because they're measurable from data.
Let's establish formal notation:

- A: the protected attribute (e.g., A ∈ {0, 1} for two demographic groups)
- X: the observed features the model uses
- Y: the true outcome (e.g., Y = 1 if the applicant would repay the loan)
- Ŷ: the model's prediction
- R: the model's score or predicted probability
The Three Fundamental Group Fairness Criteria:
Demographic Parity (also called Statistical Parity or Independence) requires that the prediction Ŷ is independent of the protected attribute A.
Formal Definition:
$$P(\hat{Y} = 1 | A = 0) = P(\hat{Y} = 1 | A = 1)$$
In words: the probability of receiving a positive prediction should be the same across all groups.
Example: In a hiring algorithm, demographic parity would require that men and women are selected at equal rates.
Advantages:

- Simple to state and measure; it requires only predictions and group membership, not ground-truth labels
- Maps directly onto the legal notion of disparate impact (the 80% rule checked in the code below)

Disadvantages:

- Ignores differences in base rates between groups
- Can be satisfied by selecting the 'right' number of people from each group regardless of qualification
- Says nothing about error rates or individual merit
```python
import numpy as np


def demographic_parity_difference(y_pred, protected_attr):
    """
    Calculate demographic parity difference.

    Returns 0 if perfect parity, positive if group A=1 is favored.

    Args:
        y_pred: Binary predictions (0 or 1)
        protected_attr: Binary protected attribute (0 or 1)

    Returns:
        float: Difference in positive prediction rates between groups
    """
    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)

    # Positive prediction rate for group A=0
    group_0_rate = y_pred[protected_attr == 0].mean()
    # Positive prediction rate for group A=1
    group_1_rate = y_pred[protected_attr == 1].mean()

    return group_1_rate - group_0_rate


def demographic_parity_ratio(y_pred, protected_attr):
    """
    Calculate demographic parity ratio (disparate impact ratio).

    Returns 1 if perfect parity.
    Values < 1 indicate group A=0 is favored.
    Values > 1 indicate group A=1 is favored.
    """
    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)

    group_0_rate = y_pred[protected_attr == 0].mean()
    group_1_rate = y_pred[protected_attr == 1].mean()

    # Avoid division by zero
    if group_0_rate == 0:
        return float('inf') if group_1_rate > 0 else 1.0

    return group_1_rate / group_0_rate


# Example usage
if __name__ == "__main__":
    # Simulated hiring decisions
    predictions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    protected = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # Group membership

    dp_diff = demographic_parity_difference(predictions, protected)
    dp_ratio = demographic_parity_ratio(predictions, protected)

    print(f"Demographic Parity Difference: {dp_diff:.3f}")
    print(f"Demographic Parity Ratio: {dp_ratio:.3f}")

    # The 80% rule: ratio should be >= 0.8
    print(f"Passes 80% rule: {dp_ratio >= 0.8 and dp_ratio <= 1.25}")
```

While group fairness focuses on statistical parity across demographic groups, individual fairness addresses a more intuitive principle: similar individuals should be treated similarly, regardless of group membership.
Formal Definition (Dwork et al., 2012):
$$d_Y(M(x_1), M(x_2)) \leq L \cdot d_X(x_1, x_2)$$
Where:

- M is the model mapping individuals to outcomes (e.g., scores or probabilities)
- d_X is a distance metric over individuals that captures task-relevant similarity
- d_Y is a distance metric over outcomes (e.g., the difference between predicted probabilities)
- L is a Lipschitz constant bounding how quickly outcomes may change as inputs change
In words: if two individuals are 'close' in relevant characteristics, their predicted outcomes should also be 'close.'
The devil is in the details. Who defines the distance metric d_X? What constitutes 'similar' individuals? This metric must capture 'task-relevant similarity' while excluding protected attributes—a deeply non-trivial design choice that often requires domain expertise and stakeholder input.
Example: Loan Approval
Consider two loan applicants, A and B, with essentially the same income, credit score, debt-to-income ratio, and repayment history, who differ only in gender.
These applicants are nearly identical on legitimate credit factors. Individual fairness demands they receive similar loan decisions—regardless of gender. If the model approves A but rejects B, it violates individual fairness even if group-level statistics look balanced.
Relationship to Group Fairness:
Individual fairness and group fairness are distinct concepts; neither implies the other:
| Aspect | Group Fairness | Individual Fairness |
|---|---|---|
| Unit of Analysis | Demographic groups | Individual pairs |
| Core Principle | Equal statistics across groups | Similar inputs → similar outputs |
| Measurement | Compare group-level rates | Requires distance metric over individuals |
| Advantage | Easy to measure from data | Intuitive, treats people as individuals |
| Challenge | May violate individual merit | Defining 'similarity' is hard |
| Legal Analog | Disparate impact doctrine | Individual discrimination claims |
```python
import numpy as np
from itertools import combinations
from sklearn.neighbors import NearestNeighbors


def individual_fairness_violation(X, predictions, distance_metric,
                                  outcome_distance, lipschitz_bound=1.0):
    """
    Measure individual fairness violations using the Lipschitz constraint.

    For each pair of individuals, checks:
        d_Y(M(x1), M(x2)) <= L * d_X(x1, x2)

    Args:
        X: Feature matrix (n_samples, n_features)
        predictions: Model predictions or probabilities
        distance_metric: Function computing d_X(x1, x2)
        outcome_distance: Function computing d_Y(pred1, pred2)
        lipschitz_bound: Maximum allowed ratio L

    Returns:
        dict: Violation statistics
    """
    n = len(predictions)
    violations = []
    total_pairs = 0

    # Check all pairs (expensive for large n)
    for i, j in combinations(range(n), 2):
        d_x = distance_metric(X[i], X[j])
        d_y = outcome_distance(predictions[i], predictions[j])
        max_allowed = lipschitz_bound * d_x

        if d_y > max_allowed and d_x > 0:
            violations.append({
                'pair': (i, j),
                'd_x': d_x,
                'd_y': d_y,
                'violation_amount': d_y - max_allowed
            })
        total_pairs += 1

    return {
        'total_pairs': total_pairs,
        'num_violations': len(violations),
        'violation_rate': len(violations) / total_pairs if total_pairs > 0 else 0,
        'violations': violations[:10]  # Return first 10 for inspection
    }


def compute_individual_fairness_metric(X, probabilities, sensitive_cols=None):
    """
    Compute consistency metric for individual fairness.

    Based on k-NN: similar individuals should have similar predictions.

    Args:
        X: Feature matrix
        probabilities: Predicted probabilities
        sensitive_cols: Indices of sensitive columns to exclude

    Returns:
        float: Consistency score (higher = more individually fair)
    """
    # Remove sensitive attributes for distance computation
    if sensitive_cols is not None:
        X_fair = np.delete(X, sensitive_cols, axis=1)
    else:
        X_fair = X

    # Find k nearest neighbors
    k = 5
    nn = NearestNeighbors(n_neighbors=k + 1)  # +1 includes self
    nn.fit(X_fair)
    _, indices = nn.kneighbors(X_fair)
    neighbor_indices = indices[:, 1:]  # Exclude self

    # Measure prediction consistency with neighbors
    consistency_scores = []
    for i in range(len(probabilities)):
        pred_i = probabilities[i]
        neighbor_preds = probabilities[neighbor_indices[i]]
        # Average absolute difference with neighbors
        diff = np.abs(neighbor_preds - pred_i).mean()
        consistency = 1 - diff  # Convert to similarity
        consistency_scores.append(consistency)

    return np.mean(consistency_scores)


# Example usage
if __name__ == "__main__":
    np.random.seed(42)

    # Create feature matrix (excluding protected attribute for distance)
    n = 100
    X = np.random.randn(n, 3)  # 3 features

    # Simulated predictions (some inconsistency)
    probabilities = 1 / (1 + np.exp(-X[:, 0] - 0.5 * X[:, 1]))
    # Add noise to create some individual fairness violations
    probabilities += np.random.randn(n) * 0.1
    probabilities = np.clip(probabilities, 0, 1)

    consistency = compute_individual_fairness_metric(X, probabilities)
    print(f"Individual Fairness Consistency: {consistency:.3f}")
```

One of the most important theoretical results in ML fairness is the impossibility theorem (Chouldechova, 2017; Kleinberg et al., 2016). It proves that certain fairness criteria cannot be satisfied simultaneously when base rates differ between groups.
The Setup:
Consider a binary classifier and two groups with different base rates: Group A, where 10% of individuals have the positive outcome, and Group B, where 30% do.
The impossibility theorem states that you cannot simultaneously achieve:

- Calibration within groups (a score of r corresponds to the same probability of the outcome in every group)
- Equal false positive rates across groups
- Equal false negative rates across groups

unless either:

- the two groups have identical base rates, or
- the classifier is perfect and never makes an error.
This isn't a limitation of current technology—it's a mathematical fact. No algorithm, no matter how sophisticated, can satisfy all three conditions when base rates differ. This means every fairness intervention involves choosing which fairness criteria to prioritize.
Intuition Behind the Impossibility:
Imagine you have a calibrated risk score. For Group A (10% base rate), a score of 30% is already 3x their average—these are relatively high-risk individuals within their group. For Group B (30% base rate), a score of 30% is exactly average.
Now, if you set a threshold and predict positive for everyone above 30%, you flag only the unusually risky tail of Group A but a much larger share of Group B. These groups will therefore have different error rate profiles, because the same threshold means different things relative to each group's distribution.
To equalize error rates, you'd need different thresholds—but then the scores wouldn't be calibrated anymore (a 30% score would mean different things for different groups).
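The small simulation below (the score distributions and sample sizes are assumed purely for illustration) makes this concrete: the scores are calibrated by construction, the two groups have roughly the base rates from the example above, and yet the shared 30% threshold yields different false positive and false negative rates:

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_group(n, base_rate):
    """Draw calibrated risk scores: each score equals that individual's true
    probability of the positive outcome, so calibration holds by construction."""
    scores = np.clip(rng.normal(base_rate, 0.15, n), 0.01, 0.99)
    outcomes = rng.binomial(1, scores)  # Y=1 with probability equal to the score
    return scores, outcomes


def error_rates(scores, outcomes, threshold):
    """False positive and false negative rates at a single shared threshold."""
    preds = (scores >= threshold).astype(int)
    fpr = preds[outcomes == 0].mean()        # P(Ŷ=1 | Y=0)
    fnr = (1 - preds[outcomes == 1]).mean()  # P(Ŷ=0 | Y=1)
    return fpr, fnr


threshold = 0.30
for name, base_rate in [("Group A", 0.10), ("Group B", 0.30)]:
    scores, outcomes = simulate_group(100_000, base_rate)
    fpr, fnr = error_rates(scores, outcomes, threshold)
    print(f"{name}: base rate={outcomes.mean():.2f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```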
Practical Implications: for every application you must decide which of these criteria to prioritize, make that choice explicit, and document the trade-off. The taxonomy below helps structure that decision.
Beyond the fundamental trio of demographic parity, equalized odds, and calibration, researchers have proposed dozens of fairness definitions. Understanding this taxonomy helps you select appropriate criteria for specific applications.
Categorization by Statistical Relationship:
| Category | Definition Name | Mathematical Condition | Intuition |
|---|---|---|---|
| Independence<br/>(Ŷ ⊥ A) | Demographic Parity | P(Ŷ\|A=0) = P(Ŷ\|A=1) | Equal selection rates |
| | Conditional Statistical Parity | P(Ŷ\|A=0,L) = P(Ŷ\|A=1,L) | Equal rates conditioning on legitimate factors |
| Separation<br/>(Ŷ ⊥ A \| Y) | Equalized Odds | P(Ŷ\|Y,A=0) = P(Ŷ\|Y,A=1) | Equal TPR and FPR |
| | Equal Opportunity | P(Ŷ=1\|Y=1,A=0) = P(Ŷ=1\|Y=1,A=1) | Equal TPR only |
| | Predictive Equality | P(Ŷ=1\|Y=0,A=0) = P(Ŷ=1\|Y=0,A=1) | Equal FPR only |
| Sufficiency<br/>(Y ⊥ A \| Ŷ) | Calibration | P(Y\|Ŷ=ŷ,A=0) = P(Y\|Ŷ=ŷ,A=1) | Scores mean the same thing |
| | Predictive Parity | P(Y=1\|Ŷ=1,A=0) = P(Y=1\|Ŷ=1,A=1) | Equal PPV |
| | Balance for Positive Class | E[R\|Y=1,A=0] = E[R\|Y=1,A=1] | Equal average score among positives |
| Counterfactual | Counterfactual Fairness | P(Ŷ_a\|A=a) = P(Ŷ_{a'}\|A=a) | Same prediction if A were different |
Interpretation Guide:

Independence-based (Anti-classification): the prediction should carry no information about the protected attribute; groups are selected at equal rates regardless of their true outcomes.

Separation-based (Error balance): conditioned on the true outcome, errors should fall equally on all groups; no group bears a disproportionate share of false positives or false negatives.

Sufficiency-based (Calibration): conditioned on the score or prediction, the true outcome should not depend on group membership; a given score means the same risk for everyone.

Counterfactual: an individual's prediction should not change in the counterfactual world where only their protected attribute (and its causal consequences) were different; see the sketch after this list.
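As a toy illustration of that counterfactual check, the sketch below uses an entirely made-up structural causal model (the coefficients, the income feature, and the predictor are assumptions, not part of any real system): hold each individual's exogenous background factors fixed, flip the protected attribute, regenerate the feature it influences, and compare the two predictions.

```python
import numpy as np


def generate_features(a, noise):
    """Toy causal model: a downstream feature (income) depends on both
    the protected attribute A and individual-specific exogenous noise U."""
    income = 50 + 10 * a + 5 * noise
    return np.column_stack([np.full_like(noise, a, dtype=float), income])


def model_predict(features):
    """Toy score that (unfairly) relies on the A-influenced income feature."""
    income = features[:, 1]
    return 1 / (1 + np.exp(-(income - 55) / 5))


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)  # exogenous factors U, held fixed across worlds

pred_a0 = model_predict(generate_features(0, noise))  # factual world, A=0
pred_a1 = model_predict(generate_features(1, noise))  # counterfactual world, A=1

# Counterfactual fairness would require these to be (nearly) identical per individual.
print("Mean |prediction change| when flipping A:", np.abs(pred_a1 - pred_a0).mean())
```

A counterfactually fair model would leave each prediction essentially unchanged when A is flipped; here the A-influenced income feature causes the predictions to shift.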
Start with the question: 'What harm are we trying to prevent?' If groups shouldn't have different selection rates → demographic parity. If false accusations disproportionately harm one group → equalized odds. If risk scores inform individual decisions → calibration. The application context determines the appropriate fairness notion.
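As a companion to the taxonomy above, here is a minimal sketch (the helper name and toy data are assumptions, not a fairness-library API) that computes the per-group quantities behind the separation- and sufficiency-based criteria: the TPR gap (equal opportunity), the FPR gap (predictive equality; together with the TPR gap, equalized odds), and the PPV gap (predictive parity):

```python
import numpy as np


def group_rates(y_true, y_pred, protected_attr):
    """Return TPR, FPR, and PPV for each value of a binary protected attribute."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    rates = {}
    for g in (0, 1):
        mask = protected_attr == g
        yt, yp = y_true[mask], y_pred[mask]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan  # equal opportunity
        fpr = yp[yt == 0].mean() if (yt == 0).any() else np.nan  # predictive equality
        ppv = yt[yp == 1].mean() if (yp == 1).any() else np.nan  # predictive parity
        rates[g] = {"TPR": tpr, "FPR": fpr, "PPV": ppv}
    return rates


# Example: equalized odds requires both the TPR gap and the FPR gap to be ~0.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

r = group_rates(y_true, y_pred, group)
print("TPR gap:", abs(r[0]["TPR"] - r[1]["TPR"]))
print("FPR gap:", abs(r[0]["FPR"] - r[1]["FPR"]))
print("PPV gap:", abs(r[0]["PPV"] - r[1]["PPV"]))
```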
Most fairness research focuses on binary classification, but many ML applications involve ranking (search results, recommendations) or regression (salary prediction, credit limits). These require extended fairness definitions.
Fairness in Ranking:
For ranked lists (e.g., job candidates, search results), fairness involves exposure and attention allocation:
```python
import numpy as np


def normalized_discounted_kl_divergence(ranking, protected_attr, k=10):
    """
    Measure fairness in top-k ranking using normalized discounted KL divergence.

    Accounts for position bias: higher positions matter more.

    Args:
        ranking: Indices of items in ranked order
        protected_attr: Protected attribute for each item (0 or 1)
        k: Number of top positions to evaluate

    Returns:
        float: NDKL score (0 = perfect fairness)
    """
    protected_attr = np.array(protected_attr)

    # Target proportion in population
    target_prop = protected_attr.mean()

    # Position discounts (logarithmic)
    discounts = 1 / np.log2(np.arange(2, k + 2))
    normalizer = discounts.sum()

    # Calculate exposure for protected group
    top_k_indices = ranking[:k]
    top_k_protected = protected_attr[top_k_indices]

    # Weighted proportion at each position
    weighted_exposure = (top_k_protected * discounts).sum() / normalizer

    # KL divergence between top-k exposure and the target proportion
    # (epsilon avoids log(0) and division by zero)
    epsilon = 1e-10
    kl = weighted_exposure * np.log((weighted_exposure + epsilon) / (target_prop + epsilon))
    kl += (1 - weighted_exposure) * np.log((1 - weighted_exposure + epsilon) / (1 - target_prop + epsilon))

    return kl


def group_exposure_ratio(ranking, protected_attr, position_weights=None):
    """
    Calculate ratio of exposure between groups.

    Args:
        ranking: Indices in ranked order
        protected_attr: Protected attribute for each item
        position_weights: Weight for each position (default: 1/position)

    Returns:
        float: Ratio of group 1 exposure to group 0 exposure
    """
    n = len(ranking)
    if position_weights is None:
        position_weights = 1. / np.arange(1, n + 1)

    protected_attr = np.array(protected_attr)

    group_0_exposure = 0
    group_1_exposure = 0

    for pos, item_idx in enumerate(ranking):
        weight = position_weights[pos]
        if protected_attr[item_idx] == 0:
            group_0_exposure += weight
        else:
            group_1_exposure += weight

    # Normalize by group sizes
    n_group_0 = (protected_attr == 0).sum()
    n_group_1 = (protected_attr == 1).sum()
    if n_group_0 > 0:
        group_0_exposure /= n_group_0
    if n_group_1 > 0:
        group_1_exposure /= n_group_1

    if group_0_exposure == 0:
        return float('inf')
    return group_1_exposure / group_0_exposure


# Example usage
if __name__ == "__main__":
    np.random.seed(42)

    n = 50
    # Protected attributes (40% group 1)
    protected = np.random.binomial(1, 0.4, n)

    # Biased ranking: group 0 tends to rank higher
    scores = np.random.randn(n) - protected * 0.5
    ranking = np.argsort(-scores)  # Descending

    ndkl = normalized_discounted_kl_divergence(ranking, protected, k=10)
    exposure_ratio = group_exposure_ratio(ranking, protected)

    print(f"Normalized Discounted KL Divergence: {ndkl:.4f}")
    print(f"Exposure Ratio (Group 1 / Group 0): {exposure_ratio:.3f}")
    print("Fair exposure ratio should be: ~1.0")
```

Fairness in Regression:
For continuous outcomes (salary, loan amount, estimated home value), fairness definitions adapt: common criteria include equal distributions or averages of predictions across groups (statistical parity for regression), comparable prediction error for every group (sometimes enforced as a bound on each group's loss), and calibration of predicted values within each group. A minimal sketch of the first two checks appears below.
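Here is that sketch (the function name and the toy salary figures are hypothetical), computing the gap in average predictions and the gap in group-wise error:

```python
import numpy as np


def regression_fairness_gaps(y_true, y_pred, protected_attr):
    """Gap in mean prediction and in mean absolute error between two groups."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    protected_attr = np.asarray(protected_attr)
    means, maes = {}, {}
    for g in (0, 1):
        mask = protected_attr == g
        means[g] = y_pred[mask].mean()                        # average prediction
        maes[g] = np.abs(y_true[mask] - y_pred[mask]).mean()  # group-wise error
    return {
        "mean_prediction_gap": means[1] - means[0],
        "mae_gap": maes[1] - maes[0],
    }


# Example: predicted salaries (in $1000s) for two groups
y_true = [52, 61, 48, 70, 55, 50, 63, 47, 68, 54]
y_pred = [50, 60, 50, 68, 56, 45, 58, 44, 62, 50]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(regression_fairness_gaps(y_true, y_pred, group))
```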
We've traversed the rich landscape of fairness definitions in machine learning. Let's consolidate the key insights:

- Fairness definitions rest on different philosophical foundations (egalitarian, libertarian, utilitarian, Rawlsian), and choosing a metric implicitly chooses a framework.
- Group fairness compares statistics across demographic groups; individual fairness requires that similar individuals receive similar outcomes; neither implies the other.
- The impossibility theorem shows that calibration and equal error rates cannot all hold when base rates differ, so every intervention prioritizes some criteria over others.
- Fairness criteria can be organized by the statistical relationship they impose: independence, separation, sufficiency, or counterfactual invariance.
- Ranking and regression settings require extended definitions, such as exposure-based metrics and group-wise error constraints.
- The application context and the harms at stake, not mathematical convenience, should determine which definition you adopt.
What's Next:
Now that we understand the formal definitions of fairness, the next page examines protected attributes—the specific characteristics (race, gender, age, disability) around which fairness concerns arise. We'll explore legal frameworks, proxy discrimination, and the complex questions of which attributes deserve protection and how to handle them in ML systems.
You now have a comprehensive understanding of fairness definitions in ML. These formal concepts provide the vocabulary and mathematical tools necessary to analyze, measure, and improve fairness in machine learning systems. The journey continues with protected attributes and the legal dimensions of algorithmic fairness.