Throughout this module, we've encountered a recurring theme: improving fairness often comes at the cost of predictive accuracy, and vice versa. This is not a failure of technique or a problem to be solved—it's a fundamental property of fair machine learning that emerges from deep mathematical and philosophical considerations.
Understanding this tradeoff is essential for responsible ML practice. It transforms fairness from a checklist item ('did we add the fairness constraint?') into a thoughtful design choice ('what level of accuracy are we willing to sacrifice for what level of fairness, and who decides?').
By the end of this page, you will be able to: (1) Explain why fairness-accuracy tradeoffs are mathematically unavoidable, (2) State and interpret key impossibility theorems in ML fairness, (3) Construct and analyze Pareto frontiers for fairness-accuracy tradeoffs, (4) Apply practical strategies for navigating tradeoffs in real applications, (5) Design organizational processes for fairness decisions that acknowledge tradeoffs.
Why Tradeoffs Are Unavoidable:
At its core, a machine learning model is an optimization engine. When we train on historical data to maximize accuracy, we find the model that best captures patterns in that data—including any discriminatory patterns. When we add fairness constraints, we're asking the optimizer to deviate from the accuracy-maximizing solution.
Mathematically, if we denote the unconstrained optimal classifier as $h^*$ and a fairness-constrained classifier as $h_F$:
$$L(h_F) \geq L(h^*)$$
with equality only when $h^*$ already satisfies the fairness constraint. This is simply the nature of constrained optimization—adding constraints cannot improve the objective.
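To make the inequality concrete, here is a minimal sketch (not from the original text) that searches over single-threshold classifiers with and without an illustrative demographic-parity constraint; because the constrained search only considers a subset of classifiers, its best loss can never be lower than the unconstrained one.

```python
import numpy as np

def best_loss(scores, labels, protected, thresholds, max_dp_gap=None):
    """Lowest 0-1 loss over single-threshold classifiers, optionally restricted
    to those whose demographic-parity gap stays below max_dp_gap."""
    best = float("inf")  # stays inf if no threshold satisfies the constraint
    for t in thresholds:
        pred = (scores >= t).astype(int)
        gap = abs(pred[protected == 0].mean() - pred[protected == 1].mean())
        if max_dp_gap is not None and gap > max_dp_gap:
            continue  # not a feasible h_F under the fairness constraint
        best = min(best, float(np.mean(pred != labels)))
    return best

# The constrained optimum is taken over a subset of the unconstrained search
# space, so best_loss(..., max_dp_gap=0.05) >= best_loss(...) always holds.
```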
Several landmark results demonstrate that certain combinations of fairness criteria are mathematically impossible to satisfy simultaneously. These aren't limitations of current algorithms—they're fundamental constraints on what any algorithm can achieve.
Impossibility Theorem 1: Chouldechova (2017)
For a binary classifier applied to groups with different base rates ($P(Y=1|A=0) \neq P(Y=1|A=1)$), the following three conditions cannot all hold simultaneously, except in degenerate cases (perfect prediction or equal base rates): (1) calibration within groups, (2) equal false positive rates, and (3) equal false negative rates.
The COMPAS recidivism algorithm was criticized for having unequal false positive rates across races. ProPublica argued this was unfair. Northpointe (the vendor) responded that the algorithm was calibrated—equal scores meant equal risk. Both were correct. Chouldechova's theorem shows they were arguing about which fairness criterion should take priority, not about whether the algorithm was implemented correctly.
Impossibility Theorem 2: Kleinberg, Mullainathan & Raghavan (2016)
Building on similar intuitions, this work proves that, except when base rates are equal or prediction is perfect, the following cannot all be satisfied: (1) calibration within groups, (2) balance for the positive class (equal average scores among positive instances in each group), and (3) balance for the negative class. At least one must be violated.
Intuition Behind the Impossibility:
Consider two groups where 50% of Group A and 20% of Group B will re-offend. A calibrated algorithm that predicts 'high risk' for 50% of Group A and 20% of Group B assigns equally trustworthy scores to both groups. Yet unless its predictions are perfect, its errors fall differently: the higher-base-rate group (A) tends to see a higher false positive rate among its non-re-offenders, while the lower-base-rate group (B) tends to see a higher false negative rate among its re-offenders.
Trying to equalize these error rates instead requires mis-calibration: predicting higher risk than justified for one group, or lower than justified for the other.
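The small simulation below (illustrative, not part of the original example) makes this concrete: outcomes are drawn directly from the risk scores, so the scores are calibrated by construction in both groups, yet thresholding at 0.5 produces different false positive and false negative rates because the base rates differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Group A has a higher base rate than Group B.
group = rng.binomial(1, 0.5, n)          # 0 = Group A, 1 = Group B
score = np.where(group == 0,
                 rng.beta(5, 5, n),      # Group A: mean risk 0.50
                 rng.beta(2, 8, n))      # Group B: mean risk 0.20
# Calibrated by construction: P(Y=1 | score) = score in both groups.
outcome = rng.binomial(1, score)

pred = (score >= 0.5).astype(int)        # 'high risk' prediction

for g, name in [(0, "Group A"), (1, "Group B")]:
    y, yhat = outcome[group == g], pred[group == g]
    fpr = np.mean(yhat[y == 0])          # false positive rate
    fnr = 1 - np.mean(yhat[y == 1])      # false negative rate
    print(f"{name}: base rate={y.mean():.2f}, FPR={fpr:.2f}, FNR={fnr:.2f}")
# The scores are equally well calibrated in both groups, yet FPR and FNR differ.
```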
| Criteria Combination | Compatible? | When Compatible? |
|---|---|---|
| Calibration + Equal FPR + Equal FNR | ❌ No | Only with equal base rates or perfect prediction |
| Demographic Parity + Calibration | ❌ No | Only with equal base rates |
| Equalized Odds + Calibration | ❌ No | Only with equal base rates or trivial classifier |
| Equal TPR + Equal FPR | ✅ Yes | Achievable (equalized odds) |
| Demographic Parity + Equal Accuracy | ✅ Yes | Often achievable with appropriate thresholds |
Implications of Impossibility:
No Universal Fairness: There is no single 'fair' algorithm. Fairness requires choosing which criteria matter most in context.
Normative Decisions Required: Selecting fairness criteria is an ethical and policy choice, not a technical one. Different stakeholders may legitimately disagree.
Perfect Fairness is Impossible: When base rates differ, some unfairness (by some measure) is mathematically inevitable. The goal is to minimize harm, not achieve perfection.
Context Matters: The 'right' fairness criterion depends on the application. Equal opportunity may matter most in hiring; calibration may matter most in medicine.
The Pareto frontier (or Pareto boundary) is a powerful tool for visualizing and analyzing fairness-accuracy tradeoffs. It represents the set of solutions where you cannot improve one objective without worsening another.
Formal Definition:
A solution $(\text{accuracy}, \text{fairness})$ is Pareto optimal if no other achievable solution is at least as good on both objectives and strictly better on at least one.
The Pareto frontier is the set of all Pareto optimal solutions.
Constructing the Frontier:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from typing import List, Tuple, Dict


def compute_dp_gap(predictions: np.ndarray, protected: np.ndarray) -> float:
    """Demographic parity gap: |P(Ŷ=1|A=0) - P(Ŷ=1|A=1)|"""
    rate_0 = np.mean(predictions[protected == 0])
    rate_1 = np.mean(predictions[protected == 1])
    return abs(rate_0 - rate_1)


def compute_eo_gap(predictions: np.ndarray, protected: np.ndarray,
                   labels: np.ndarray) -> float:
    """Equalized odds gap: |TPR_0 - TPR_1| + |FPR_0 - FPR_1|"""
    tpr_0 = np.mean(predictions[(protected == 0) & (labels == 1)])
    tpr_1 = np.mean(predictions[(protected == 1) & (labels == 1)])
    fpr_0 = np.mean(predictions[(protected == 0) & (labels == 0)])
    fpr_1 = np.mean(predictions[(protected == 1) & (labels == 0)])
    return abs(tpr_0 - tpr_1) + abs(fpr_0 - fpr_1)


class ThresholdSearcher:
    """Search over group-specific thresholds to map the Pareto frontier."""

    def __init__(self, n_thresholds: int = 20):
        self.n_thresholds = n_thresholds

    def compute_pareto_frontier(self, scores: np.ndarray, labels: np.ndarray,
                                protected: np.ndarray,
                                fairness_metric: str = 'dp') -> List[Dict]:
        """
        Compute achievable (accuracy, fairness) points by varying thresholds.

        Returns a list of dicts with accuracy, fairness_gap, and thresholds.
        """
        thresholds = np.linspace(0.05, 0.95, self.n_thresholds)
        results = []

        # Try all combinations of group-specific thresholds
        for t0 in thresholds:
            for t1 in thresholds:
                # Apply group-specific thresholds
                predictions = np.zeros(len(scores))
                predictions[(protected == 0) & (scores >= t0)] = 1
                predictions[(protected == 1) & (scores >= t1)] = 1

                # Compute metrics
                acc = accuracy_score(labels, predictions)
                if fairness_metric == 'dp':
                    gap = compute_dp_gap(predictions, protected)
                else:
                    gap = compute_eo_gap(predictions, protected, labels)

                results.append({
                    'accuracy': acc,
                    'fairness_gap': gap,
                    'threshold_0': t0,
                    'threshold_1': t1
                })

        return results

    def extract_pareto_optimal(self, results: List[Dict]) -> List[Dict]:
        """Extract Pareto optimal points (maximize accuracy, minimize gap)."""
        pareto = []
        for point in results:
            dominated = False
            for other in results:
                # Check if 'other' dominates 'point'
                if (other['accuracy'] > point['accuracy'] and
                        other['fairness_gap'] <= point['fairness_gap']):
                    dominated = True
                    break
                if (other['accuracy'] >= point['accuracy'] and
                        other['fairness_gap'] < point['fairness_gap']):
                    dominated = True
                    break
            if not dominated:
                pareto.append(point)

        # Sort by accuracy
        pareto.sort(key=lambda x: x['accuracy'])
        return pareto


def plot_pareto_frontier(results: List[Dict], pareto_points: List[Dict],
                         title: str = "Fairness-Accuracy Pareto Frontier"):
    """Visualize the Pareto frontier."""
    # All points
    all_acc = [r['accuracy'] for r in results]
    all_gap = [r['fairness_gap'] for r in results]

    # Pareto points
    pareto_acc = [p['accuracy'] for p in pareto_points]
    pareto_gap = [p['fairness_gap'] for p in pareto_points]

    plt.figure(figsize=(10, 6))
    plt.scatter(all_gap, all_acc, alpha=0.3, label='All achievable points')
    plt.plot(pareto_gap, pareto_acc, 'r-o', markersize=8, linewidth=2,
             label='Pareto frontier')
    plt.xlabel('Fairness Gap (lower is fairer)', fontsize=12)
    plt.ylabel('Accuracy (higher is better)', fontsize=12)
    plt.title(title, fontsize=14)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    return plt


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    n = 5000

    # Generate biased data
    protected = np.random.binomial(1, 0.4, n)
    X = np.random.randn(n, 3)
    X[:, 0] += 0.5 * protected  # Feature correlates with protected attribute

    # Biased labels
    logits = X[:, 0] + X[:, 1] + 0.3 * protected
    labels = (logits + np.random.randn(n) * 0.3 > 0.5).astype(int)

    # Train model
    model = LogisticRegression()
    model.fit(X, labels)
    scores = model.predict_proba(X)[:, 1]

    # Compute Pareto frontier
    searcher = ThresholdSearcher(n_thresholds=30)
    results = searcher.compute_pareto_frontier(scores, labels, protected, 'dp')
    pareto = searcher.extract_pareto_optimal(results)

    print(f"Found {len(pareto)} Pareto optimal points:")
    for i, p in enumerate(pareto[::len(pareto) // 5 + 1]):  # Sample some
        print(f"  {i}: Acc={p['accuracy']:.3f}, Gap={p['fairness_gap']:.3f}")

    # Find some key points
    most_accurate = max(pareto, key=lambda x: x['accuracy'])
    most_fair = min(pareto, key=lambda x: x['fairness_gap'])

    print(f"\nMost accurate: Acc={most_accurate['accuracy']:.3f}, "
          f"Gap={most_accurate['fairness_gap']:.3f}")
    print(f"Most fair: Acc={most_fair['accuracy']:.3f}, "
          f"Gap={most_fair['fairness_gap']:.3f}")
```

The shape of the Pareto frontier tells you about the tradeoff: (1) A steep frontier means small fairness improvements require large accuracy sacrifices, (2) A flat frontier means fairness can be improved 'cheaply', (3) Points far from the frontier are inefficient—you can do better on both dimensions.
Key Properties of the Pareto Frontier:
Monotonicity: The frontier is generally monotonic—more fairness costs accuracy (or at least doesn't improve it).
Context Dependence: Different datasets and models produce different frontiers. The 'cost' of fairness varies.
Inside the Frontier: Points strictly inside the frontier (dominated points) represent suboptimal choices; you could do better on both dimensions.
Multi-Objective View: With multiple fairness criteria, the frontier becomes a surface in higher dimensions.
Using the Frontier for Decision-Making:
The Pareto frontier makes tradeoffs explicit. Rather than asking 'is this model fair?' (a binary question), ask: where on the frontier should we operate, what does each candidate point mean for each affected group, and who has the authority to make that choice?
A natural question arises: How much does fairness actually cost? This can be measured as the 'price of fairness' (PoF)—the accuracy loss incurred by imposing fairness constraints.
Formal Definition:
Let $h^*$ be the unconstrained optimal classifier and $h^*_F$ be the optimal classifier satisfying fairness constraint $F$:
$$\text{Price of Fairness} = L(h^*_F) - L(h^*)$$
or as a ratio: $$\text{PoF Ratio} = \frac{L(h^*_F)}{L(h^*)}$$
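As a small illustration with made-up numbers, the price of fairness can be computed directly from the losses of the two classifiers; here loss is taken to be $1 - \text{accuracy}$:

```python
def price_of_fairness(loss_unconstrained: float, loss_constrained: float) -> dict:
    """Absolute and relative cost of imposing a fairness constraint."""
    return {
        "absolute": loss_constrained - loss_unconstrained,  # L(h*_F) - L(h*)
        "ratio": loss_constrained / loss_unconstrained,     # L(h*_F) / L(h*)
    }

# Illustrative values only, e.g. losses of the most-accurate and most-fair
# points found by the threshold search above:
print(price_of_fairness(loss_unconstrained=0.12, loss_constrained=0.15))
# absolute ≈ 0.03, ratio ≈ 1.25
```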
Empirical Findings on the Price of Fairness:
Research has found that the price of fairness varies significantly:
Often Moderate: Many studies find accuracy drops of 1-5% for significant fairness improvements. Fairness isn't always expensive.
Depends on Base Rate Gap: When group base rates are similar, fairness is cheap. When they differ dramatically, it's expensive.
Depends on Feature Correlation: When features are highly correlated with protected attributes, removing discrimination is costlier.
Model Complexity Matters: More flexible models (e.g., neural networks) can sometimes achieve both high accuracy and fairness, while simpler models face starker tradeoffs.
Marginal Cost Increases: The first fairness improvements are often cheap; approaching perfect fairness becomes increasingly expensive.
| Study/Dataset | Fairness Criterion | Accuracy Drop | Context |
|---|---|---|---|
| Adult Income (Census) | Demographic Parity | 2-4% | Income prediction with gender as protected |
| COMPAS Recidivism | Equalized Odds | 3-6% | Recidivism prediction with race as protected |
| Credit Default | Equal Opportunity | 1-3% | Credit risk with age/gender as protected |
| Hiring Simulation | Demographic Parity | 5-10% | Synthetic hiring with strong historical bias |
| Medical Diagnosis | Calibration by Group | <1% | When calibration was already near-fair |
The price of fairness is not fixed—it depends on the data, the model, the fairness criterion, and how 'tight' the constraint is. Always compute the Pareto frontier for your specific application rather than relying on general estimates.
When is the Price Low?
Near-fair data: When historical data is already approximately fair, constraints cost little.
Redundant features: If protected attributes are encoded in multiple features, removing one pathway may not hurt predictions much.
Suboptimal baseline: If the unconstrained model isn't fully optimized, imposing fairness + better optimization may improve both.
Constraint slack: When the fairness constraint isn't binding (already satisfied), there's no cost.
When is the Price High?
Large base rate gaps: Forcing equal predictions when groups genuinely differ sacrifices accuracy.
Protected attribute is predictive: When the protected attribute directly predicts the outcome (e.g., age in medical contexts), hiding it loses information.
Limited features: With few features, each carries more predictive weight, making it costlier to ignore correlations.
Tight constraints: Demanding perfect fairness (ε=0) is more expensive than approximate fairness.
Given that tradeoffs are unavoidable, how should practitioners navigate them? Here are principled strategies.
Strategy 1: Stakeholder-Driven Constraint Selection
Different stakeholders have different fairness priorities: applicants may care most about false rejections, the deploying organization about overall error costs, regulators about disparate impact, and affected communities about cumulative harms.
Approach: Engage stakeholders early to determine which fairness criteria matter most. Let this drive metric selection rather than choosing post-hoc.
Strategy 2: Multi-Objective Optimization
Rather than treating fairness as a hard constraint, optimize a weighted combination:
$$L_{total} = \alpha \cdot L_{accuracy} + (1-\alpha) \cdot L_{fairness}$$
Varying $\alpha$ traces out the Pareto frontier. This approach turns the tradeoff into a single tunable dial, lets decision-makers compare concrete operating points rather than debate in the abstract, and avoids the infeasibility problems that hard constraints can create. A minimal sketch of the weighted objective follows.
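This sketch assumes a logistic model and uses the mean-score gap between groups as a differentiable stand-in for the demographic parity gap; the synthetic data, the $\alpha$ values, and the optimizer settings are illustrative, not prescriptive.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_loss(w, X, y, protected, alpha):
    """alpha * cross-entropy + (1 - alpha) * squared mean-score gap
    (a differentiable surrogate for the demographic parity gap)."""
    p = sigmoid(X @ w[:-1] + w[-1])
    eps = 1e-9
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    gap = np.mean(p[protected == 0]) - np.mean(p[protected == 1])
    return alpha * ce + (1 - alpha) * gap ** 2

def fit_weighted(X, y, protected, alpha):
    """Fit logistic-regression weights by minimizing the combined objective."""
    w0 = np.zeros(X.shape[1] + 1)
    return minimize(combined_loss, w0, args=(X, y, protected, alpha),
                    method="BFGS").x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    protected = rng.binomial(1, 0.4, n)
    X = rng.normal(size=(n, 3))
    X[:, 0] += 0.5 * protected
    y = ((X[:, 0] + X[:, 1] + 0.3 * protected
          + 0.3 * rng.normal(size=n)) > 0.5).astype(int)

    # Sweeping alpha from 1.0 (accuracy only) downward traces the frontier.
    for alpha in [1.0, 0.9, 0.7, 0.5]:
        w = fit_weighted(X, y, protected, alpha)
        pred = (sigmoid(X @ w[:-1] + w[-1]) >= 0.5).astype(int)
        acc = np.mean(pred == y)
        gap = abs(np.mean(pred[protected == 0]) - np.mean(pred[protected == 1]))
        print(f"alpha={alpha:.1f}  accuracy={acc:.3f}  DP gap={gap:.3f}")
```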
Strategy 3: Fairness as Constraint with Slack
Set fairness constraints with slack variables that are penalized but not hard:
$$\min L_{accuracy} + \lambda \cdot \max(0, \text{FairnessGap} - \epsilon)$$
This allows small violations when they dramatically improve accuracy, while still incentivizing fairness.
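A minimal sketch of this penalized objective, with illustrative values for $\lambda$ and $\epsilon$ (tuning choices, not prescribed by the text):

```python
def slack_penalized_objective(accuracy_loss: float, fairness_gap: float,
                              epsilon: float = 0.05, lam: float = 10.0) -> float:
    """Accuracy loss plus a hinge penalty on fairness-gap violations beyond epsilon."""
    return accuracy_loss + lam * max(0.0, fairness_gap - epsilon)

# Gaps at or below epsilon incur no penalty; larger gaps are penalized
# linearly, so small violations are tolerated when they buy much accuracy.
print(slack_penalized_objective(accuracy_loss=0.12, fairness_gap=0.03))  # 0.12
print(slack_penalized_objective(accuracy_loss=0.10, fairness_gap=0.15))  # ≈ 1.10
```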
Often, there's a region where fairness improvements are nearly 'free'—small accuracy costs for significant fairness gains. This is typically where the Pareto frontier is nearly flat. Exploiting this region provides the best value for fairness investments.
The framing of 'fairness vs. accuracy' may itself be problematic. Several perspectives suggest the tradeoff is more nuanced than it appears.
Perspective 1: Fairness IS Accuracy (for Subgroups)
Traditional accuracy averages over the entire population, potentially masking poor performance for minority groups. If we define accuracy as 'minimum subgroup accuracy,' then improving fairness (equalizing group performance) directly improves this alternative accuracy measure.
$$\text{Worst-Group Accuracy} = \min_a \text{Accuracy}(h; A=a)$$
Optimizing worst-group accuracy explicitly connects fairness and accuracy.
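A small helper (a sketch, with hypothetical argument names) that computes worst-group accuracy directly:

```python
import numpy as np

def worst_group_accuracy(y_true: np.ndarray, y_pred: np.ndarray,
                         groups: np.ndarray) -> float:
    """Minimum accuracy over the subpopulations defined by the protected attribute."""
    return min(float(np.mean(y_pred[groups == g] == y_true[groups == g]))
               for g in np.unique(groups))

# Maximizing this quantity (e.g., via group distributionally robust training)
# aligns the accuracy objective with equalized group performance.
```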
Perspective 2: Long-Term vs. Short-Term Accuracy
An unfair model might have higher short-term accuracy but erode user trust, narrow its future pool of applicants or customers, and create feedback loops that degrade the quality of the data it will later be retrained on.
Considering temporal dynamics may reveal that fair models have better long-term accuracy.
Perspective 3: The Right Metric Might Not Show a Tradeoff
Accuracy metrics are choices. If we measure the right thing, there may be no tradeoff: for example, a hiring model evaluated against actual on-the-job performance, rather than against historically biased hiring decisions, may show far less conflict between fairness and accuracy.
The tradeoff often reflects measuring proxies rather than true outcomes.
Perhaps the question isn't 'How much accuracy should we sacrifice for fairness?' but 'What are we actually trying to predict, and for whom?' Reframing the problem often reveals that the perceived tradeoff was an artifact of flawed problem formulation.
Perspective 4: Costs of Unfairness
The 'cost' of fairness is only half the equation. What's the cost of unfairness? Legal liability, regulatory penalties, reputational damage, lost trust, and concrete harm to the people who are misclassified all belong on the ledger.
A full accounting includes both the accuracy cost of fairness AND the business/ethical cost of unfairness. Often, the latter dwarfs the former.
Fairness tradeoffs cannot—and should not—be resolved by individual engineers. They require organizational processes that involve appropriate stakeholders and create accountability.
Key Principles:
Tradeoffs are Policy Decisions: Choosing the operating point on the Pareto frontier is a policy choice, not a technical one. It should involve leadership, legal, ethics, and affected communities.
Document and Justify: Every choice should be documented with explicit justification for why this point was chosen over alternatives.
Create Accountability: Someone (a role, not just a person) should be responsible for fairness outcomes and have authority to require changes.
Enable Review: Fairness decisions should be reviewable and reversible based on new information or changing values.
Example: Fairness Decision Framework
A structured process for choosing an operating point:
Define Stakeholders: Who is affected? Who has authority? Who has relevant expertise?
Map the Frontier: Compute achievable (accuracy, fairness) points for relevant fairness criteria.
Identify Constraints: Are there hard legal or policy constraints? What's the minimum acceptable accuracy?
Present Options: Show stakeholders 3-5 representative points on the frontier with concrete implications.
Deliberate: Allow discussion of values, priorities, and downstream impacts.
Decide and Document: Record the chosen point, rationale, dissenting views, and conditions for revisiting.
Monitor and Revisit: Track performance and revisit the decision periodically or when conditions change.
Organizational processes can become rubber stamps that create an illusion of ethical oversight without genuine accountability. Effective processes require: (1) Real authority to stop or change projects, (2) Diversity of perspectives including those from affected communities, (3) Transparency about tradeoffs made, (4) Consequences for violations.
The study of fairness-accuracy tradeoffs is an active research area with many open questions.
Open Technical Questions:
Tighter Characterization: Can we better characterize when tradeoffs are severe vs. mild? What data properties predict the 'price of fairness'?
Beyond Binary: Most theory considers binary protected attributes and binary outcomes. How do results extend to multi-class, multi-group, and continuous settings?
Causal Approaches: Can causal modeling help distinguish 'legitimate' from 'illegitimate' correlations, reducing the apparent tradeoff?
Dynamic Settings: How do tradeoffs evolve in online learning settings with feedback loops and distribution shift?
Open Normative Questions:
Who Decides? What's the right process for determining fairness criteria and acceptable tradeoffs? How do we include affected communities meaningfully?
Intersectionality: How should we handle fairness for intersectional identities (e.g., Black women)? Optimizing for each attribute separately may not help intersections.
Individual vs. Group: When are group fairness criteria appropriate vs. individual fairness? How do we reconcile them?
Across Applications: Should we have domain-specific fairness standards (e.g., stricter for criminal justice than advertising)?
Module Complete:
You have now completed Module 5: Bias Detection and Mitigation. You understand where bias originates (bias sources), how to intervene before training (pre-processing), during training (in-processing), and after training (post-processing), as well as the fundamental tradeoffs that govern fair machine learning.
This knowledge equips you to build ML systems that are not just accurate, but fair—systems that work for everyone, not just the majority.
Congratulations on completing this comprehensive module on bias detection and mitigation! You now have the theoretical foundations and practical tools to build fairer ML systems. Remember: fairness is not a one-time fix but an ongoing commitment that requires continuous attention, measurement, and improvement.