In 2016, ProPublica published a groundbreaking investigation revealing that COMPAS, a widely used algorithmic risk assessment tool in criminal justice, falsely labeled Black defendants as high-risk at nearly twice the rate of white defendants. This wasn't a bug in the traditional sense—the algorithm was optimized for accuracy. But accuracy alone doesn't capture what we intuitively understand as fairness.
This revelation sparked a profound reckoning in the machine learning community. How do we define fairness mathematically? Can algorithms be both accurate and fair? And when different notions of fairness conflict—as they inevitably do—how do we choose?
These aren't merely technical questions. They represent one of the most significant challenges facing modern AI: translating human values into mathematical constraints.
By the end of this page, you will understand the philosophical foundations of fairness, master the formal mathematical definitions used in ML systems, recognize the different categories of fairness criteria, and appreciate why no single definition can satisfy all fairness requirements simultaneously.
Machine learning models are increasingly deployed in high-stakes decision-making contexts that directly impact human lives: criminal justice risk assessment, credit and lending, hiring, and university admissions, among others.
Unlike traditional software bugs that cause system crashes, fairness failures cause harm to people—often to those already marginalized by systemic inequities. When a credit algorithm discriminates based on race (even implicitly), real families are denied housing. When a hiring algorithm penalizes women, career opportunities evaporate.
The stakes demand rigorous understanding of what fairness means and how to achieve it.
ML systems don't just reflect bias—they amplify it. A biased hiring algorithm rejects qualified candidates from underrepresented groups. Those groups then have fewer employees in the training data. The next model iteration becomes more biased. This feedback loop can entrench discrimination at scale.
The Fundamental Challenge:
Fairness is not a single, universally-agreed concept. Different stakeholders, contexts, and ethical frameworks lead to different definitions. A definition that seems fair from one perspective may be deeply unfair from another.
Consider a university admissions algorithm. Is it fair if it admits applicants from every demographic group at the same rate? If any two applicants with identical qualifications receive the same decision? If its error rates are the same across groups? If a given predicted score implies the same chance of success regardless of group?
These definitions can't all be satisfied simultaneously. Understanding this landscape of competing definitions is essential for any ML practitioner.
Before diving into mathematical definitions, we must understand the philosophical traditions that inform different fairness concepts. These aren't just academic distinctions—they fundamentally shape what we optimize for.
The Three Major Frameworks:
John Rawls and the Veil of Ignorance:
One influential framework comes from philosopher John Rawls. He proposed a thought experiment: imagine designing a society without knowing your position in it (behind a 'veil of ignorance'). What rules would you choose?
Rawls argued we'd choose two principles: equal basic liberties for everyone, and the 'difference principle', under which social and economic inequalities are acceptable only if they benefit the least advantaged.
In ML terms, Rawls might suggest we should optimize for the worst-off group, not average performance. This leads to minimax fairness approaches that ensure no group is left behind.
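To make the Rawlsian idea concrete, here is a minimal sketch (the helper function, toy labels, and the two candidate models are hypothetical, not taken from any library) that compares two models by their worst-group accuracy instead of their overall accuracy:

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, protected_attr):
    """Accuracy of the worst-performing group (the Rawlsian/minimax criterion)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    group_accuracies = [
        (y_true[protected_attr == g] == y_pred[protected_attr == g]).mean()
        for g in np.unique(protected_attr)
    ]
    return min(group_accuracies)


y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
group   = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # protected attribute
model_a = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # higher overall accuracy, but group 1 fares poorly
model_b = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]  # lower overall accuracy, but no group is left behind

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    overall = (np.asarray(y_true) == np.asarray(preds)).mean()
    worst = worst_group_accuracy(y_true, preds, group)
    print(f"{name}: overall accuracy = {overall:.2f}, worst-group accuracy = {worst:.2f}")
```

A purely utilitarian comparison would prefer model_a for its higher overall accuracy; the Rawlsian minimax criterion prefers model_b, whose worst-off group does substantially better.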
When choosing a fairness definition, you're implicitly choosing a philosophical framework. Be explicit about this choice. Stakeholder discussions should begin with: 'What do we mean by fair?' before any modeling begins.
| Framework | Core Principle | ML Fairness Concept | Example Metric |
|---|---|---|---|
| Egalitarianism | Equal treatment/outcomes | Demographic Parity | P(Ŷ=1\|A=0) = P(Ŷ=1\|A=1) |
| Libertarianism | Individual merit | Individual Fairness | Similar individuals → similar outcomes |
| Utilitarianism | Maximize total welfare | Overall Accuracy | Maximize correct predictions |
| Rawlsian | Protect worst-off | Minimax Fairness | Maximize accuracy of worst-performing group |
Group fairness definitions compare outcomes across different demographic groups. These are the most commonly used fairness criteria in practice because they're measurable from data.
Let's establish formal notation:

- A: the protected attribute (e.g., A ∈ {0, 1} for two demographic groups)
- X: the observed features the model uses
- Y: the true outcome (e.g., Y = 1 if the applicant would repay the loan)
- Ŷ: the model's prediction
- R: the model's score or predicted probability
The Three Fundamental Group Fairness Criteria:
Demographic Parity (also called Statistical Parity or Independence) requires that the prediction Ŷ is independent of the protected attribute A.
Formal Definition:
$$P(\hat{Y} = 1 | A = 0) = P(\hat{Y} = 1 | A = 1)$$
In words: the probability of receiving a positive prediction should be the same across all groups.
Example: In a hiring algorithm, demographic parity would require that men and women are selected at equal rates.
Advantages:

- Simple to state and measure; it requires only predictions and group membership, not ground-truth labels
- Maps directly onto the legal notion of disparate impact (the 80% rule checked in the code below)

Disadvantages:

- Ignores differences in base rates between groups
- Can be satisfied by selecting the 'right' number of people from each group regardless of qualification
- Says nothing about error rates or individual merit
```python
import numpy as np


def demographic_parity_difference(y_pred, protected_attr):
    """
    Calculate demographic parity difference.

    Returns 0 if perfect parity, positive if group A=1 is favored.

    Args:
        y_pred: Binary predictions (0 or 1)
        protected_attr: Binary protected attribute (0 or 1)

    Returns:
        float: Difference in positive prediction rates between groups
    """
    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)

    # Positive prediction rate for group A=0
    group_0_rate = y_pred[protected_attr == 0].mean()
    # Positive prediction rate for group A=1
    group_1_rate = y_pred[protected_attr == 1].mean()

    return group_1_rate - group_0_rate


def demographic_parity_ratio(y_pred, protected_attr):
    """
    Calculate demographic parity ratio (disparate impact ratio).

    Returns 1 if perfect parity.
    Values < 1 indicate group A=0 is favored.
    Values > 1 indicate group A=1 is favored.
    """
    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)

    group_0_rate = y_pred[protected_attr == 0].mean()
    group_1_rate = y_pred[protected_attr == 1].mean()

    # Avoid division by zero
    if group_0_rate == 0:
        return float('inf') if group_1_rate > 0 else 1.0

    return group_1_rate / group_0_rate


# Example usage
if __name__ == "__main__":
    # Simulated hiring decisions
    predictions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    protected = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # Group membership

    dp_diff = demographic_parity_difference(predictions, protected)
    dp_ratio = demographic_parity_ratio(predictions, protected)

    print(f"Demographic Parity Difference: {dp_diff:.3f}")
    print(f"Demographic Parity Ratio: {dp_ratio:.3f}")

    # The 80% rule: ratio should be >= 0.8
    print(f"Passes 80% rule: {dp_ratio >= 0.8 and dp_ratio <= 1.25}")
```

While group fairness focuses on statistical parity across demographic groups, individual fairness addresses a more intuitive principle: similar individuals should be treated similarly, regardless of group membership.
Formal Definition (Dwork et al., 2012):
$$d_Y(M(x_1), M(x_2)) \leq L \cdot d_X(x_1, x_2)$$
Where:

- M is the model mapping individuals to outcomes (e.g., scores or probabilities)
- d_X is a distance metric over individuals that captures task-relevant similarity
- d_Y is a distance metric over outcomes (e.g., the difference between predicted probabilities)
- L is a Lipschitz constant bounding how quickly outcomes may change as inputs change
In words: if two individuals are 'close' in relevant characteristics, their predicted outcomes should also be 'close.'
The devil is in the details. Who defines the distance metric d_X? What constitutes 'similar' individuals? This metric must capture 'task-relevant similarity' while excluding protected attributes—a deeply non-trivial design choice that often requires domain expertise and stakeholder input.
Example: Loan Approval
Consider two loan applicants, A and B, with essentially the same income, credit score, debt-to-income ratio, and repayment history, who differ only in gender.
These applicants are nearly identical on legitimate credit factors. Individual fairness demands they receive similar loan decisions—regardless of gender. If the model approves A but rejects B, it violates individual fairness even if group-level statistics look balanced.
Relationship to Group Fairness:
Individual fairness and group fairness are distinct concepts; neither implies the other:
| Aspect | Group Fairness | Individual Fairness |
|---|---|---|
| Unit of Analysis | Demographic groups | Individual pairs |
| Core Principle | Equal statistics across groups | Similar inputs → similar outputs |
| Measurement | Compare group-level rates | Requires distance metric over individuals |
| Advantage | Easy to measure from data | Intuitive, treats people as individuals |
| Challenge | May violate individual merit | Defining 'similarity' is hard |
| Legal Analog | Disparate impact doctrine | Individual discrimination claims |
```python
import numpy as np
from itertools import combinations
from sklearn.neighbors import NearestNeighbors


def individual_fairness_violation(X, predictions, distance_metric,
                                  outcome_distance, lipschitz_bound=1.0):
    """
    Measure individual fairness violations using the Lipschitz constraint.

    For each pair of individuals, checks:
        d_Y(M(x1), M(x2)) <= L * d_X(x1, x2)

    Args:
        X: Feature matrix (n_samples, n_features)
        predictions: Model predictions or probabilities
        distance_metric: Function computing d_X(x1, x2)
        outcome_distance: Function computing d_Y(pred1, pred2)
        lipschitz_bound: Maximum allowed ratio L

    Returns:
        dict: Violation statistics
    """
    n = len(predictions)
    violations = []
    total_pairs = 0

    # Check all pairs (expensive for large n)
    for i, j in combinations(range(n), 2):
        d_x = distance_metric(X[i], X[j])
        d_y = outcome_distance(predictions[i], predictions[j])
        max_allowed = lipschitz_bound * d_x

        if d_y > max_allowed and d_x > 0:
            violations.append({
                'pair': (i, j),
                'd_x': d_x,
                'd_y': d_y,
                'violation_amount': d_y - max_allowed
            })
        total_pairs += 1

    return {
        'total_pairs': total_pairs,
        'num_violations': len(violations),
        'violation_rate': len(violations) / total_pairs if total_pairs > 0 else 0,
        'violations': violations[:10]  # Return first 10 for inspection
    }


def compute_individual_fairness_metric(X, probabilities, sensitive_cols=None):
    """
    Compute consistency metric for individual fairness.

    Based on k-NN: similar individuals should have similar predictions.

    Args:
        X: Feature matrix
        probabilities: Predicted probabilities
        sensitive_cols: Indices of sensitive columns to exclude

    Returns:
        float: Consistency score (higher = more individually fair)
    """
    # Remove sensitive attributes for distance computation
    if sensitive_cols is not None:
        X_fair = np.delete(X, sensitive_cols, axis=1)
    else:
        X_fair = X

    # Find k nearest neighbors
    k = 5
    nn = NearestNeighbors(n_neighbors=k + 1)  # +1 includes self
    nn.fit(X_fair)
    _, indices = nn.kneighbors(X_fair)
    neighbor_indices = indices[:, 1:]  # Exclude self

    # Measure prediction consistency with neighbors
    consistency_scores = []
    for i in range(len(probabilities)):
        pred_i = probabilities[i]
        neighbor_preds = probabilities[neighbor_indices[i]]
        # Average absolute difference with neighbors
        diff = np.abs(neighbor_preds - pred_i).mean()
        consistency = 1 - diff  # Convert to similarity
        consistency_scores.append(consistency)

    return np.mean(consistency_scores)


# Example usage
if __name__ == "__main__":
    np.random.seed(42)

    # Create feature matrix (excluding protected attribute for distance)
    n = 100
    X = np.random.randn(n, 3)  # 3 features

    # Simulated predictions (some inconsistency)
    probabilities = 1 / (1 + np.exp(-X[:, 0] - 0.5 * X[:, 1]))
    # Add noise to create some individual fairness violations
    probabilities += np.random.randn(n) * 0.1
    probabilities = np.clip(probabilities, 0, 1)

    consistency = compute_individual_fairness_metric(X, probabilities)
    print(f"Individual Fairness Consistency: {consistency:.3f}")
```

One of the most important theoretical results in ML fairness is the impossibility theorem (Chouldechova, 2017; Kleinberg et al., 2016). It proves that certain fairness criteria cannot be satisfied simultaneously when base rates differ between groups.
The Setup:
Consider a binary classifier and two groups with different base rates: Group A, where 10% of individuals have the positive outcome, and Group B, where 30% do.
The impossibility theorem states that you cannot simultaneously achieve:

- Calibration within groups (a score of r corresponds to the same probability of the outcome in every group)
- Equal false positive rates across groups
- Equal false negative rates across groups

unless either:

- the two groups have identical base rates, or
- the classifier is perfect and never makes an error.
This isn't a limitation of current technology—it's a mathematical fact. No algorithm, no matter how sophisticated, can satisfy all three conditions when base rates differ. This means every fairness intervention involves choosing which fairness criteria to prioritize.
Intuition Behind the Impossibility:
Imagine you have a calibrated risk score. For Group A (10% base rate), a score of 30% is already 3x their average—these are relatively high-risk individuals within their group. For Group B (30% base rate), a score of 30% is exactly average.
Now, if you set a threshold and predict positive for everyone above 30%, you flag only the unusually risky tail of Group A but a much larger share of Group B. These groups will therefore have different error rate profiles, because the same threshold means different things relative to each group's distribution.
To equalize error rates, you'd need different thresholds—but then the scores wouldn't be calibrated anymore (a 30% score would mean different things for different groups).
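The small simulation below (the score distributions and sample sizes are assumed purely for illustration) makes this concrete: the scores are calibrated by construction, the two groups have roughly the base rates from the example above, and yet the shared 30% threshold yields different false positive and false negative rates:

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_group(n, base_rate):
    """Draw calibrated risk scores: each score equals that individual's true
    probability of the positive outcome, so calibration holds by construction."""
    scores = np.clip(rng.normal(base_rate, 0.15, n), 0.01, 0.99)
    outcomes = rng.binomial(1, scores)  # Y=1 with probability equal to the score
    return scores, outcomes


def error_rates(scores, outcomes, threshold):
    """False positive and false negative rates at a single shared threshold."""
    preds = (scores >= threshold).astype(int)
    fpr = preds[outcomes == 0].mean()        # P(Ŷ=1 | Y=0)
    fnr = (1 - preds[outcomes == 1]).mean()  # P(Ŷ=0 | Y=1)
    return fpr, fnr


threshold = 0.30
for name, base_rate in [("Group A", 0.10), ("Group B", 0.30)]:
    scores, outcomes = simulate_group(100_000, base_rate)
    fpr, fnr = error_rates(scores, outcomes, threshold)
    print(f"{name}: base rate={outcomes.mean():.2f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```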
Practical Implications: for every application you must decide which of these criteria to prioritize, make that choice explicit, and document the trade-off. The taxonomy below helps structure that decision.
Beyond the fundamental trio of demographic parity, equalized odds, and calibration, researchers have proposed dozens of fairness definitions. Understanding this taxonomy helps you select appropriate criteria for specific applications.
Categorization by Statistical Relationship:
| Category | Definition Name | Mathematical Condition | Intuition |
|---|---|---|---|
| Independence<br/>(Ŷ ⊥ A) | Demographic Parity | P(Ŷ\|A=0) = P(Ŷ\|A=1) | Equal selection rates |
| | Conditional Statistical Parity | P(Ŷ\|A=0,L) = P(Ŷ\|A=1,L) | Equal rates conditioning on legitimate factors |
| Separation<br/>(Ŷ ⊥ A \| Y) | Equalized Odds | P(Ŷ\|Y,A=0) = P(Ŷ\|Y,A=1) | Equal TPR and FPR |
| | Equal Opportunity | P(Ŷ=1\|Y=1,A=0) = P(Ŷ=1\|Y=1,A=1) | Equal TPR only |
| | Predictive Equality | P(Ŷ=1\|Y=0,A=0) = P(Ŷ=1\|Y=0,A=1) | Equal FPR only |
| Sufficiency<br/>(Y ⊥ A \| Ŷ) | Calibration | P(Y\|Ŷ=ŷ,A=0) = P(Y\|Ŷ=ŷ,A=1) | Scores mean the same thing |
| | Predictive Parity | P(Y=1\|Ŷ=1,A=0) = P(Y=1\|Ŷ=1,A=1) | Equal PPV |
| | Balance for Positive Class | E[R\|Y=1,A=0] = E[R\|Y=1,A=1] | Equal average score among positives |
| Counterfactual | Counterfactual Fairness | P(Ŷ_a\|A=a) = P(Ŷ_{a'}\|A=a) | Same prediction if A were different |
Interpretation Guide:

Independence-based (Anti-classification): the prediction should carry no information about the protected attribute; groups are selected at equal rates regardless of their true outcomes.

Separation-based (Error balance): conditioned on the true outcome, errors should fall equally on all groups; no group bears a disproportionate share of false positives or false negatives.

Sufficiency-based (Calibration): conditioned on the score or prediction, the true outcome should not depend on group membership; a given score means the same risk for everyone.

Counterfactual: an individual's prediction should not change in the counterfactual world where only their protected attribute (and its causal consequences) were different; see the sketch after this list.
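As a toy illustration of that counterfactual check, the sketch below uses an entirely made-up structural causal model (the coefficients, the income feature, and the predictor are assumptions, not part of any real system): hold each individual's exogenous background factors fixed, flip the protected attribute, regenerate the feature it influences, and compare the two predictions.

```python
import numpy as np


def generate_features(a, noise):
    """Toy causal model: a downstream feature (income) depends on both
    the protected attribute A and individual-specific exogenous noise U."""
    income = 50 + 10 * a + 5 * noise
    return np.column_stack([np.full_like(noise, a, dtype=float), income])


def model_predict(features):
    """Toy score that (unfairly) relies on the A-influenced income feature."""
    income = features[:, 1]
    return 1 / (1 + np.exp(-(income - 55) / 5))


rng = np.random.default_rng(0)
noise = rng.normal(size=1000)  # exogenous factors U, held fixed across worlds

pred_a0 = model_predict(generate_features(0, noise))  # factual world, A=0
pred_a1 = model_predict(generate_features(1, noise))  # counterfactual world, A=1

# Counterfactual fairness would require these to be (nearly) identical per individual.
print("Mean |prediction change| when flipping A:", np.abs(pred_a1 - pred_a0).mean())
```

A counterfactually fair model would leave each prediction essentially unchanged when A is flipped; here the A-influenced income feature causes the predictions to shift.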
Start with the question: 'What harm are we trying to prevent?' If groups shouldn't have different selection rates → demographic parity. If false accusations disproportionately harm one group → equalized odds. If risk scores inform individual decisions → calibration. The application context determines the appropriate fairness notion.
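As a companion to the taxonomy above, here is a minimal sketch (the helper name and toy data are assumptions, not a fairness-library API) that computes the per-group quantities behind the separation- and sufficiency-based criteria: the TPR gap (equal opportunity), the FPR gap (predictive equality; together with the TPR gap, equalized odds), and the PPV gap (predictive parity):

```python
import numpy as np


def group_rates(y_true, y_pred, protected_attr):
    """Return TPR, FPR, and PPV for each value of a binary protected attribute."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected_attr = np.asarray(protected_attr)
    rates = {}
    for g in (0, 1):
        mask = protected_attr == g
        yt, yp = y_true[mask], y_pred[mask]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan  # equal opportunity
        fpr = yp[yt == 0].mean() if (yt == 0).any() else np.nan  # predictive equality
        ppv = yt[yp == 1].mean() if (yp == 1).any() else np.nan  # predictive parity
        rates[g] = {"TPR": tpr, "FPR": fpr, "PPV": ppv}
    return rates


# Example: equalized odds requires both the TPR gap and the FPR gap to be ~0.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

r = group_rates(y_true, y_pred, group)
print("TPR gap:", abs(r[0]["TPR"] - r[1]["TPR"]))
print("FPR gap:", abs(r[0]["FPR"] - r[1]["FPR"]))
print("PPV gap:", abs(r[0]["PPV"] - r[1]["PPV"]))
```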
Most fairness research focuses on binary classification, but many ML applications involve ranking (search results, recommendations) or regression (salary prediction, credit limits). These require extended fairness definitions.
Fairness in Ranking:
For ranked lists (e.g., job candidates, search results), fairness involves exposure and attention allocation:
```python
import numpy as np


def normalized_discounted_kl_divergence(ranking, protected_attr, k=10):
    """
    Measure fairness in top-k ranking using normalized discounted KL divergence.

    Accounts for position bias: higher positions matter more.

    Args:
        ranking: Indices of items in ranked order
        protected_attr: Protected attribute for each item (0 or 1)
        k: Number of top positions to evaluate

    Returns:
        float: NDKL score (0 = perfect fairness)
    """
    protected_attr = np.array(protected_attr)

    # Target proportion in population
    target_prop = protected_attr.mean()

    # Position discounts (logarithmic)
    discounts = 1 / np.log2(np.arange(2, k + 2))
    normalizer = discounts.sum()

    # Calculate exposure for protected group
    top_k_indices = ranking[:k]
    top_k_protected = protected_attr[top_k_indices]

    # Weighted proportion at each position
    weighted_exposure = (top_k_protected * discounts).sum() / normalizer

    # KL divergence between top-k exposure and the target proportion
    # (epsilon avoids log(0) and division by zero)
    epsilon = 1e-10
    kl = weighted_exposure * np.log((weighted_exposure + epsilon) / (target_prop + epsilon))
    kl += (1 - weighted_exposure) * np.log((1 - weighted_exposure + epsilon) / (1 - target_prop + epsilon))

    return kl


def group_exposure_ratio(ranking, protected_attr, position_weights=None):
    """
    Calculate ratio of exposure between groups.

    Args:
        ranking: Indices in ranked order
        protected_attr: Protected attribute for each item
        position_weights: Weight for each position (default: 1/position)

    Returns:
        float: Ratio of group 1 exposure to group 0 exposure
    """
    n = len(ranking)
    if position_weights is None:
        position_weights = 1. / np.arange(1, n + 1)

    protected_attr = np.array(protected_attr)

    group_0_exposure = 0
    group_1_exposure = 0

    for pos, item_idx in enumerate(ranking):
        weight = position_weights[pos]
        if protected_attr[item_idx] == 0:
            group_0_exposure += weight
        else:
            group_1_exposure += weight

    # Normalize by group sizes
    n_group_0 = (protected_attr == 0).sum()
    n_group_1 = (protected_attr == 1).sum()
    if n_group_0 > 0:
        group_0_exposure /= n_group_0
    if n_group_1 > 0:
        group_1_exposure /= n_group_1

    if group_0_exposure == 0:
        return float('inf')
    return group_1_exposure / group_0_exposure


# Example usage
if __name__ == "__main__":
    np.random.seed(42)

    n = 50
    # Protected attributes (40% group 1)
    protected = np.random.binomial(1, 0.4, n)

    # Biased ranking: group 0 tends to rank higher
    scores = np.random.randn(n) - protected * 0.5
    ranking = np.argsort(-scores)  # Descending

    ndkl = normalized_discounted_kl_divergence(ranking, protected, k=10)
    exposure_ratio = group_exposure_ratio(ranking, protected)

    print(f"Normalized Discounted KL Divergence: {ndkl:.4f}")
    print(f"Exposure Ratio (Group 1 / Group 0): {exposure_ratio:.3f}")
    print("Fair exposure ratio should be: ~1.0")
```

Fairness in Regression:
For continuous outcomes (salary, loan amount, estimated home value), fairness definitions adapt: common criteria include equal distributions or averages of predictions across groups (statistical parity for regression), comparable prediction error for every group (sometimes enforced as a bound on each group's loss), and calibration of predicted values within each group. A minimal sketch of the first two checks appears below.
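Here is that sketch (the function name and the toy salary figures are hypothetical), computing the gap in average predictions and the gap in group-wise error:

```python
import numpy as np


def regression_fairness_gaps(y_true, y_pred, protected_attr):
    """Gap in mean prediction and in mean absolute error between two groups."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    protected_attr = np.asarray(protected_attr)
    means, maes = {}, {}
    for g in (0, 1):
        mask = protected_attr == g
        means[g] = y_pred[mask].mean()                        # average prediction
        maes[g] = np.abs(y_true[mask] - y_pred[mask]).mean()  # group-wise error
    return {
        "mean_prediction_gap": means[1] - means[0],
        "mae_gap": maes[1] - maes[0],
    }


# Example: predicted salaries (in $1000s) for two groups
y_true = [52, 61, 48, 70, 55, 50, 63, 47, 68, 54]
y_pred = [50, 60, 50, 68, 56, 45, 58, 44, 62, 50]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(regression_fairness_gaps(y_true, y_pred, group))
```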
We've traversed the rich landscape of fairness definitions in machine learning. Let's consolidate the key insights:

- Fairness definitions rest on different philosophical foundations (egalitarian, libertarian, utilitarian, Rawlsian), and choosing a metric implicitly chooses a framework.
- Group fairness compares statistics across demographic groups; individual fairness requires that similar individuals receive similar outcomes; neither implies the other.
- The impossibility theorem shows that calibration and equal error rates cannot all hold when base rates differ, so every intervention prioritizes some criteria over others.
- Fairness criteria can be organized by the statistical relationship they impose: independence, separation, sufficiency, or counterfactual invariance.
- Ranking and regression settings require extended definitions, such as exposure-based metrics and group-wise error constraints.
- The application context and the harms at stake, not mathematical convenience, should determine which definition you adopt.
What's Next:
Now that we understand the formal definitions of fairness, the next page examines protected attributes—the specific characteristics (race, gender, age, disability) around which fairness concerns arise. We'll explore legal frameworks, proxy discrimination, and the complex questions of which attributes deserve protection and how to handle them in ML systems.
You now have a comprehensive understanding of fairness definitions in ML. These formal concepts provide the vocabulary and mathematical tools necessary to analyze, measure, and improve fairness in machine learning systems. The journey continues with protected attributes and the legal dimensions of algorithmic fairness.