In 1971, the Supreme Court's Griggs v. Duke Power Co. decision established a principle that would reshape employment law and eventually ML fairness: practices that appear neutral but produce discriminatory outcomes can constitute illegal discrimination.
Duke Power required high school diplomas and IQ test scores for certain positions. Neither requirement mentioned race. But Black applicants were disproportionately excluded because historic discrimination had limited their educational opportunities. The Court held this was unlawful disparate impact.
Five decades later, this doctrine applies directly to machine learning. A credit scoring algorithm that never explicitly uses race can still produce racially discriminatory outcomes—and face legal liability for it.
By the end of this page, you will master the formal measurement of disparate impact, understand the legal burden-shifting framework, learn to conduct rigorous disparate impact analysis on ML systems, and develop strategies for mitigation while maintaining model utility.
Disparate impact occurs when a facially neutral practice disproportionately affects members of a protected class. Unlike disparate treatment (intentional discrimination), disparate impact focuses on outcomes, not intent.
The Mathematical Definition:
Let Y be a binary favorable outcome (hired, approved, etc.) and A be a protected attribute with values {0, 1}. Disparate impact exists when:
$$\frac{P(Y=1|A=0)}{P(Y=1|A=1)} < \tau$$
Where τ is typically 0.8 (the 80% or "four-fifths" rule).
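As a quick illustration with hypothetical numbers: if the selection rate is 0.30 for group A=0 and 0.50 for group A=1, then

$$\frac{P(Y=1|A=0)}{P(Y=1|A=1)} = \frac{0.30}{0.50} = 0.60 < 0.8,$$

so the practice fails the four-fifths rule: the disadvantaged group is selected at only 60% of the advantaged group's rate.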
| Metric | Formula | Threshold | Interpretation |
|---|---|---|---|
| Adverse Impact Ratio | min(rate) / max(rate) | ≥ 0.8 | Four-fifths rule compliance |
| Statistical Parity Difference | \|P(Y=1\|A=0) − P(Y=1\|A=1)\| | < 0.1 | Absolute rate difference between groups |
| Odds Ratio | [P(Y=1\|A=0)/(1−P(Y=1\|A=0))] / [P(Y=1\|A=1)/(1−P(Y=1\|A=1))] | 0.5–2.0 | Relative odds comparison |
| Risk Ratio | P(Y=1\|A=0) / P(Y=1\|A=1) | 0.8–1.25 | Relative risk assessment |
```python
import numpy as np
from typing import Dict, List, Tuple


def comprehensive_disparate_impact_analysis(
    outcomes: np.ndarray,
    protected_attr: np.ndarray,
    group_labels: Dict[int, str] = None
) -> Dict:
    """
    Comprehensive disparate impact analysis with multiple metrics.

    Args:
        outcomes: Binary outcomes (1 = favorable)
        protected_attr: Group membership (0 or 1)
        group_labels: Optional names for groups

    Returns:
        Dict containing all disparate impact metrics
    """
    outcomes = np.array(outcomes)
    protected_attr = np.array(protected_attr)

    if group_labels is None:
        group_labels = {0: "Group 0", 1: "Group 1"}

    # Selection rates by group
    rate_0 = outcomes[protected_attr == 0].mean()
    rate_1 = outcomes[protected_attr == 1].mean()

    # Adverse Impact Ratio (four-fifths rule)
    min_rate, max_rate = min(rate_0, rate_1), max(rate_0, rate_1)
    air = min_rate / max_rate if max_rate > 0 else 1.0

    # Statistical Parity Difference
    spd = abs(rate_0 - rate_1)

    # Odds Ratio
    odds_0 = rate_0 / (1 - rate_0) if rate_0 < 1 else float('inf')
    odds_1 = rate_1 / (1 - rate_1) if rate_1 < 1 else float('inf')
    odds_ratio = odds_0 / odds_1 if odds_1 > 0 else float('inf')

    # Risk Ratio
    risk_ratio = rate_0 / rate_1 if rate_1 > 0 else float('inf')

    # Determine advantaged/disadvantaged groups
    advantaged = group_labels[0] if rate_0 > rate_1 else group_labels[1]
    disadvantaged = group_labels[1] if rate_0 > rate_1 else group_labels[0]

    return {
        "selection_rates": {
            group_labels[0]: rate_0,
            group_labels[1]: rate_1,
        },
        "adverse_impact_ratio": air,
        "passes_four_fifths": air >= 0.8,
        "statistical_parity_difference": spd,
        "odds_ratio": odds_ratio,
        "risk_ratio": risk_ratio,
        "advantaged_group": advantaged,
        "disadvantaged_group": disadvantaged,
        "sample_sizes": {
            group_labels[0]: (protected_attr == 0).sum(),
            group_labels[1]: (protected_attr == 1).sum(),
        }
    }


def bootstrap_confidence_interval(
    outcomes: np.ndarray,
    protected_attr: np.ndarray,
    metric_func,
    n_bootstrap: int = 1000,
    confidence: float = 0.95
) -> Tuple[float, float, float]:
    """
    Bootstrap confidence interval for a disparate impact metric.

    Returns:
        Tuple of (estimate, lower_bound, upper_bound)
    """
    n = len(outcomes)
    bootstrap_estimates = []

    for _ in range(n_bootstrap):
        indices = np.random.choice(n, size=n, replace=True)
        boot_outcomes = outcomes[indices]
        boot_protected = protected_attr[indices]
        estimate = metric_func(boot_outcomes, boot_protected)
        if np.isfinite(estimate):
            bootstrap_estimates.append(estimate)

    point_estimate = metric_func(outcomes, protected_attr)
    alpha = 1 - confidence
    lower = np.percentile(bootstrap_estimates, 100 * alpha / 2)
    upper = np.percentile(bootstrap_estimates, 100 * (1 - alpha / 2))

    return point_estimate, lower, upper
```

Understanding legal doctrine is essential for ML practitioners. The disparate impact framework involves a three-step burden-shifting process established in Griggs and refined in subsequent cases.
The Three-Step Framework:

1. Prima facie showing: the plaintiff demonstrates that a specific, facially neutral practice causes a significantly disparate outcome for a protected group, typically with statistical evidence such as the four-fifths rule.
2. Business necessity defense: the burden shifts to the defendant to show that the challenged practice is job-related and consistent with business necessity.
3. Less discriminatory alternative: the plaintiff may still prevail by identifying an alternative practice that serves the defendant's legitimate needs with less disparate impact, which the defendant declined to adopt.
Passing the four-fifths rule doesn't guarantee legal compliance. Courts may find disparate impact even at higher ratios if statistical significance is demonstrated. Conversely, small sample sizes may fail to establish significance even with large disparities.
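To make this concrete, the sketch below contrasts two hypothetical scenarios using SciPy: a small sample that badly fails the four-fifths rule yet is not statistically significant, and a large sample that just passes the rule yet shows a significant difference. The numbers are illustrative only, not drawn from any real case.

```python
import numpy as np
from scipy import stats

# Scenario 1: small sample, large disparity (hypothetical numbers).
# Group A: 2 of 10 selected (rate 0.20); Group B: 5 of 10 selected (rate 0.50).
# Adverse impact ratio = 0.20 / 0.50 = 0.40 -> badly fails the four-fifths rule.
table = [[2, 8], [5, 5]]  # rows = groups, columns = [selected, not selected]
_, p_small = stats.fisher_exact(table)
print(f"Small sample: AIR = 0.40, Fisher exact p = {p_small:.3f}")  # roughly 0.35, not significant

# Scenario 2: large sample, modest disparity (hypothetical numbers).
# Group A: 200 of 1000 selected (0.20); Group B: 250 of 1000 selected (0.25).
# Adverse impact ratio = 0.20 / 0.25 = 0.80 -> just passes the four-fifths rule.
n_a, n_b, p_a, p_b = 1000, 1000, 0.20, 0.25
p_pool = (p_a * n_a + p_b * n_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se
p_large = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"Large sample: AIR = 0.80, two-proportion z-test p = {p_large:.4f}")  # below 0.01, significant
```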
ML-Specific Considerations:
Disparate impact in ML systems arises from multiple sources. Understanding these causes is essential for effective mitigation.
| Source | Mechanism | Example | Mitigation Approach |
|---|---|---|---|
| Historical Bias | Training data reflects past discrimination | Hiring data excludes historically rejected groups | Data augmentation, reweighting |
| Representation Bias | Groups underrepresented in training data | Facial recognition trained mostly on one ethnicity | Balanced data collection, oversampling |
| Measurement Bias | Proxy targets correlate with protected attributes | Using arrests (biased by policing) as crime proxy | Better outcome definitions, causal analysis |
| Feature Selection | Features proxy protected attributes | ZIP code correlating with race | Proxy removal, feature auditing |
| Label Bias | Human-generated labels embed stereotypes | Resume ratings reflecting gender bias | Label auditing, multi-rater calibration |
| Threshold Effects | Classification thresholds affect groups differently | Same threshold, different score distributions | Group-specific thresholds, score recalibration |
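One of these sources, proxy features, can be screened for directly. Below is a minimal sketch of a proxy audit that measures how strongly each feature is associated with the protected attribute; the feature names, threshold, and synthetic data are hypothetical, and a real audit would also examine combinations of features and nonlinear relationships.

```python
import numpy as np

def proxy_audit(X: np.ndarray, protected_attr: np.ndarray,
                feature_names: list, threshold: float = 0.3) -> dict:
    """Flag features whose correlation with the protected attribute exceeds a threshold.

    A simple point-biserial correlation screen; it catches linear proxies only,
    so treat it as a first pass rather than a complete audit.
    """
    flagged = {}
    for j, name in enumerate(feature_names):
        corr = np.corrcoef(X[:, j], protected_attr)[0, 1]
        if abs(corr) >= threshold:
            flagged[name] = round(float(corr), 3)
    return flagged

# Illustrative synthetic data: the second feature is constructed to track the protected attribute.
rng = np.random.default_rng(0)
protected = rng.binomial(1, 0.4, 2000)
X = np.column_stack([
    rng.normal(50, 10, 2000),                   # independent feature
    rng.normal(60, 15, 2000) - 20 * protected,  # proxy feature (e.g., a ZIP-derived income measure)
])
print(proxy_audit(X, protected, ["years_experience", "zip_median_income"]))
```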
A complete disparate impact analysis requires statistical rigor beyond simple ratio calculations. Here's a comprehensive framework:
```python
import numpy as np
from scipy import stats
from typing import Dict, Tuple


class DisparateImpactAnalyzer:
    """Comprehensive disparate impact analysis framework."""

    def __init__(self, outcomes, protected_attr, group_names=None, alpha=0.05):
        self.outcomes = np.array(outcomes)
        self.protected = np.array(protected_attr)
        self.alpha = alpha
        self.groups = np.unique(self.protected)
        self.group_names = group_names or {g: f"Group_{g}" for g in self.groups}

    def selection_rates(self) -> Dict:
        """Calculate selection rates per group."""
        rates = {}
        for g in self.groups:
            mask = self.protected == g
            rates[self.group_names[g]] = {
                'rate': self.outcomes[mask].mean(),
                'n': mask.sum(),
                'selected': self.outcomes[mask].sum()
            }
        return rates

    def four_fifths_test(self) -> Dict:
        """Apply 80% rule across all group pairs."""
        rates = self.selection_rates()
        rate_values = [r['rate'] for r in rates.values()]
        min_rate = min(rate_values)
        max_rate = max(rate_values)
        ratio = min_rate / max_rate if max_rate > 0 else 1.0

        return {
            'adverse_impact_ratio': ratio,
            'passes_80_percent': ratio >= 0.8,
            'min_group': min(rates, key=lambda k: rates[k]['rate']),
            'max_group': max(rates, key=lambda k: rates[k]['rate'])
        }

    def chi_square_test(self) -> Dict:
        """Chi-square test for independence."""
        contingency = []
        for g in self.groups:
            mask = self.protected == g
            selected = self.outcomes[mask].sum()
            not_selected = mask.sum() - selected
            contingency.append([not_selected, selected])

        chi2, p_value, dof, expected = stats.chi2_contingency(contingency)

        return {
            'chi2_statistic': chi2,
            'p_value': p_value,
            'significant': p_value < self.alpha,
            'degrees_of_freedom': dof
        }

    def fisher_exact_test(self) -> Dict:
        """Fisher's exact test for 2x2 tables."""
        if len(self.groups) != 2:
            return {'error': 'Fisher exact test requires exactly 2 groups'}

        contingency = []
        for g in self.groups:
            mask = self.protected == g
            selected = self.outcomes[mask].sum()
            not_selected = mask.sum() - selected
            contingency.append([int(not_selected), int(selected)])

        odds_ratio, p_value = stats.fisher_exact(contingency)

        return {
            'odds_ratio': odds_ratio,
            'p_value': p_value,
            'significant': p_value < self.alpha
        }

    def z_test_proportions(self) -> Dict:
        """Two-proportion z-test."""
        if len(self.groups) != 2:
            return {'error': 'Z-test requires exactly 2 groups'}

        n1 = (self.protected == self.groups[0]).sum()
        n2 = (self.protected == self.groups[1]).sum()
        p1 = self.outcomes[self.protected == self.groups[0]].mean()
        p2 = self.outcomes[self.protected == self.groups[1]].mean()

        # Pooled proportion
        p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)

        # Standard error
        se = np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))

        # Z-statistic
        z = (p1 - p2) / se if se > 0 else 0
        p_value = 2 * (1 - stats.norm.cdf(abs(z)))

        return {
            'z_statistic': z,
            'p_value': p_value,
            'significant': p_value < self.alpha,
            'rate_difference': p1 - p2
        }

    def full_report(self) -> Dict:
        """Generate comprehensive disparate impact report."""
        return {
            'selection_rates': self.selection_rates(),
            'four_fifths_rule': self.four_fifths_test(),
            'chi_square': self.chi_square_test(),
            'fisher_exact': self.fisher_exact_test(),
            'z_test': self.z_test_proportions(),
            'recommendation': self._generate_recommendation()
        }

    def _generate_recommendation(self) -> str:
        ff = self.four_fifths_test()
        chi = self.chi_square_test()

        if not ff['passes_80_percent'] and chi['significant']:
            return "INVESTIGATE: Fails 80% rule AND statistically significant"
        elif not ff['passes_80_percent']:
            return "CAUTION: Fails 80% rule but not statistically significant"
        elif chi['significant']:
            return "MONITOR: Passes 80% rule but disparity is significant"
        else:
            return "OK: Passes 80% rule, no significant disparity"


# Example usage
if __name__ == "__main__":
    np.random.seed(42)
    n = 1000
    protected = np.random.binomial(1, 0.4, n)

    # Biased outcomes
    prob = 0.5 - 0.15 * protected
    outcomes = np.random.binomial(1, prob)

    analyzer = DisparateImpactAnalyzer(
        outcomes, protected,
        group_names={0: 'Majority', 1: 'Minority'}
    )

    report = analyzer.full_report()
    print("Selection Rates:", report['selection_rates'])
    print("Four-Fifths:", report['four_fifths_rule'])
    print("Chi-Square:", report['chi_square'])
    print("Recommendation:", report['recommendation'])
```

When disparate impact is detected, multiple intervention points exist. These strategies fall into three categories based on when they're applied:
- Pre-processing: transform the training data (for example, reweighting or resampling). Model-agnostic, but may lose information.
- In-processing: add fairness constraints or penalties to the learning objective. Tighter integration, but requires modifying the model.
- Post-processing: adjust scores or decision thresholds after training. Simplest to implement, but may reduce calibration.

Consider the fairness-accuracy tradeoff and your specific constraints when choosing among them.
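As one concrete illustration of the post-processing route, here is a minimal sketch that picks group-specific decision thresholds so that selection rates roughly equalize. The function name, data, and target rate are hypothetical; in practice you would validate the resulting thresholds for accuracy and calibration, and note that explicitly using the protected attribute at decision time raises its own legal questions, so this approach also needs legal review.

```python
import numpy as np

def group_thresholds_for_parity(scores: np.ndarray, protected_attr: np.ndarray,
                                target_rate: float) -> dict:
    """Pick a per-group score threshold so each group's selection rate is ~target_rate.

    Uses the (1 - target_rate) quantile of each group's score distribution.
    """
    thresholds = {}
    for g in np.unique(protected_attr):
        group_scores = scores[protected_attr == g]
        thresholds[g] = np.quantile(group_scores, 1 - target_rate)
    return thresholds

# Illustrative synthetic scores with different distributions per group.
rng = np.random.default_rng(7)
protected = rng.binomial(1, 0.4, 5000)
scores = rng.normal(0.55 - 0.10 * protected, 0.15)  # group 1 scores shifted lower

thresholds = group_thresholds_for_parity(scores, protected, target_rate=0.30)
decisions = np.array([scores[i] >= thresholds[protected[i]] for i in range(len(scores))])

for g in (0, 1):
    rate = decisions[protected == g].mean()
    print(f"Group {g}: threshold {thresholds[g]:.3f}, selection rate {rate:.3f}")
```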
You now have the tools to measure, test, and mitigate disparate impact. The next page explores Equality of Opportunity—a specific fairness criterion that balances error rates across groups while allowing for different base rates.