In 2019, Apple's credit card made headlines when its algorithm reportedly offered women significantly lower credit limits than men with similar financial profiles. The system never explicitly used gender as an input, yet the outcomes appeared starkly gendered. This case illustrates a fundamental truth about fair ML: discrimination can occur without explicit use of protected attributes.
Protected attributes—characteristics like race, gender, age, and disability status—occupy a unique position in machine learning. They represent attributes that, by law or ethical principle, should not adversely influence algorithmic decisions. Yet their influence often persists through proxies, historical data, and complex feature interactions.
Understanding protected attributes is essential for any ML practitioner. It's not enough to remove these variables from your model and declare the problem solved. As we'll see, the challenges run much deeper.
By the end of this page, you will understand the legal and ethical foundations of protected attributes, master the technical challenges of proxy discrimination and feature correlations, and learn strategies for responsible handling of sensitive data in ML systems.
Protected attributes have their origins in civil rights legislation designed to combat discrimination. While ML practitioners aren't lawyers, understanding the legal landscape is essential: these laws define which attributes require special care, in which domains, and what counts as unlawful discrimination when algorithms make the decisions.
Key U.S. Legislative Frameworks:
| Legislation | Year | Protected Attributes | Domains |
|---|---|---|---|
| Civil Rights Act (Title VII) | 1964 | Race, color, religion, sex, national origin | Employment |
| Fair Housing Act | 1968 | Race, color, religion, national origin, sex, familial status, disability | Housing |
| Equal Credit Opportunity Act | 1974 | Race, color, religion, national origin, sex, marital status, age, receipt of public assistance | Credit |
| Age Discrimination in Employment Act | 1967 | Age (40+) | Employment |
| Americans with Disabilities Act | 1990 | Disability | Employment, public accommodation |
| Genetic Information Nondiscrimination Act | 2008 | Genetic information | Employment, health insurance |
International Perspectives:
Protected attributes vary significantly across jurisdictions:
European Union (GDPR Article 9): Explicitly defines 'special categories' of personal data including racial/ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, biometric data, health data, and sexual orientation.
Canada (Canadian Human Rights Act): Protects race, national/ethnic origin, color, religion, age, sex, sexual orientation, gender identity, marital status, family status, genetic characteristics, disability, and pardoned conviction.
UK (Equality Act 2010): Nine protected characteristics—age, disability, gender reassignment, marriage/civil partnership, pregnancy/maternity, race, religion/belief, sex, and sexual orientation.
Domain-Specific Considerations:
An attribute that's protected in one domain may be legitimate in another. Age is protected in employment but may be relevant for medical diagnosis. Gender is protected in hiring but may be medically relevant for certain conditions. Always consider the specific legal and ethical context of your application.
Perhaps the most insidious challenge with protected attributes is proxy discrimination—where ostensibly neutral features serve as stand-ins for protected characteristics.
What Makes a Proxy?
A feature becomes a proxy when it is strongly correlated with a protected attribute and carries enough of that signal to influence the model's decisions, even though the feature itself appears neutral.
Classic Examples of Proxy Variables:
| Feature | Proxies For | Mechanism | Example Domain |
|---|---|---|---|
| ZIP Code | Race, income, education | Residential segregation patterns | Credit, insurance, hiring |
| First Name | Race, ethnicity, gender | Cultural naming patterns | Resume screening |
| University Attended | Socioeconomic status, race | Unequal educational access | Hiring |
| Browsing History | Age, gender, interests | Behavioral patterns | Ad targeting |
| Arrest History | Race | Disparate policing | Criminal justice |
| Height/Weight | Gender | Biological differences | Healthcare, insurance |
| Commute Time | Race, income | Housing patterns, transportation access | Employment |
| Writing Style | Education, culture | Language exposure patterns | Education, hiring |
Using ZIP codes in U.S. contexts is particularly dangerous. Due to historical redlining and ongoing segregation, ZIP code correlates strongly with race in many cities. A model using ZIP code for credit decisions may achieve 'race-blind' implementation while producing racially discriminatory outcomes.
Detecting Proxy Relationships:
Identifying proxies requires systematic analysis. Here are key techniques:
```python
def analyze_proxy_relationships(X, protected_attr, feature_names):
    """
    Analyze correlations between features and protected attributes
    to identify potential proxies.

    Args:
        X: Feature matrix (n_samples, n_features)
        protected_attr: Binary protected attribute array
        feature_names: List of feature names

    Returns:
        DataFrame with proxy analysis results
    """
    import numpy as np
    import pandas as pd
    from scipy import stats

    results = []
    for i, name in enumerate(feature_names):
        feature = X[:, i]

        # For continuous features: point-biserial correlation
        if len(np.unique(feature)) > 10:
            corr, p_value = stats.pointbiserialr(protected_attr, feature)
            test_type = "point-biserial"
        # For categorical features: chi-square test
        else:
            contingency = pd.crosstab(feature, protected_attr)
            chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
            # Cramér's V for effect size
            n = len(feature)
            corr = np.sqrt(chi2 / (n * (min(contingency.shape) - 1)))
            test_type = "cramers_v"

        results.append({
            'feature': name,
            'correlation': abs(corr),
            'p_value': p_value,
            'test_type': test_type,
            'proxy_risk': 'HIGH' if abs(corr) > 0.3 else 'MEDIUM' if abs(corr) > 0.15 else 'LOW'
        })

    return pd.DataFrame(results).sort_values('correlation', ascending=False)


def mutual_information_proxy_analysis(X, protected_attr, feature_names):
    """
    Use mutual information to detect nonlinear proxy relationships.
    MI captures relationships that linear correlation might miss.
    """
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    # Reshape protected attribute
    protected_attr = np.array(protected_attr).ravel()

    # Calculate MI between each feature and protected attribute
    mi_scores = mutual_info_classif(
        X, protected_attr, discrete_features='auto', random_state=42
    )

    results = []
    for name, mi in zip(feature_names, mi_scores):
        results.append({
            'feature': name,
            'mutual_information': mi,
            'proxy_risk': 'HIGH' if mi > 0.2 else 'MEDIUM' if mi > 0.1 else 'LOW'
        })

    return sorted(results, key=lambda x: x['mutual_information'], reverse=True)


def feature_importance_by_group(X, y, protected_attr, feature_names, model):
    """
    Compare feature importances across groups. If a feature is much more
    important for predicting outcomes in one group, it may be acting as a proxy.
    """
    import numpy as np
    from sklearn.inspection import permutation_importance

    protected_attr = np.array(protected_attr)

    # Separate groups
    group_0_mask = protected_attr == 0
    group_1_mask = protected_attr == 1

    # Calculate permutation importance for each group
    result = {}
    for group_name, mask in [('group_0', group_0_mask), ('group_1', group_1_mask)]:
        if mask.sum() < 10:
            continue
        perm_importance = permutation_importance(
            model, X[mask], y[mask], n_repeats=10, random_state=42
        )
        result[group_name] = {
            name: imp
            for name, imp in zip(feature_names, perm_importance.importances_mean)
        }

    # Identify features with divergent importance
    divergent_features = []
    for name in feature_names:
        imp_0 = result.get('group_0', {}).get(name, 0)
        imp_1 = result.get('group_1', {}).get(name, 0)
        if max(imp_0, imp_1) > 0.01:  # Only consider meaningful features
            ratio = max(imp_0, imp_1) / (min(imp_0, imp_1) + 1e-10)
            if ratio > 2:  # 2x difference threshold
                divergent_features.append({
                    'feature': name,
                    'importance_group_0': imp_0,
                    'importance_group_1': imp_1,
                    'ratio': ratio
                })

    return sorted(divergent_features, key=lambda x: x['ratio'], reverse=True)


# Example usage
if __name__ == "__main__":
    import numpy as np

    np.random.seed(42)
    n = 1000

    # Simulated protected attribute (e.g., race)
    protected = np.random.binomial(1, 0.3, n)

    # Features with varying proxy relationships
    feature_names = ['income', 'zip_code', 'education', 'experience', 'random']

    # Create features with different correlation levels
    X = np.column_stack([
        np.random.normal(50000, 15000, n) + protected * 10000,  # income: medium correlation
        # zip_code: high correlation (each group concentrated in different ZIPs,
        # with 15% noise)
        np.where(
            np.random.rand(n) < 0.85,
            np.where(protected == 1,
                     np.random.choice([10001, 10002], n),
                     np.random.choice([90210, 90211], n)),
            np.random.choice([10001, 10002, 10003, 90210, 90211], n)
        ),
        protected * 2 + np.random.normal(14, 2, n),  # education: high correlation
        np.random.uniform(0, 20, n),                 # experience: no correlation
        np.random.randn(n)                           # random: no correlation
    ])

    results = analyze_proxy_relationships(X, protected, feature_names)
    print("Proxy Analysis Results:")
    print(results.to_string(index=False))
```

Intersectionality, a term coined by legal scholar Kimberlé Crenshaw, refers to how multiple protected attributes interact to create unique experiences of discrimination that cannot be understood by examining each attribute in isolation.
The Classic Example:
In the 1976 case DeGraffenreid v. General Motors, Black women sued GM for discrimination. The court found no race discrimination, because GM hired Black workers (Black men), and no sex discrimination, because GM hired women (white women), and so declined to recognize Black women as a distinct protected class.
But GM hired no Black women at all. The intersection of race AND gender created a unique pattern invisible when analyzing each attribute separately.
Implications for ML Fairness: a model can appear fair along each attribute taken separately while systematically harming a specific intersectional subgroup, so fairness must be audited across combinations of attributes, as the analysis below does.
```python
def intersectional_fairness_analysis(y_true, y_pred, protected_attrs):
    """
    Analyze fairness across intersectional subgroups.

    Args:
        y_true: Ground truth labels
        y_pred: Predictions
        protected_attrs: Dict mapping attribute names to arrays
            e.g., {'race': [...], 'gender': [...]}

    Returns:
        DataFrame with metrics for all intersectional groups
    """
    import numpy as np
    import pandas as pd
    from itertools import product
    from sklearn.metrics import accuracy_score, recall_score

    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # Create intersectional groups
    attr_names = list(protected_attrs.keys())
    attr_values = [np.unique(protected_attrs[name]) for name in attr_names]

    results = []

    # Overall metrics
    results.append({
        'group': 'OVERALL',
        'n': len(y_true),
        'base_rate': y_true.mean(),
        'positive_rate': y_pred.mean(),
        'accuracy': accuracy_score(y_true, y_pred),
        'tpr': recall_score(y_true, y_pred, zero_division=0),
        'fpr': (y_pred[y_true == 0] == 1).mean() if (y_true == 0).sum() > 0 else 0,
    })

    # Single-attribute groups
    for attr_name, attrs in protected_attrs.items():
        attrs = np.array(attrs)
        for val in np.unique(attrs):
            mask = attrs == val
            if mask.sum() < 10:
                continue
            y_t, y_p = y_true[mask], y_pred[mask]
            results.append({
                'group': f'{attr_name}={val}',
                'n': mask.sum(),
                'base_rate': y_t.mean(),
                'positive_rate': y_p.mean(),
                'accuracy': accuracy_score(y_t, y_p),
                'tpr': recall_score(y_t, y_p, zero_division=0),
                'fpr': (y_p[y_t == 0] == 1).mean() if (y_t == 0).sum() > 0 else 0,
            })

    # Intersectional groups
    for combo in product(*attr_values):
        mask = np.ones(len(y_true), dtype=bool)
        group_name_parts = []
        for attr_name, val in zip(attr_names, combo):
            attrs = np.array(protected_attrs[attr_name])
            mask &= (attrs == val)
            group_name_parts.append(f'{attr_name}={val}')

        if mask.sum() < 10:  # Skip very small groups
            continue

        group_name = ' & '.join(group_name_parts)
        y_t, y_p = y_true[mask], y_pred[mask]
        results.append({
            'group': group_name,
            'n': mask.sum(),
            'base_rate': y_t.mean(),
            'positive_rate': y_p.mean(),
            'accuracy': accuracy_score(y_t, y_p),
            'tpr': recall_score(y_t, y_p, zero_division=0),
            'fpr': (y_p[y_t == 0] == 1).mean() if (y_t == 0).sum() > 0 else 0,
        })

    df = pd.DataFrame(results)

    # Calculate disparities relative to the overall rates
    overall_tpr = df[df['group'] == 'OVERALL']['tpr'].values[0]
    overall_fpr = df[df['group'] == 'OVERALL']['fpr'].values[0]
    df['tpr_disparity'] = df['tpr'] - overall_tpr
    df['fpr_disparity'] = df['fpr'] - overall_fpr

    return df.sort_values('tpr_disparity')


def detect_intersectional_bias(df):
    """
    Identify intersectional groups experiencing disproportionate harm.
    Flags groups where bias is worse than would be expected from
    single-attribute analysis.
    """
    import pandas as pd

    results = []
    intersectional_groups = df[df['group'].str.contains(' & ')]
    single_groups = df[~df['group'].str.contains(' & ') & (df['group'] != 'OVERALL')]

    for _, row in intersectional_groups.iterrows():
        group_name = row['group']
        parts = group_name.split(' & ')

        # Sum single-attribute disparities for the constituent groups
        expected_tpr_disparity = 0
        expected_fpr_disparity = 0
        for part in parts:
            part_row = single_groups[single_groups['group'] == part]
            if len(part_row) > 0:
                expected_tpr_disparity += part_row['tpr_disparity'].values[0]
                expected_fpr_disparity += part_row['fpr_disparity'].values[0]

        # Compare actual vs expected (additive assumption)
        actual_tpr_disparity = row['tpr_disparity']
        actual_fpr_disparity = row['fpr_disparity']

        # Flag if actual is worse than expected
        tpr_excess = expected_tpr_disparity - actual_tpr_disparity
        fpr_excess = actual_fpr_disparity - expected_fpr_disparity

        if tpr_excess > 0.05 or fpr_excess > 0.05:
            results.append({
                'group': group_name,
                'n': row['n'],
                'actual_tpr_disparity': actual_tpr_disparity,
                'expected_tpr_disparity': expected_tpr_disparity,
                'tpr_excess_harm': tpr_excess,
                'actual_fpr_disparity': actual_fpr_disparity,
                'expected_fpr_disparity': expected_fpr_disparity,
                'fpr_excess_harm': fpr_excess,
            })

    if not results:
        return pd.DataFrame()
    return pd.DataFrame(results).sort_values('tpr_excess_harm', ascending=False)


# Example usage
if __name__ == "__main__":
    import numpy as np

    np.random.seed(42)
    n = 2000

    # Protected attributes
    race = np.random.choice(['white', 'black', 'asian'], n, p=[0.6, 0.25, 0.15])
    gender = np.random.choice(['male', 'female'], n, p=[0.5, 0.5])

    # Ground truth (with intersectional structure)
    y_true = np.random.binomial(1, 0.3, n)

    # Biased predictions: extra bias against Black women
    y_pred = y_true.copy()
    black_female_mask = (race == 'black') & (gender == 'female')
    # Add extra false negatives for Black women
    flip_mask = black_female_mask & (y_true == 1) & (np.random.rand(n) < 0.3)
    y_pred[flip_mask] = 0

    protected_attrs = {'race': race, 'gender': gender}
    df = intersectional_fairness_analysis(y_true, y_pred, protected_attrs)
    print("Intersectional Fairness Analysis:")
    print(df[['group', 'n', 'tpr', 'fpr', 'tpr_disparity']].to_string(index=False))

    bias_df = detect_intersectional_bias(df)
    if len(bias_df) > 0:
        print("\nIntersectional Bias Detected:")
        print(bias_df.to_string(index=False))
```

The question of how to handle protected attributes in ML systems doesn't have a single correct answer. Different strategies reflect different ethical and legal philosophies.
The Four Main Approaches:
Fairness Through Unawareness (FTU) simply removes protected attributes from the feature set, ensuring the model never 'sees' them.
The Approach: Drop race, gender, age, and other protected attributes from the feature set before training, then deploy the model as usual.
Why It's Appealing: It is simple to implement, easy to explain, and appears to rule out disparate treatment, since the model never conditions on the protected attribute directly.
Why It Fails: Proxy features allow the model to reconstruct the protected attribute indirectly, and discarding the attribute also removes your ability to measure, audit, or correct the resulting disparities.
FTU is widely considered insufficient by itself. Removing protected attributes without auditing for proxy effects is like closing your eyes and hoping discrimination doesn't occur.
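To make the failure concrete, here is a minimal sketch on synthetic data (the feature names and the scikit-learn logistic regression setup are illustrative assumptions, not the original example): the same model is trained with and without the protected attribute, and because a correlated proxy remains in the feature set, the disparity in predicted positive rates barely changes.

```python
# Sketch: fairness through unawareness fails when a proxy remains in the features.
# Synthetic data; 'zip_group' is a hypothetical proxy ~85% aligned with the
# protected attribute, 'skill' drives the legitimate part of the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20_000

protected = rng.binomial(1, 0.4, n)
zip_group = (protected + rng.binomial(1, 0.15, n)) % 2  # proxy feature
skill = rng.normal(0, 1, n)

# Historical outcomes are biased against the protected group (protected == 1)
y = (skill + 0.8 * (1 - protected) + rng.normal(0, 1, n) > 0.8).astype(int)

X_aware = np.column_stack([skill, zip_group, protected])  # protected attribute included
X_blind = np.column_stack([skill, zip_group])             # 'fairness through unawareness'

def positive_rate_gap(X):
    """Difference in predicted positive rates between the two groups on a held-out split."""
    X_tr, X_te, y_tr, y_te, p_tr, p_te = train_test_split(
        X, y, protected, test_size=0.3, random_state=0
    )
    pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)
    return pred[p_te == 0].mean() - pred[p_te == 1].mean()

print(f"Positive-rate gap with protected attribute:    {positive_rate_gap(X_aware):.3f}")
print(f"Positive-rate gap without protected attribute: {positive_rate_gap(X_blind):.3f}")
# Both gaps remain large: dropping the attribute did not remove the disparity,
# because zip_group reconstructs most of the same information.
```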
Collecting protected attribute data creates a dilemma: you need it to audit for fairness, but collecting it raises privacy, consent, and potential misuse concerns.
Key Questions for Data Collection: Why is the data needed, who will have access to it, how will consent be obtained, how long will it be retained, and how will its use be restricted to fairness auditing rather than individual decision-making?
Not collecting protected attributes doesn't prevent discrimination—it just prevents you from knowing about it. Responsible collection for auditing purposes typically outweighs the risks, especially in high-stakes domains. But collection must come with strong use restrictions.
When Protected Data Isn't Available:
In many real-world scenarios, protected attribute data isn't available. Strategies include:
Bayesian Improved Surname Geocoding (BISG): Probabilistically infer race/ethnicity from surname and location for auditing purposes
Name-based inference: First names correlate with gender and sometimes ethnicity (fraught with issues but used in some audits)
Ecological inference: Use aggregate demographic data at geographic level to estimate protected attribute distributions
Surveys and opt-in collection: Reach out to users to voluntarily provide demographic data for research purposes
Caution: Imputed demographics can be inaccurate and should never be used for individual decisions. Use only for aggregate fairness analysis.
```python
def bisg_race_probability(surname, zip_code, surname_race_probs, zip_race_probs):
    """
    Bayesian Improved Surname Geocoding for race imputation.

    Combines surname and geography information multiplicatively:
        P(race | surname, zip) ∝ P(race | surname) * P(race | zip)
    (an approximation of the full BISG Bayesian update).

    Args:
        surname: Individual's surname (lowercase)
        zip_code: Individual's ZIP code
        surname_race_probs: Dict mapping surname -> {race: probability}
        zip_race_probs: Dict mapping zip_code -> {race: probability}

    Returns:
        Dict mapping race to probability
    """
    races = ['white', 'black', 'hispanic', 'asian', 'other']

    # Get surname probabilities (default to uniform if unknown)
    surname_probs = surname_race_probs.get(
        surname.lower(), {r: 0.2 for r in races}
    )

    # Get ZIP probabilities (default to national averages)
    zip_probs = zip_race_probs.get(
        zip_code,
        {'white': 0.60, 'black': 0.13, 'hispanic': 0.18, 'asian': 0.06, 'other': 0.03}
    )

    # Bayes' rule (unnormalized)
    posterior = {}
    for race in races:
        posterior[race] = surname_probs.get(race, 0) * zip_probs.get(race, 0)

    # Normalize
    total = sum(posterior.values())
    if total > 0:
        for race in races:
            posterior[race] /= total

    return posterior


class DemographicImputationAuditor:
    """
    Audit framework using imputed demographics.

    WARNING: Imputed demographics should ONLY be used for aggregate
    fairness analysis, never for individual decisions.
    """

    def __init__(self, confidence_threshold=0.7):
        """
        Args:
            confidence_threshold: Minimum probability to count as member of a
                group (reduces noise from uncertain imputations)
        """
        self.confidence_threshold = confidence_threshold

    def audit_with_uncertainty(self, y_pred, demographic_probs, group_name='group'):
        """
        Perform fairness audit accounting for imputation uncertainty.

        Args:
            y_pred: Binary predictions
            demographic_probs: List of {group: probability} dicts
            group_name: Name of demographic dimension

        Returns:
            Dict with fairness metrics and confidence intervals
        """
        import numpy as np

        groups = set()
        for probs in demographic_probs:
            groups.update(probs.keys())

        results = {}
        for group in groups:
            # Get individuals with high-confidence assignment
            high_conf_mask = np.array([
                p.get(group, 0) >= self.confidence_threshold
                for p in demographic_probs
            ])
            if high_conf_mask.sum() < 10:
                continue

            # Calculate metrics for this group
            group_preds = np.array(y_pred)[high_conf_mask]
            results[group] = {
                'n_high_confidence': high_conf_mask.sum(),
                'positive_rate': group_preds.mean(),
                'avg_membership_prob': np.mean([
                    p.get(group, 0) for p in demographic_probs
                ]),
            }

        # Also calculate a version weighted by membership probability
        weighted_results = {}
        for group in groups:
            weights = np.array([p.get(group, 0) for p in demographic_probs])
            if weights.sum() < 10:
                continue
            weighted_positive_rate = (
                np.array(y_pred) * weights
            ).sum() / weights.sum()
            weighted_results[group] = {
                'effective_n': weights.sum(),
                'weighted_positive_rate': weighted_positive_rate,
            }

        return {
            'high_confidence': results,
            'weighted': weighted_results,
            'note': 'Use weighted results for smaller bias; high-confidence for lower noise'
        }


# Example usage
if __name__ == "__main__":
    # Simulated BISG lookup tables
    surname_probs = {
        'smith': {'white': 0.70, 'black': 0.23, 'hispanic': 0.02, 'asian': 0.01, 'other': 0.04},
        'garcia': {'white': 0.05, 'black': 0.01, 'hispanic': 0.92, 'asian': 0.01, 'other': 0.01},
        'kim': {'white': 0.02, 'black': 0.01, 'hispanic': 0.01, 'asian': 0.94, 'other': 0.02},
    }
    zip_probs = {
        '90210': {'white': 0.80, 'black': 0.03, 'hispanic': 0.10, 'asian': 0.05, 'other': 0.02},
        '10027': {'white': 0.25, 'black': 0.40, 'hispanic': 0.25, 'asian': 0.05, 'other': 0.05},
    }

    # Example: Impute race for an individual
    probs = bisg_race_probability('garcia', '90210', surname_probs, zip_probs)
    print("BISG Race Probabilities for 'Garcia' in 90210:")
    for race, prob in sorted(probs.items(), key=lambda x: -x[1]):
        print(f"  {race}: {prob:.3f}")
```

U.S. anti-discrimination law distinguishes between two forms of discrimination, both of which apply to algorithmic decision-making:
Disparate Treatment (Intentional Discrimination)
Occurs when protected attributes are explicitly used in decision-making. In ML terms, this would mean directly including race, gender, etc. as model features that influence predictions.
Disparate Impact (Unintentional Discrimination)
Occurs when a facially neutral policy disproportionately affects a protected group. This is where most ML fairness concerns arise—proxy discrimination is a form of disparate impact.
EEOC guidelines suggest disparate impact when the selection rate for a protected group is less than 80% of the rate for the group with the highest selection rate. For example, if men are hired at a 50% rate, women must be hired at a rate of at least 40%. This is a threshold for raising concern, not a legal safe harbor.
```python
def four_fifths_rule_analysis(y_pred, protected_attr, group_names=None):
    """
    Analyze disparate impact using the 80% (Four-Fifths) rule.

    Args:
        y_pred: Binary predictions (1 = positive outcome)
        protected_attr: Group membership
        group_names: Optional mapping from values to names

    Returns:
        Dict with disparate impact analysis
    """
    import numpy as np

    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)
    groups = np.unique(protected_attr)

    # Calculate selection rates per group
    selection_rates = {}
    for g in groups:
        mask = protected_attr == g
        name = group_names.get(g, str(g)) if group_names else str(g)
        selection_rates[name] = y_pred[mask].mean()

    # Find highest selection rate (reference group)
    reference_group = max(selection_rates, key=selection_rates.get)
    reference_rate = selection_rates[reference_group]

    # Calculate disparate impact ratios
    results = {
        'reference_group': reference_group,
        'reference_selection_rate': reference_rate,
        'groups': []
    }
    for group, rate in selection_rates.items():
        if reference_rate > 0:
            ratio = rate / reference_rate
        else:
            ratio = 1.0 if rate == 0 else float('inf')

        results['groups'].append({
            'group': group,
            'selection_rate': rate,
            'adverse_impact_ratio': ratio,
            'passes_80_percent_rule': ratio >= 0.8,
            'disparity_flag': 'ADVERSE IMPACT' if ratio < 0.8 else 'OK'
        })

    # Overall assessment
    failing_groups = [g for g in results['groups'] if not g['passes_80_percent_rule']]
    results['overall_assessment'] = {
        'any_adverse_impact': len(failing_groups) > 0,
        'groups_with_adverse_impact': [g['group'] for g in failing_groups],
        'minimum_ratio': min(g['adverse_impact_ratio'] for g in results['groups']),
    }

    return results


def statistical_parity_significance(y_pred, protected_attr, alpha=0.05):
    """
    Statistical test for whether group selection rates differ significantly.
    Uses chi-square test for independence.

    Args:
        y_pred: Binary predictions
        protected_attr: Group membership
        alpha: Significance level

    Returns:
        Dict with test results
    """
    import numpy as np
    from scipy import stats

    # Build contingency table
    y_pred = np.array(y_pred)
    protected_attr = np.array(protected_attr)
    groups = np.unique(protected_attr)

    contingency = np.zeros((len(groups), 2))
    for i, g in enumerate(groups):
        mask = protected_attr == g
        contingency[i, 0] = (y_pred[mask] == 0).sum()
        contingency[i, 1] = (y_pred[mask] == 1).sum()

    chi2, p_value, dof, expected = stats.chi2_contingency(contingency)

    return {
        'chi_square_statistic': chi2,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'significant_disparity': p_value < alpha,
        'interpretation': (
            f"At α={alpha}, we {'reject' if p_value < alpha else 'fail to reject'} "
            f"the null hypothesis of equal selection rates across groups."
        )
    }


# Example usage
if __name__ == "__main__":
    import numpy as np

    np.random.seed(42)
    n = 500

    # Simulated hiring decisions with disparate impact
    gender = np.random.choice([0, 1], n, p=[0.5, 0.5])  # 0=female, 1=male
    # Biased hiring: men hired at 60%, women at 40%
    hire_prob = 0.4 + 0.2 * gender
    hired = np.random.binomial(1, hire_prob, n)

    group_names = {0: 'female', 1: 'male'}

    # Four-fifths rule analysis
    di_results = four_fifths_rule_analysis(hired, gender, group_names)
    print("Disparate Impact Analysis:")
    print(f"Reference group: {di_results['reference_group']} "
          f"(rate: {di_results['reference_selection_rate']:.2%})")
    print()
    for g in di_results['groups']:
        print(f"  {g['group']}: {g['selection_rate']:.2%} "
              f"(ratio: {g['adverse_impact_ratio']:.2f}) "
              f"[{g['disparity_flag']}]")
    print(f"\nAdverse impact detected: {di_results['overall_assessment']['any_adverse_impact']}")

    # Statistical significance test
    stat_results = statistical_parity_significance(hired, gender)
    print(f"\nStatistical Test: {stat_results['interpretation']}")
```

The set of protected attributes is not fixed—it evolves with societal understanding and legal developments. ML practitioners should be aware of both established and emerging protected categories.
Emerging Protected Categories:
| Attribute | Current Status | ML Concerns | Example Impact |
|---|---|---|---|
| Socioeconomic Status | Not traditionally protected; gaining recognition | Income, education, neighborhood proxied by many features | Credit algorithms discriminating against poor neighborhoods |
| Neurodiversity | Partially covered by disability laws | Behavioral features may correlate with autism, ADHD | Hiring algorithms penalizing atypical interview patterns |
| Political Affiliation | Protected in some jurisdictions | Social media, location data can reveal politics | Loan decisions based on political neighborhood patterns |
| Immigrant/Citizenship Status | Complex legal landscape | Name, language patterns proxy citizenship | Housing algorithms disadvantaging non-native names |
| Pregnancy | Legally protected but hard to enforce | Purchase patterns, health data may reveal | Insurance/employment algorithms detecting pregnancy |
| Genetic Information | GINA protection; expanding concerns | Health predictions increasingly incorporate genetics | Insurance using genetic risk factors where prohibited |
| Accent/Dialect | Intersects with national origin | Speech recognition, voice analysis systems | Voice assistants performing worse for non-standard accents |
The Machine Learning Specific Challenge:
ML systems can infer attributes that were never explicitly collected: a model might infer gender from purchase history, race or ethnicity from name and location, or health status from browsing behavior.
This creates a dark-pattern risk: organizations can claim they don't 'use' protected attributes while their models effectively reconstruct them from proxy data.
The EU's GDPR treats inferred sensitive data the same as directly collected data. If your model effectively learns to infer race, religion, or health status from other features, you may be processing sensitive personal data even without explicit collection. This has significant compliance implications.
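One way to check for this risk, sketched below on synthetic data with hypothetical feature names, is an attribute-inference audit: train an auxiliary classifier to predict the protected attribute from the model's input features, and treat a cross-validated AUC well above 0.5 as evidence that the feature set encodes the attribute even though it was never collected as a column.

```python
# Sketch of an attribute-inference audit: can the feature set itself
# predict the protected attribute? (Synthetic data; feature names are illustrative.)
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 5_000

# Synthetic protected attribute and features; 'neighborhood index' acts as a proxy
protected = rng.binomial(1, 0.3, n)
X = np.column_stack([
    rng.normal(50_000, 15_000, n) - 8_000 * protected,  # income: moderately correlated
    rng.normal(0, 1, n) + 1.2 * protected,               # neighborhood index: strong proxy
    rng.uniform(0, 20, n),                               # experience: unrelated
])

# Cross-validated AUC for predicting the protected attribute from the features
auc = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, protected,
    cv=5, scoring="roc_auc",
).mean()

print(f"AUC for inferring the protected attribute from model features: {auc:.2f}")
# AUC near 0.5 suggests the features carry little information about the attribute;
# values well above 0.5 mean the pipeline can effectively reconstruct it,
# with the compliance implications noted above.
```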
We've explored the complex landscape of protected attributes in machine learning. The essential insights: removing protected attributes does not remove their influence, because proxies and historical data carry it forward; discrimination must be examined intersectionally, not one attribute at a time; auditing for fairness generally requires protected attribute data, collected responsibly or carefully imputed in aggregate; and both disparate treatment and disparate impact doctrines apply to algorithmic decisions.
What's Next:
Now that we understand protected attributes and the mechanisms of discrimination, the next page examines disparate impact in detail—how to measure it, when it's legally significant, and strategies for mitigation. We'll formalize the analytical frameworks introduced here and develop practical tools for disparate impact analysis.
You now have a comprehensive understanding of protected attributes in machine learning. This knowledge forms the foundation for building systems that respect legal requirements and ethical principles. The next page deepens our analysis into formal disparate impact assessment.