In the classical linear regression model, we assume: $$\text{Var}(\varepsilon_i | X) = \sigma^2 \quad \text{for all } i$$
This homoscedasticity (constant variance) assumption underpins the efficiency of OLS and the validity of standard inference procedures. When it fails—when variance changes systematically across observations—we have heteroscedasticity.
Heteroscedasticity is not merely a technical concern. It occurs naturally in many domains: household expenditure varies more at high incomes than at low ones, large firms fluctuate more in absolute terms than small firms, and measurement error often grows with the magnitude of the quantity being measured.
Understanding heteroscedasticity is essential for building regression models that produce reliable predictions and valid inferences.
This page covers: (1) The mathematical consequences of heteroscedasticity on OLS; (2) Visual detection through residual plots and spread-location plots; (3) Formal tests—Breusch-Pagan, White, Goldfeld-Quandt; (4) Remedial strategies including WLS and robust standard errors; and (5) Practical guidance on when each remedy is appropriate.
Understanding why heteroscedasticity matters requires examining its effects on OLS properties. The consequences are more subtle than many practitioners realize.
Unbiasedness is preserved: $$E[\hat{\beta}_{OLS}] = \beta$$
The expected value of OLS estimates remains correct because unbiasedness depends only on:

- Linearity of the model in the parameters
- Strict exogeneity of the errors, $E[\varepsilon | X] = 0$

Neither of these involves the variance structure.
Consistency is preserved: $$\hat{\beta}_{OLS} \xrightarrow{p} \beta \text{ as } n \to \infty$$
OLS remains consistent regardless of heteroscedasticity.
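Both preservation results can be checked by simulation. The sketch below (assuming a simulated design in which the error standard deviation grows with $x$) averages the OLS slope over repeated draws; despite strong heteroscedasticity, the average lands near the true value of 3.

```python
import numpy as np

# Monte Carlo check (illustrative): OLS remains unbiased even when
# Var(eps) is proportional to x^2 rather than constant.
rng = np.random.default_rng(0)
n, n_sims, true_slope = 200, 2000, 3.0
slopes = np.empty(n_sims)
for s in range(n_sims):
    x = rng.uniform(1, 10, n)
    y = 2 + true_slope * x + rng.normal(0, 0.5 * x)  # sd grows with x
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes[s] = beta_hat[1]

# The average slope should sit close to the true value of 3
print(f"Mean slope over {n_sims} sims: {slopes.mean():.3f}")
```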
1. Efficiency is Lost
The Gauss-Markov theorem guarantees OLS is BLUE (Best Linear Unbiased Estimator) only under homoscedasticity. With heteroscedasticity:

- OLS is no longer the minimum-variance estimator within the class of linear unbiased estimators
- A GLS/WLS estimator that downweights high-variance observations achieves strictly smaller variance
2. Standard Errors are Wrong
This is the critical problem. Under heteroscedasticity, the usual OLS variance formula: $$\text{Var}(\hat{\beta}_{OLS}) = \sigma^2(X^TX)^{-1}$$
is incorrect. The true variance is: $$\text{Var}(\hat{\beta}_{OLS}) = (X^TX)^{-1}X^T \Omega X(X^TX)^{-1}$$
where $\Omega = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$ is the diagonal variance matrix.
Using wrong standard errors cascades into all inference: t-statistics, p-values, confidence intervals, and F-tests are all invalid. You might reject null hypotheses you shouldn't, or fail to detect real effects. Heteroscedasticity doesn't bias your estimates—it makes you draw wrong conclusions about their precision.
The common pattern is that standard errors are underestimated: when high-variance observations also sit at high-leverage positions (as when variance grows with a predictor), the classical formula understates the true sampling variability of the coefficients.
Result: t-statistics are inflated, p-values are too small, and you falsely reject null hypotheses.
The opposite can occur if high-variance observations happen to have low leverage, but this is less common in practice.
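The gap between the two formulas can be evaluated directly when the per-observation variances are known. The sketch below assumes $\text{Var}(\varepsilon_i) = (0.5\,x_i)^2$ by construction; no response data is needed, since both formulas depend only on $X$ and $\Omega$.

```python
import numpy as np

# Compare the true sandwich variance of the OLS slope with what the
# naive formula reports when it plugs in the average error variance.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma2 = (0.5 * x) ** 2              # known per-observation variances
Omega = np.diag(sigma2)

XtX_inv = np.linalg.inv(X.T @ X)
true_var = XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich formula
naive_var = sigma2.mean() * XtX_inv             # pretends variance is constant

print(f"True slope SE:  {np.sqrt(true_var[1, 1]):.4f}")
print(f"Naive slope SE: {np.sqrt(naive_var[1, 1]):.4f}")
```

With variance growing in $x$, the naive slope SE comes out smaller than the true one, which is exactly the "inflated t-statistics" failure mode described above.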
Imagine predicting house prices from square footage: prices of small homes cluster tightly around the regression line, while prices of large homes scatter widely, since finishes, location, and amenities matter more at the high end. OLS treats all residuals as equally informative, but the low-footage residuals actually carry more precision per observation. Ignoring this wastes information.
| Property | Status | Practical Impact |
|---|---|---|
| Unbiasedness | ✓ Preserved | Point estimates remain correct on average |
| Consistency | ✓ Preserved | Estimates converge to truth as n → ∞ |
| Efficiency | ✗ Lost | Larger variance than necessary; information wasted |
| Standard Errors | ✗ Wrong | Usually underestimated → inflated t-stats |
| Hypothesis Tests | ✗ Invalid | Wrong rejection rates (often too many false positives) |
| Confidence Intervals | ✗ Invalid | Coverage probability ≠ stated level |
Visual diagnosis of heteroscedasticity is often more informative than formal tests because it reveals the structure of the variance function—essential information for choosing appropriate remedies.
Plot residuals (or squared residuals, or absolute residuals) against fitted values:
Homoscedasticity pattern: Random scatter with constant vertical spread across all $\hat{y}$ values.
Heteroscedasticity patterns:

- Funnel opening rightward: spread grows with $\hat{y}$ (the most common pattern)
- Funnel opening leftward: spread shrinks as $\hat{y}$ increases
- Bowtie: spread is largest at the extremes of $\hat{y}$ and smallest in the middle
For enhanced visual clarity, plot the square root of the absolute standardized residuals against the fitted values: $$\sqrt{|r_i|} \quad \text{vs.} \quad \hat{y}_i$$
where $r_i$ are standardized or studentized residuals. This transformation:

- Reduces the skewness of the absolute residuals, so a smoother tracks the trend more reliably
- Compresses large residuals, preventing a few extreme points from dominating the plot
- Turns a change in spread into a change in level, which is easier to see
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from statsmodels.nonparametric.smoothers_lowess import lowess

def heteroscedasticity_diagnostic_plots(y, y_hat, residuals, figsize=(14, 10)):
    """
    Comprehensive visual diagnostics for heteroscedasticity.

    Parameters:
    -----------
    y : array-like
        Observed response values
    y_hat : array-like
        Fitted values
    residuals : array-like or dict
        Raw residuals or dict from compute_residual_types()
    """
    if isinstance(residuals, dict):
        e = residuals['raw']
        std_resid = residuals.get('internally_studentized', e / np.std(e))
    else:
        e = np.array(residuals)
        std_resid = e / np.std(e, ddof=1)

    y_hat = np.array(y_hat)
    n = len(e)

    fig, axes = plt.subplots(2, 2, figsize=figsize)

    # Plot 1: Residuals vs Fitted
    ax1 = axes[0, 0]
    ax1.scatter(y_hat, e, alpha=0.6, edgecolors='k', linewidth=0.5)
    ax1.axhline(0, color='red', linestyle='--', linewidth=1.5)
    # Add LOWESS smoother
    smooth = lowess(e, y_hat, frac=0.5)
    ax1.plot(smooth[:, 0], smooth[:, 1], 'b-', linewidth=2, label='LOWESS')
    ax1.set_xlabel('Fitted Values', fontsize=11)
    ax1.set_ylabel('Residuals', fontsize=11)
    ax1.set_title('Residuals vs Fitted', fontsize=12, fontweight='bold')
    ax1.legend()

    # Plot 2: Scale-Location Plot
    ax2 = axes[0, 1]
    sqrt_abs_std = np.sqrt(np.abs(std_resid))
    ax2.scatter(y_hat, sqrt_abs_std, alpha=0.6, edgecolors='k', linewidth=0.5)
    # LOWESS for trend in spread
    smooth = lowess(sqrt_abs_std, y_hat, frac=0.5)
    ax2.plot(smooth[:, 0], smooth[:, 1], 'r-', linewidth=2, label='Spread trend')
    # Reference line (expected value under homoscedasticity)
    # E[sqrt(|Z|)] ≈ 0.798 for standard normal
    ax2.axhline(0.798, color='green', linestyle='--', alpha=0.7,
                label='Expected under H₀')
    ax2.set_xlabel('Fitted Values', fontsize=11)
    ax2.set_ylabel('√|Standardized Residuals|', fontsize=11)
    ax2.set_title('Scale-Location', fontsize=12, fontweight='bold')
    ax2.legend()

    # Plot 3: Squared Residuals vs Fitted
    ax3 = axes[1, 0]
    e_squared = e ** 2
    ax3.scatter(y_hat, e_squared, alpha=0.6, edgecolors='k', linewidth=0.5)
    smooth = lowess(e_squared, y_hat, frac=0.5)
    ax3.plot(smooth[:, 0], smooth[:, 1], 'r-', linewidth=2, label='LOWESS')
    ax3.axhline(np.mean(e_squared), color='green', linestyle='--',
                alpha=0.7, label='Mean')
    ax3.set_xlabel('Fitted Values', fontsize=11)
    ax3.set_ylabel('Squared Residuals', fontsize=11)
    ax3.set_title('Squared Residuals vs Fitted', fontsize=12, fontweight='bold')
    ax3.legend()

    # Plot 4: Residual variance by fitted-value quintile
    ax4 = axes[1, 1]
    n_groups = 5
    sorted_idx = np.argsort(y_hat)
    groups = np.array_split(sorted_idx, n_groups)
    group_centers = []
    group_vars = []
    for g in groups:
        group_centers.append(np.mean(y_hat[g]))
        group_vars.append(np.var(e[g], ddof=1))
    ax4.bar(range(n_groups), group_vars, alpha=0.7, edgecolor='k')
    ax4.set_xticks(range(n_groups))
    ax4.set_xticklabels([f'Q{i+1}({group_centers[i]:.1f})' for i in range(n_groups)])
    ax4.set_xlabel('Quintile (by fitted value)', fontsize=11)
    ax4.set_ylabel('Residual Variance', fontsize=11)
    ax4.set_title('Variance by Fitted Value Quintile', fontsize=12, fontweight='bold')
    # Add variance ratio annotation
    var_ratio = max(group_vars) / min(group_vars)
    ax4.annotate(f'Max/Min ratio: {var_ratio:.2f}',
                 xy=(0.95, 0.95), xycoords='axes fraction',
                 ha='right', va='top', fontsize=10,
                 bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

    plt.tight_layout()
    return fig, {'variance_ratio': var_ratio, 'group_variances': group_vars}

# Demonstration: Comparing homoscedastic vs heteroscedastic data
np.random.seed(42)
n = 200
x = np.random.uniform(1, 10, n)

# Homoscedastic case
y_homo = 2 + 3 * x + np.random.randn(n) * 2          # Constant variance

# Heteroscedastic case (variance proportional to x)
y_hetero = 2 + 3 * x + np.random.randn(n) * (0.5 * x)  # Variance increases with x

# Fit models
model_homo = LinearRegression()
model_homo.fit(x.reshape(-1, 1), y_homo)
y_hat_homo = model_homo.predict(x.reshape(-1, 1))
e_homo = y_homo - y_hat_homo

model_hetero = LinearRegression()
model_hetero.fit(x.reshape(-1, 1), y_hetero)
y_hat_hetero = model_hetero.predict(x.reshape(-1, 1))
e_hetero = y_hetero - y_hat_hetero

print("=== HOMOSCEDASTIC CASE ===")
fig1, stats1 = heteroscedasticity_diagnostic_plots(y_homo, y_hat_homo, e_homo)
plt.suptitle('Homoscedastic Residuals', fontsize=14, fontweight='bold', y=1.02)
plt.savefig('hetero_visual_homo.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Variance ratio (quintiles): {stats1['variance_ratio']:.2f}")

print("=== HETEROSCEDASTIC CASE ===")
fig2, stats2 = heteroscedasticity_diagnostic_plots(y_hetero, y_hat_hetero, e_hetero)
plt.suptitle('Heteroscedastic Residuals (Funnel Pattern)', fontsize=14,
             fontweight='bold', y=1.02)
plt.savefig('hetero_visual_hetero.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Variance ratio (quintiles): {stats2['variance_ratio']:.2f}")
```

While visual diagnostics are essential, formal tests provide objective evidence and p-values for reporting. The most widely used tests are Breusch-Pagan and White's test.
The Breusch-Pagan (BP) test examines whether residual variance is related to the regressors.
Procedure:

1. Fit the OLS regression and obtain residuals $\hat{e}_i$.
2. Regress the squared residuals $\hat{e}_i^2$ on the regressors $X$ (the auxiliary regression).
3. Compute $BP = n \cdot R^2$ from the auxiliary regression.
Distribution: Under $H_0$ (homoscedasticity), $BP \sim \chi^2_{p-1}$ where $p$ is the number of regressors including intercept.
Interpretation:

- A small p-value rejects homoscedasticity: residual variance is systematically related to the regressors.
- A large p-value means no evidence of heteroscedasticity was found, not proof that the variance is constant.
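The procedure can be sketched in a few lines of NumPy on simulated data (this uses Koenker's studentized $n \cdot R^2$ variant rather than the original scaled statistic):

```python
import numpy as np
from scipy import stats

# Minimal Breusch-Pagan sketch: auxiliary regression of e^2 on X,
# then LM = n * R^2 compared against chi-square.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)      # heteroscedastic errors
X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Step 2: auxiliary regression of e^2 on the same regressors
g = e ** 2
gamma = np.linalg.lstsq(X, g, rcond=None)[0]
g_hat = X @ gamma
r2_aux = 1 - np.sum((g - g_hat) ** 2) / np.sum((g - g.mean()) ** 2)

# Step 3: LM statistic and chi-square p-value (df = regressors minus intercept)
lm = n * r2_aux
p_value = stats.chi2.sf(lm, df=1)
print(f"BP (Koenker) LM = {lm:.2f}, p = {p_value:.4g}")
```

With errors whose standard deviation grows with $x$, the test should reject decisively; on homoscedastic data the same code yields a large p-value.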
White's test is more general—it doesn't assume a specific form of heteroscedasticity and is robust to non-normality.
Procedure:

1. Fit OLS and obtain residuals $\hat{e}_i$.
2. Regress $\hat{e}_i^2$ on the original regressors, their squares, and all pairwise cross-products.
3. Compute $W = n \cdot R^2$ from this auxiliary regression.
Distribution: Under $H_0$, $W \sim \chi^2_q$ where $q$ is the number of regressors in step 2 (excluding intercept).
Advantages:

- Requires no assumption about the functional form of the variance
- Valid without normality of the errors
- Also sensitive to some forms of misspecification of the mean function, since these show up in the squared residuals
Disadvantage: Loss of power when many predictors (many df consumed by squares and interactions).
| Test | Alternative Hypothesis | Robustness | Power | Best Use |
|---|---|---|---|---|
| Breusch-Pagan | σ² = f(Xγ) | Requires normality (approx.) | High for linear variance | Standard use, small/moderate samples |
| White | General heteroscedasticity | Robust to non-normality | Lower (many df) | Large samples, general detection |
| Goldfeld-Quandt | Variance differs across groups | Requires normal errors | High for group differences | When variance suspected to change at cutpoint |
| Cook-Weisberg | σ² = kf(μ)^δ | Similar to BP | High for specific forms | When variance ∝ mean^δ suspected |
```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

def comprehensive_heteroscedasticity_tests(X, y, alpha=0.05):
    """
    Run multiple heteroscedasticity tests with detailed output.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix (without intercept—will be added)
    y : ndarray of shape (n,)
        Response variable
    alpha : float
        Significance level

    Returns:
    --------
    dict with test results
    """
    n = len(y)

    # Add constant for statsmodels
    X_const = sm.add_constant(X)

    # Fit OLS
    model = sm.OLS(y, X_const).fit()
    residuals = model.resid
    y_hat = model.fittedvalues

    results = {'n': n, 'alpha': alpha}

    print("═" * 65)
    print("HETEROSCEDASTICITY TESTS")
    print("═" * 65)
    print(f"Sample size: {n}")
    print(f"Number of predictors: {X.shape[1]}")
    print("═" * 65)

    # 1. Breusch-Pagan Test
    bp_lm, bp_lm_pvalue, bp_f, bp_f_pvalue = het_breuschpagan(residuals, X_const)
    results['breusch_pagan'] = {
        'lm_stat': bp_lm, 'lm_pvalue': bp_lm_pvalue,
        'f_stat': bp_f, 'f_pvalue': bp_f_pvalue
    }
    bp_verdict = "✗ REJECT H₀" if bp_lm_pvalue < alpha else "✓ FAIL TO REJECT"
    print("Breusch-Pagan Test:")
    print(f"  LM statistic: {bp_lm:.4f}")
    print(f"  LM p-value:   {bp_lm_pvalue:.4f} {bp_verdict}")
    print(f"  F statistic:  {bp_f:.4f}")
    print(f"  F p-value:    {bp_f_pvalue:.4f}")

    # 2. White's Test
    white_lm, white_lm_pvalue, white_f, white_f_pvalue = het_white(residuals, X_const)
    results['white'] = {
        'lm_stat': white_lm, 'lm_pvalue': white_lm_pvalue,
        'f_stat': white_f, 'f_pvalue': white_f_pvalue
    }
    white_verdict = "✗ REJECT H₀" if white_lm_pvalue < alpha else "✓ FAIL TO REJECT"
    print("White's Test:")
    print(f"  LM statistic: {white_lm:.4f}")
    print(f"  LM p-value:   {white_lm_pvalue:.4f} {white_verdict}")
    print(f"  F statistic:  {white_f:.4f}")
    print(f"  F p-value:    {white_f_pvalue:.4f}")

    # 3. Goldfeld-Quandt Test (manual implementation)
    # Sort by a single predictor (use first predictor if multiple)
    if X.ndim == 1:
        sort_var = X
    else:
        sort_var = X[:, 0]
    sorted_idx = np.argsort(sort_var)

    # Exclude middle portion (typically 20%)
    drop_frac = 0.2
    drop_n = int(n * drop_frac)
    n_each = (n - drop_n) // 2
    low_idx = sorted_idx[:n_each]
    high_idx = sorted_idx[-n_each:]

    # Fit models to each subset
    X_low = X_const[low_idx]
    y_low = y[low_idx]
    X_high = X_const[high_idx]
    y_high = y[high_idx]
    model_low = sm.OLS(y_low, X_low).fit()
    model_high = sm.OLS(y_high, X_high).fit()
    rss_low = np.sum(model_low.resid ** 2)
    rss_high = np.sum(model_high.resid ** 2)
    p = X_const.shape[1]
    df_low = n_each - p
    df_high = n_each - p

    # F-statistic: ratio of variances (two-tailed, clamped to 1)
    if rss_high / df_high > rss_low / df_low:
        f_stat_gq = (rss_high / df_high) / (rss_low / df_low)
        p_value_gq = min(1.0, 2 * (1 - stats.f.cdf(f_stat_gq, df_high, df_low)))
    else:
        f_stat_gq = (rss_low / df_low) / (rss_high / df_high)
        p_value_gq = min(1.0, 2 * (1 - stats.f.cdf(f_stat_gq, df_low, df_high)))

    results['goldfeld_quandt'] = {
        'f_stat': f_stat_gq, 'p_value': p_value_gq,
        'rss_low': rss_low, 'rss_high': rss_high
    }
    gq_verdict = "✗ REJECT H₀" if p_value_gq < alpha else "✓ FAIL TO REJECT"
    print("Goldfeld-Quandt Test (sorted by x₁):")
    print(f"  F statistic: {f_stat_gq:.4f}")
    print(f"  p-value:     {p_value_gq:.4f} {gq_verdict}")
    print(f"  RSS (low):   {rss_low:.4f}")
    print(f"  RSS (high):  {rss_high:.4f}")

    # Summary
    print("\n" + "═" * 65)
    print("SUMMARY")
    print("═" * 65)
    n_reject = sum([
        bp_lm_pvalue < alpha,
        white_lm_pvalue < alpha,
        p_value_gq < alpha
    ])
    if n_reject == 0:
        print("No evidence of heteroscedasticity detected.")
        print("Standard OLS inference is appropriate.")
    elif n_reject <= 1:
        print("Weak evidence of heteroscedasticity.")
        print("Consider robust standard errors as precaution.")
    else:
        print("Strong evidence of heteroscedasticity.")
        print("Recommend: Robust SEs, WLS, or variance modeling.")
    print("═" * 65)

    return results

# Demonstration
np.random.seed(42)
n = 200

# Homoscedastic data
x = np.random.uniform(1, 10, n)
y_homo = 2 + 3 * x + np.random.randn(n) * 2

print("\n" + "▓" * 65)
print(" HOMOSCEDASTIC DATA ")
print("▓" * 65)
results_homo = comprehensive_heteroscedasticity_tests(x.reshape(-1, 1), y_homo)

# Heteroscedastic data
y_hetero = 2 + 3 * x + np.random.randn(n) * (0.5 * x)

print("\n" + "▓" * 65)
print(" HETEROSCEDASTIC DATA (variance ∝ x) ")
print("▓" * 65)
results_hetero = comprehensive_heteroscedasticity_tests(x.reshape(-1, 1), y_hetero)
```

When the variance function is known (or can be estimated), Weighted Least Squares provides an efficient estimator that properly accounts for heteroscedasticity.
If we know that $\text{Var}(\varepsilon_i) = \sigma^2 w_i^{-1}$ for known weights $w_i$, we can transform the model:
$$\sqrt{w_i} y_i = \sqrt{w_i} x_i^T \beta + \sqrt{w_i} \varepsilon_i$$
The transformed errors have constant variance $\sigma^2$, so OLS on the transformed model is efficient.
WLS Estimator: $$\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W y$$
where $W = \text{diag}(w_1, \ldots, w_n)$.
Interpretation: Observations with higher variance get lower weights—they contribute less to the fit because they provide less precise information about the regression function.
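The estimator can be computed directly from the formula. The sketch below (simulated data, with variance assumed proportional to $x$) also confirms that it matches OLS on the $\sqrt{w_i}$-transformed model:

```python
import numpy as np

# Direct computation of beta_WLS = (X' W X)^{-1} X' W y with w_i = 1/x_i.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, np.sqrt(x))   # Var(eps_i) proportional to x
X = np.column_stack([np.ones(n), x])
w = 1 / x                                    # inverse-variance weights

# Weighted normal equations (W applied by broadcasting; no n x n matrix built)
XtWX = X.T @ (w[:, None] * X)
XtWy = X.T @ (w * y)
beta_wls = np.linalg.solve(XtWX, XtWy)

# Equivalent route: OLS on the sqrt(w)-transformed model
Xs = np.sqrt(w)[:, None] * X
ys = np.sqrt(w) * y
beta_check = np.linalg.lstsq(Xs, ys, rcond=None)[0]

print(f"WLS estimate:    {beta_wls}")
print(f"Transformed OLS: {beta_check}")
```

The two routes agree to machine precision, which is exactly the point of the transformation argument above.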
The challenge is that the true variance function is rarely known. Common approaches:
1. Known Weights Sometimes domain knowledge specifies weights: if observation $i$ is an average of $n_i$ raw measurements, $\text{Var}(\varepsilon_i) \propto 1/n_i$, so $w_i = n_i$; if instruments have known precision, $w_i = 1/\sigma_i^2$.
2. Variance Proportional to Predictor If $\text{Var}(\varepsilon_i) \propto x_i$, use $w_i = 1/x_i$.
3. Iteratively Reweighted Least Squares (IRLS) Estimate weights from residuals:

- Fit OLS and compute residuals $\hat{e}_i$
- Model the variance function, e.g., regress $\log \hat{e}_i^2$ on $X$
- Set $w_i = 1/\hat{\sigma}_i^2$ from the fitted variance function and refit by WLS
- Repeat until the coefficient estimates stabilize
When weights are estimated from the data (rather than known), the method is called Feasible GLS (FGLS) or Estimated WLS (EWLS). FGLS is consistent but only fully efficient asymptotically—in finite samples, efficiency depends on how well weights are estimated.
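The FGLS/IRLS idea fits in a short loop. This is a sketch under the assumption that $\log \sigma_i^2$ is linear in the regressors (simulated data; two or three iterations usually suffice):

```python
import numpy as np

# Compact IRLS/FGLS loop: alternate between estimating beta and
# re-estimating the variance function via log(e^2) regressed on X.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

w = np.ones(n)                               # start from OLS (all weights 1)
for _ in range(3):
    Xs, ys = np.sqrt(w)[:, None] * X, np.sqrt(w) * y
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    e = y - X @ beta
    # Variance model: log(e^2) ~ X, so var_hat = exp(fitted) stays positive
    gamma = np.linalg.lstsq(X, np.log(e ** 2 + 1e-10), rcond=None)[0]
    w = 1 / np.exp(X @ gamma)
    w = w / w.mean()                         # normalize for readability

print(f"IRLS estimate after 3 iterations: {beta}")
```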
```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

def weighted_least_squares_analysis(X, y, weight_type='estimated'):
    """
    Fit WLS and compare to OLS under heteroscedasticity.

    Parameters:
    -----------
    X : ndarray of shape (n,) or (n, p)
        Predictor(s)
    y : ndarray of shape (n,)
        Response variable
    weight_type : str
        'known' (variance ∝ X), 'estimated' (from residuals),
        or array of weights

    Returns:
    --------
    dict with OLS and WLS results
    """
    n = len(y)

    # Ensure X is 2D
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    X_const = sm.add_constant(X)

    # ==================
    # Fit OLS first
    # ==================
    model_ols = sm.OLS(y, X_const).fit()

    print("═" * 65)
    print("WEIGHTED LEAST SQUARES ANALYSIS")
    print("═" * 65)
    print("─── OLS Results ───")
    print(f"Coefficients: {model_ols.params}")
    print(f"Std Errors:   {model_ols.bse}")
    print(f"R-squared:    {model_ols.rsquared:.4f}")

    # ==================
    # Determine weights
    # ==================
    if weight_type == 'known':
        # Assume variance proportional to first predictor
        weights = 1 / X[:, 0]
        weights = weights / weights.mean()  # Normalize
        print("Weights: Proportional to 1/x (known variance structure)")
    elif weight_type == 'estimated':
        # FGLS approach: estimate variance function from residuals
        residuals = model_ols.resid
        # Regress log(e²) on predictors
        log_e2 = np.log(residuals ** 2 + 1e-10)  # Small constant for stability
        var_model = sm.OLS(log_e2, X_const).fit()
        # Predicted log-variance
        log_var_hat = var_model.fittedvalues
        var_hat = np.exp(log_var_hat)
        weights = 1 / var_hat
        weights = weights / weights.mean()  # Normalize
        print("Weights: Estimated from variance function (FGLS)")
        print(f"Variance model R²: {var_model.rsquared:.4f}")
    else:
        # User-provided weights
        weights = np.array(weight_type)
        weights = weights / weights.mean()
        print("Weights: User-provided")

    # ==================
    # Fit WLS
    # ==================
    model_wls = sm.WLS(y, X_const, weights=weights).fit()

    print("─── WLS Results ───")
    print(f"Coefficients: {model_wls.params}")
    print(f"Std Errors:   {model_wls.bse}")
    print(f"R-squared:    {model_wls.rsquared:.4f}")

    # ==================
    # Compare Standard Errors
    # ==================
    print("─── Standard Error Comparison ───")
    for i, name in enumerate(['Intercept'] + [f'β{j+1}' for j in range(X.shape[1])]):
        se_ols = model_ols.bse[i]
        se_wls = model_wls.bse[i]
        ratio = se_wls / se_ols
        print(f"{name:12s}: OLS={se_ols:.4f}, WLS={se_wls:.4f}, Ratio={ratio:.3f}")

    # ==================
    # Also compute robust SEs for OLS
    # ==================
    model_ols_robust = model_ols.get_robustcov_results(cov_type='HC3')
    print("─── OLS with Robust (HC3) Standard Errors ───")
    print(f"Coefficients: {model_ols_robust.params}")
    print(f"Robust SEs:   {model_ols_robust.bse}")

    return {
        'ols': model_ols,
        'wls': model_wls,
        'ols_robust': model_ols_robust,
        'weights': weights
    }

# Demonstration
np.random.seed(42)
n = 200
x = np.random.uniform(1, 10, n)

# Generate heteroscedastic data (variance ∝ x)
true_beta = [2, 3]
y = true_beta[0] + true_beta[1] * x + np.random.randn(n) * np.sqrt(x)

print("True model: y = 2 + 3x + ε, where Var(ε) ∝ x")
print(f"True coefficients: β₀ = {true_beta[0]}, β₁ = {true_beta[1]}")

# Analysis with known weights
print("\n" + "▓" * 65)
print(" ANALYSIS WITH KNOWN WEIGHTS ")
print("▓" * 65)
results_known = weighted_least_squares_analysis(x, y, weight_type='known')

# Analysis with estimated weights
print("\n" + "▓" * 65)
print(" ANALYSIS WITH ESTIMATED WEIGHTS (FGLS) ")
print("▓" * 65)
results_estimated = weighted_least_squares_analysis(x, y, weight_type='estimated')

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Plot 1: Data and fitted lines
ax1 = axes[0]
ax1.scatter(x, y, alpha=0.5, s=20)
x_plot = np.linspace(x.min(), x.max(), 100)
ax1.plot(x_plot, true_beta[0] + true_beta[1] * x_plot, 'k-', linewidth=2,
         label=f'True: y = {true_beta[0]} + {true_beta[1]}x')
ax1.plot(x_plot,
         results_known['ols'].params[0] + results_known['ols'].params[1] * x_plot,
         'b--', linewidth=2,
         label=f'OLS: y = {results_known["ols"].params[0]:.2f} + {results_known["ols"].params[1]:.2f}x')
ax1.plot(x_plot,
         results_known['wls'].params[0] + results_known['wls'].params[1] * x_plot,
         'r--', linewidth=2,
         label=f'WLS: y = {results_known["wls"].params[0]:.2f} + {results_known["wls"].params[1]:.2f}x')
ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('y', fontsize=12)
ax1.set_title('Data and Fitted Lines', fontweight='bold')
ax1.legend()

# Plot 2: Weight distribution
ax2 = axes[1]
ax2.scatter(x, results_known['weights'], alpha=0.6)
ax2.set_xlabel('x', fontsize=12)
ax2.set_ylabel('Weight (w)', fontsize=12)
ax2.set_title('WLS Weights (w = 1/x)', fontweight='bold')

plt.tight_layout()
plt.savefig('wls_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
```

When heteroscedasticity exists but its form is unknown, we can still obtain valid inference through robust standard errors (also called heteroscedasticity-consistent or HC standard errors, sandwich estimators, or Huber-White standard errors).
Recall that under heteroscedasticity, the true variance of $\hat{\beta}$ is: $$\text{Var}(\hat{\beta}) = (X^TX)^{-1} X^T \Omega X (X^TX)^{-1}$$
where $\Omega = \text{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.
This has a "sandwich" structure: the "bread" $(X^TX)^{-1}$ appears on both sides of the "meat" $X^T \Omega X$.
We estimate this by plugging in $\hat{e}_i^2$ for the unknown $\sigma_i^2$, with various corrections.
HC0 (White's original estimator): $$\hat{\Omega} = \text{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2)$$
No correction. Tends to underestimate variance in small samples.
HC1 (Degrees of freedom correction): $$\hat{\Omega} = \frac{n}{n-p} \cdot \text{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2)$$
Simple adjustment for the fact that residuals have mean zero.
HC2 (Leverage correction): $$\hat{\sigma}_i^2 = \frac{\hat{e}_i^2}{1 - h_{ii}}$$
Accounts for the fact that high-leverage observations have artificially small residuals.
HC3 (Jackknife-like correction): $$\hat{\sigma}_i^2 = \frac{\hat{e}_i^2}{(1 - h_{ii})^2}$$
More aggressive correction, often best in small samples. This is the recommended default in most applications.
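All four variants share the same sandwich; only the rescaling of $\hat{e}_i^2$ changes. A sketch on simulated data:

```python
import numpy as np

# The four HC variants differ only in how each squared residual is
# rescaled before entering the "meat" of the sandwich.
rng = np.random.default_rng(5)
n = 100
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 + 0.3 * x)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
H_diag = np.einsum('ij,jk,ik->i', X, XtX_inv, X)  # leverages h_ii
e = y - X @ (XtX_inv @ X.T @ y)                   # OLS residuals

scalings = {
    'HC0': e ** 2,
    'HC1': e ** 2 * n / (n - p),
    'HC2': e ** 2 / (1 - H_diag),
    'HC3': e ** 2 / (1 - H_diag) ** 2,
}
ses = {}
for name, s in scalings.items():
    meat = X.T @ (s[:, None] * X)
    cov = XtX_inv @ meat @ XtX_inv
    ses[name] = np.sqrt(cov[1, 1])
    print(f"{name}: slope SE = {ses[name]:.4f}")
```

Since each correction only inflates the residual scalings, the slope SEs are ordered HC0 ≤ HC2 ≤ HC3, with HC1 a uniform upward rescaling of HC0.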
A modern best practice is to always report robust standard errors, regardless of heteroscedasticity test results. Under homoscedasticity, robust SEs are similar to classical SEs. Under heteroscedasticity, they're correct while classical SEs are wrong. You lose little and gain protection.
| Estimator | Formula | Finite-Sample Performance | Recommendation |
|---|---|---|---|
| HC0 | $\hat{e}_i^2$ | Downward biased | Avoid in small samples |
| HC1 | $\frac{n}{n-p}\hat{e}_i^2$ | Slightly better | Minimal improvement over HC0 |
| HC2 | $\frac{\hat{e}_i^2}{1-h_{ii}}$ | Good | Accounts for leverage |
| HC3 | $\frac{\hat{e}_i^2}{(1-h_{ii})^2}$ | Best | Default recommendation |
```python
import numpy as np
import statsmodels.api as sm

def compare_standard_errors(X, y):
    """
    Compare classical and various robust standard error estimators.

    Parameters:
    -----------
    X : ndarray
        Design matrix (without constant)
    y : ndarray
        Response variable
    """
    n = len(y)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    X_const = sm.add_constant(X)
    p = X_const.shape[1]

    # Fit OLS
    model = sm.OLS(y, X_const).fit()

    # Get different robust SE estimators
    hc0 = model.get_robustcov_results(cov_type='HC0')
    hc1 = model.get_robustcov_results(cov_type='HC1')
    hc2 = model.get_robustcov_results(cov_type='HC2')
    hc3 = model.get_robustcov_results(cov_type='HC3')

    print("═" * 70)
    print("STANDARD ERROR COMPARISON")
    print("═" * 70)
    print(f"{'Coefficient':<12} {'Classical':>10} {'HC0':>10} {'HC1':>10} "
          f"{'HC2':>10} {'HC3':>10}")
    print("─" * 70)
    names = ['Intercept'] + [f'β{j+1}' for j in range(p - 1)]
    for i, name in enumerate(names):
        print(f"{name:<12} {model.bse[i]:>10.4f} {hc0.bse[i]:>10.4f} "
              f"{hc1.bse[i]:>10.4f} {hc2.bse[i]:>10.4f} {hc3.bse[i]:>10.4f}")
    print("─" * 70)

    # Compute ratios to classical SEs
    print("Ratio to Classical SE (values > 1 suggest heteroscedasticity)")
    print(f"{'Coefficient':<12} {'HC0':>10} {'HC1':>10} {'HC2':>10} {'HC3':>10}")
    print("─" * 70)
    for i, name in enumerate(names):
        print(f"{name:<12} {hc0.bse[i]/model.bse[i]:>10.3f} "
              f"{hc1.bse[i]/model.bse[i]:>10.3f} "
              f"{hc2.bse[i]/model.bse[i]:>10.3f} "
              f"{hc3.bse[i]/model.bse[i]:>10.3f}")
    print("═" * 70)

    # T-statistics comparison
    print("T-STATISTICS AND P-VALUES")
    print("═" * 70)
    print(f"{'Coefficient':<12} {'Classical':>15} {'HC3 (Robust)':>15}")
    print("─" * 70)
    for i, name in enumerate(names):
        t_class = model.tvalues[i]
        p_class = model.pvalues[i]
        t_hc3 = hc3.tvalues[i]
        p_hc3 = hc3.pvalues[i]
        class_sig = "**" if p_class < 0.05 else ""
        hc3_sig = "**" if p_hc3 < 0.05 else ""
        print(f"{name:<12} t={t_class:>6.3f} (p={p_class:.4f}){class_sig} "
              f"t={t_hc3:>6.3f} (p={p_hc3:.4f}){hc3_sig:2s}")
    print("═" * 70)
    print("** indicates significance at α=0.05")

    return {
        'model': model, 'hc0': hc0, 'hc1': hc1, 'hc2': hc2, 'hc3': hc3
    }

# Demonstration with heteroscedastic data
np.random.seed(42)
n = 100
x = np.random.uniform(1, 10, n)

# Moderate heteroscedasticity
y = 2 + 3 * x + np.random.randn(n) * (0.5 + 0.3 * x)

print("Data: y = 2 + 3x + ε, where σ(ε) = 0.5 + 0.3x")
print("Here classical SEs understate the slope's true sampling variability")

results = compare_standard_errors(x, y)

# Show practical implication: confidence intervals
print("\n" + "═" * 70)
print("95% CONFIDENCE INTERVALS FOR SLOPE")
print("═" * 70)
beta1 = results['model'].params[1]
se_class = results['model'].bse[1]
se_hc3 = results['hc3'].bse[1]

ci_class = (beta1 - 1.96 * se_class, beta1 + 1.96 * se_class)
ci_hc3 = (beta1 - 1.96 * se_hc3, beta1 + 1.96 * se_hc3)

print(f"Slope estimate: {beta1:.4f}")
print(f"Classical CI: [{ci_class[0]:.4f}, {ci_class[1]:.4f}]  "
      f"width = {ci_class[1]-ci_class[0]:.4f}")
print(f"Robust CI:    [{ci_hc3[0]:.4f}, {ci_hc3[1]:.4f}]  "
      f"width = {ci_hc3[1]-ci_hc3[0]:.4f}")
print(f"CI width ratio: {(ci_hc3[1]-ci_hc3[0]) / (ci_class[1]-ci_class[0]):.3f}")
```

Heteroscedasticity is ubiquitous in real data. Recognizing and properly handling it is essential for valid statistical inference.
You now have a comprehensive toolkit for detecting and managing heteroscedasticity. The next page examines influential observations—individual data points that disproportionately affect regression results—and the diagnostic measures (leverage, Cook's distance, DFBETAS) that identify them.