In the classical linear regression model, we assume: $$\text{Var}(\varepsilon_i | X) = \sigma^2 \quad \text{for all } i$$
This homoscedasticity (constant variance) assumption underpins the efficiency of OLS and the validity of standard inference procedures. When it fails—when variance changes systematically across observations—we have heteroscedasticity.
Heteroscedasticity is not merely a technical concern. It occurs naturally in many domains: household expenditure varies more at high incomes than at low ones, large firms fluctuate more in absolute terms than small firms, and measurement error often grows with the magnitude of the quantity being measured.
Understanding heteroscedasticity is essential for building regression models that produce reliable predictions and valid inferences.
This page covers: (1) The mathematical consequences of heteroscedasticity on OLS; (2) Visual detection through residual plots and spread-location plots; (3) Formal tests—Breusch-Pagan, White, Goldfeld-Quandt; (4) Remedial strategies including WLS and robust standard errors; and (5) Practical guidance on when each remedy is appropriate.
Understanding why heteroscedasticity matters requires examining its effects on OLS properties. The consequences are more subtle than many practitioners realize.
Unbiasedness is preserved: $$E[\hat{\beta}_{OLS}] = \beta$$
The expected value of OLS estimates remains correct because unbiasedness depends only on:

- Linearity of the model in the parameters
- Strict exogeneity of the errors, $E[\varepsilon | X] = 0$

Neither of these involves the variance structure.
Consistency is preserved: $$\hat{\beta}_{OLS} \xrightarrow{p} \beta \text{ as } n \to \infty$$
OLS remains consistent regardless of heteroscedasticity.
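Both preservation results can be checked by simulation. The sketch below (assuming a simulated design in which the error standard deviation grows with $x$) averages the OLS slope over repeated draws; despite strong heteroscedasticity, the average lands near the true value of 3.

```python
import numpy as np

# Monte Carlo check (illustrative): OLS remains unbiased even when
# Var(eps) is proportional to x^2 rather than constant.
rng = np.random.default_rng(0)
n, n_sims, true_slope = 200, 2000, 3.0
slopes = np.empty(n_sims)
for s in range(n_sims):
    x = rng.uniform(1, 10, n)
    y = 2 + true_slope * x + rng.normal(0, 0.5 * x)  # sd grows with x
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes[s] = beta_hat[1]

# The average slope should sit close to the true value of 3
print(f"Mean slope over {n_sims} sims: {slopes.mean():.3f}")
```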
1. Efficiency is Lost
The Gauss-Markov theorem guarantees OLS is BLUE (Best Linear Unbiased Estimator) only under homoscedasticity. With heteroscedasticity:

- OLS is no longer the minimum-variance estimator within the class of linear unbiased estimators
- A GLS/WLS estimator that downweights high-variance observations achieves strictly smaller variance
2. Standard Errors are Wrong
This is the critical problem. Under heteroscedasticity, the usual OLS variance formula: $$\text{Var}(\hat{\beta}_{OLS}) = \sigma^2(X^TX)^{-1}$$
is incorrect. The true variance is: $$\text{Var}(\hat{\beta}_{OLS}) = (X^TX)^{-1}X^T \Omega X(X^TX)^{-1}$$
where $\Omega = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$ is the diagonal variance matrix.
Using wrong standard errors cascades into all inference: t-statistics, p-values, confidence intervals, and F-tests are all invalid. You might reject null hypotheses you shouldn't, or fail to detect real effects. Heteroscedasticity doesn't bias your estimates—it makes you draw wrong conclusions about their precision.
The common pattern is that standard errors are underestimated: when high-variance observations also sit at high-leverage positions (as when variance grows with a predictor), the classical formula understates the true sampling variability of the coefficients.
Result: t-statistics are inflated, p-values are too small, and you falsely reject null hypotheses.
The opposite can occur if high-variance observations happen to have low leverage, but this is less common in practice.
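The gap between the two formulas can be evaluated directly when the per-observation variances are known. The sketch below assumes $\text{Var}(\varepsilon_i) = (0.5\,x_i)^2$ by construction; no response data is needed, since both formulas depend only on $X$ and $\Omega$.

```python
import numpy as np

# Compare the true sandwich variance of the OLS slope with what the
# naive formula reports when it plugs in the average error variance.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma2 = (0.5 * x) ** 2              # known per-observation variances
Omega = np.diag(sigma2)

XtX_inv = np.linalg.inv(X.T @ X)
true_var = XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich formula
naive_var = sigma2.mean() * XtX_inv             # pretends variance is constant

print(f"True slope SE:  {np.sqrt(true_var[1, 1]):.4f}")
print(f"Naive slope SE: {np.sqrt(naive_var[1, 1]):.4f}")
```

With variance growing in $x$, the naive slope SE comes out smaller than the true one, which is exactly the "inflated t-statistics" failure mode described above.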
Imagine predicting house prices from square footage: prices of small homes cluster tightly around the regression line, while prices of large homes scatter widely, since finishes, location, and amenities matter more at the high end. OLS treats all residuals as equally informative, but the low-footage residuals actually carry more precision per observation. Ignoring this wastes information.
| Property | Status | Practical Impact |
|---|---|---|
| Unbiasedness | ✓ Preserved | Point estimates remain correct on average |
| Consistency | ✓ Preserved | Estimates converge to truth as n → ∞ |
| Efficiency | ✗ Lost | Larger variance than necessary; information wasted |
| Standard Errors | ✗ Wrong | Usually underestimated → inflated t-stats |
| Hypothesis Tests | ✗ Invalid | Wrong rejection rates (often too many false positives) |
| Confidence Intervals | ✗ Invalid | Coverage probability ≠ stated level |
Visual diagnosis of heteroscedasticity is often more informative than formal tests because it reveals the structure of the variance function—essential information for choosing appropriate remedies.
Plot residuals (or squared residuals, or absolute residuals) against fitted values:
Homoscedasticity pattern: Random scatter with constant vertical spread across all $\hat{y}$ values.
Heteroscedasticity patterns:

- Funnel opening rightward: spread grows with $\hat{y}$ (the most common pattern)
- Funnel opening leftward: spread shrinks as $\hat{y}$ increases
- Bowtie: spread is largest at the extremes of $\hat{y}$ and smallest in the middle
For enhanced visual clarity, plot the square root of the absolute standardized residuals against the fitted values: $$\sqrt{|r_i|} \quad \text{vs.} \quad \hat{y}_i$$
where $r_i$ are standardized or studentized residuals. This transformation:

- Reduces the skewness of the absolute residuals, so a smoother tracks the trend more reliably
- Compresses large residuals, preventing a few extreme points from dominating the plot
- Turns a change in spread into a change in level, which is easier to see
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from statsmodels.nonparametric.smoothers_lowess import lowess

def heteroscedasticity_diagnostic_plots(y, y_hat, residuals, figsize=(14, 10)):
    """
    Comprehensive visual diagnostics for heteroscedasticity.

    Parameters:
    -----------
    y : array-like
        Observed response values
    y_hat : array-like
        Fitted values
    residuals : array-like or dict
        Raw residuals or dict from compute_residual_types()
    """
    if isinstance(residuals, dict):
        e = residuals['raw']
        std_resid = residuals.get('internally_studentized', e / np.std(e))
    else:
        e = np.array(residuals)
        std_resid = e / np.std(e, ddof=1)

    y_hat = np.array(y_hat)
    n = len(e)

    fig, axes = plt.subplots(2, 2, figsize=figsize)

    # Plot 1: Residuals vs Fitted
    ax1 = axes[0, 0]
    ax1.scatter(y_hat, e, alpha=0.6, edgecolors='k', linewidth=0.5)
    ax1.axhline(0, color='red', linestyle='--', linewidth=1.5)
    # Add LOWESS smoother
    smooth = lowess(e, y_hat, frac=0.5)
    ax1.plot(smooth[:, 0], smooth[:, 1], 'b-', linewidth=2, label='LOWESS')
    ax1.set_xlabel('Fitted Values', fontsize=11)
    ax1.set_ylabel('Residuals', fontsize=11)
    ax1.set_title('Residuals vs Fitted', fontsize=12, fontweight='bold')
    ax1.legend()

    # Plot 2: Scale-Location Plot
    ax2 = axes[0, 1]
    sqrt_abs_std = np.sqrt(np.abs(std_resid))
    ax2.scatter(y_hat, sqrt_abs_std, alpha=0.6, edgecolors='k', linewidth=0.5)
    # LOWESS for trend in spread
    smooth = lowess(sqrt_abs_std, y_hat, frac=0.5)
    ax2.plot(smooth[:, 0], smooth[:, 1], 'r-', linewidth=2, label='Spread trend')
    # Reference line (expected value under homoscedasticity)
    # E[sqrt(|Z|)] ≈ 0.798 for standard normal
    ax2.axhline(0.798, color='green', linestyle='--', alpha=0.7,
                label='Expected under H₀')
    ax2.set_xlabel('Fitted Values', fontsize=11)
    ax2.set_ylabel('√|Standardized Residuals|', fontsize=11)
    ax2.set_title('Scale-Location', fontsize=12, fontweight='bold')
    ax2.legend()

    # Plot 3: Squared Residuals vs Fitted
    ax3 = axes[1, 0]
    e_squared = e ** 2
    ax3.scatter(y_hat, e_squared, alpha=0.6, edgecolors='k', linewidth=0.5)
    smooth = lowess(e_squared, y_hat, frac=0.5)
    ax3.plot(smooth[:, 0], smooth[:, 1], 'r-', linewidth=2, label='LOWESS')
    ax3.axhline(np.mean(e_squared), color='green', linestyle='--',
                alpha=0.7, label='Mean')
    ax3.set_xlabel('Fitted Values', fontsize=11)
    ax3.set_ylabel('Squared Residuals', fontsize=11)
    ax3.set_title('Squared Residuals vs Fitted', fontsize=12, fontweight='bold')
    ax3.legend()

    # Plot 4: Residual variance by fitted-value quintile
    ax4 = axes[1, 1]
    n_groups = 5
    sorted_idx = np.argsort(y_hat)
    groups = np.array_split(sorted_idx, n_groups)
    group_centers = []
    group_vars = []
    for g in groups:
        group_centers.append(np.mean(y_hat[g]))
        group_vars.append(np.var(e[g], ddof=1))
    ax4.bar(range(n_groups), group_vars, alpha=0.7, edgecolor='k')
    ax4.set_xticks(range(n_groups))
    ax4.set_xticklabels([f'Q{i+1}({group_centers[i]:.1f})' for i in range(n_groups)])
    ax4.set_xlabel('Quintile (by fitted value)', fontsize=11)
    ax4.set_ylabel('Residual Variance', fontsize=11)
    ax4.set_title('Variance by Fitted Value Quintile', fontsize=12, fontweight='bold')
    # Add variance ratio annotation
    var_ratio = max(group_vars) / min(group_vars)
    ax4.annotate(f'Max/Min ratio: {var_ratio:.2f}',
                 xy=(0.95, 0.95), xycoords='axes fraction',
                 ha='right', va='top', fontsize=10,
                 bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

    plt.tight_layout()
    return fig, {'variance_ratio': var_ratio, 'group_variances': group_vars}

# Demonstration: Comparing homoscedastic vs heteroscedastic data
np.random.seed(42)
n = 200
x = np.random.uniform(1, 10, n)

# Homoscedastic case
y_homo = 2 + 3 * x + np.random.randn(n) * 2          # Constant variance

# Heteroscedastic case (variance proportional to x)
y_hetero = 2 + 3 * x + np.random.randn(n) * (0.5 * x)  # Variance increases with x

# Fit models
model_homo = LinearRegression()
model_homo.fit(x.reshape(-1, 1), y_homo)
y_hat_homo = model_homo.predict(x.reshape(-1, 1))
e_homo = y_homo - y_hat_homo

model_hetero = LinearRegression()
model_hetero.fit(x.reshape(-1, 1), y_hetero)
y_hat_hetero = model_hetero.predict(x.reshape(-1, 1))
e_hetero = y_hetero - y_hat_hetero

print("=== HOMOSCEDASTIC CASE ===")
fig1, stats1 = heteroscedasticity_diagnostic_plots(y_homo, y_hat_homo, e_homo)
plt.suptitle('Homoscedastic Residuals', fontsize=14, fontweight='bold', y=1.02)
plt.savefig('hetero_visual_homo.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Variance ratio (quintiles): {stats1['variance_ratio']:.2f}")

print("=== HETEROSCEDASTIC CASE ===")
fig2, stats2 = heteroscedasticity_diagnostic_plots(y_hetero, y_hat_hetero, e_hetero)
plt.suptitle('Heteroscedastic Residuals (Funnel Pattern)', fontsize=14,
             fontweight='bold', y=1.02)
plt.savefig('hetero_visual_hetero.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Variance ratio (quintiles): {stats2['variance_ratio']:.2f}")
```

While visual diagnostics are essential, formal tests provide objective evidence and p-values for reporting. The most widely used tests are Breusch-Pagan and White's test.
The Breusch-Pagan (BP) test examines whether residual variance is related to the regressors.
Procedure:

1. Fit the OLS regression and obtain residuals $\hat{e}_i$.
2. Regress the squared residuals $\hat{e}_i^2$ on the regressors $X$ (the auxiliary regression).
3. Compute $BP = n \cdot R^2$ from the auxiliary regression.
Distribution: Under $H_0$ (homoscedasticity), $BP \sim \chi^2_{p-1}$ where $p$ is the number of regressors including intercept.
Interpretation:

- A small p-value rejects homoscedasticity: residual variance is systematically related to the regressors.
- A large p-value means no evidence of heteroscedasticity was found, not proof that the variance is constant.
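The procedure can be sketched in a few lines of NumPy on simulated data (this uses Koenker's studentized $n \cdot R^2$ variant rather than the original scaled statistic):

```python
import numpy as np
from scipy import stats

# Minimal Breusch-Pagan sketch: auxiliary regression of e^2 on X,
# then LM = n * R^2 compared against chi-square.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)      # heteroscedastic errors
X = np.column_stack([np.ones(n), x])

# Step 1: OLS residuals
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

# Step 2: auxiliary regression of e^2 on the same regressors
g = e ** 2
gamma = np.linalg.lstsq(X, g, rcond=None)[0]
g_hat = X @ gamma
r2_aux = 1 - np.sum((g - g_hat) ** 2) / np.sum((g - g.mean()) ** 2)

# Step 3: LM statistic and chi-square p-value (df = regressors minus intercept)
lm = n * r2_aux
p_value = stats.chi2.sf(lm, df=1)
print(f"BP (Koenker) LM = {lm:.2f}, p = {p_value:.4g}")
```

With errors whose standard deviation grows with $x$, the test should reject decisively; on homoscedastic data the same code yields a large p-value.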
White's test is more general—it doesn't assume a specific form of heteroscedasticity and is robust to non-normality.
Procedure:

1. Fit OLS and obtain residuals $\hat{e}_i$.
2. Regress $\hat{e}_i^2$ on the original regressors, their squares, and all pairwise cross-products.
3. Compute $W = n \cdot R^2$ from this auxiliary regression.
Distribution: Under $H_0$, $W \sim \chi^2_q$ where $q$ is the number of regressors in step 2 (excluding intercept).
Advantages:

- Requires no assumption about the functional form of the variance
- Valid without normality of the errors
- Also sensitive to some forms of misspecification of the mean function, since these show up in the squared residuals
Disadvantage: Loss of power when many predictors (many df consumed by squares and interactions).
| Test | Alternative Hypothesis | Robustness | Power | Best Use |
|---|---|---|---|---|
| Breusch-Pagan | σ² = f(Xγ) | Requires normality (approx.) | High for linear variance | Standard use, small/moderate samples |
| White | General heteroscedasticity | Robust to non-normality | Lower (many df) | Large samples, general detection |
| Goldfeld-Quandt | Variance differs across groups | Requires normal errors | High for group differences | When variance suspected to change at cutpoint |
| Cook-Weisberg | σ² = kf(μ)^δ | Similar to BP | High for specific forms | When variance ∝ mean^δ suspected |
```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

def comprehensive_heteroscedasticity_tests(X, y, alpha=0.05):
    """
    Run multiple heteroscedasticity tests with detailed output.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix (without intercept—will be added)
    y : ndarray of shape (n,)
        Response variable
    alpha : float
        Significance level

    Returns:
    --------
    dict with test results
    """
    n = len(y)

    # Add constant for statsmodels
    X_const = sm.add_constant(X)

    # Fit OLS
    model = sm.OLS(y, X_const).fit()
    residuals = model.resid
    y_hat = model.fittedvalues

    results = {'n': n, 'alpha': alpha}

    print("═" * 65)
    print("HETEROSCEDASTICITY TESTS")
    print("═" * 65)
    print(f"Sample size: {n}")
    print(f"Number of predictors: {X.shape[1]}")
    print("═" * 65)

    # 1. Breusch-Pagan Test
    bp_lm, bp_lm_pvalue, bp_f, bp_f_pvalue = het_breuschpagan(residuals, X_const)
    results['breusch_pagan'] = {
        'lm_stat': bp_lm, 'lm_pvalue': bp_lm_pvalue,
        'f_stat': bp_f, 'f_pvalue': bp_f_pvalue
    }
    bp_verdict = "✗ REJECT H₀" if bp_lm_pvalue < alpha else "✓ FAIL TO REJECT"
    print("Breusch-Pagan Test:")
    print(f"  LM statistic: {bp_lm:.4f}")
    print(f"  LM p-value:   {bp_lm_pvalue:.4f} {bp_verdict}")
    print(f"  F statistic:  {bp_f:.4f}")
    print(f"  F p-value:    {bp_f_pvalue:.4f}")

    # 2. White's Test
    white_lm, white_lm_pvalue, white_f, white_f_pvalue = het_white(residuals, X_const)
    results['white'] = {
        'lm_stat': white_lm, 'lm_pvalue': white_lm_pvalue,
        'f_stat': white_f, 'f_pvalue': white_f_pvalue
    }
    white_verdict = "✗ REJECT H₀" if white_lm_pvalue < alpha else "✓ FAIL TO REJECT"
    print("White's Test:")
    print(f"  LM statistic: {white_lm:.4f}")
    print(f"  LM p-value:   {white_lm_pvalue:.4f} {white_verdict}")
    print(f"  F statistic:  {white_f:.4f}")
    print(f"  F p-value:    {white_f_pvalue:.4f}")

    # 3. Goldfeld-Quandt Test (manual implementation)
    # Sort by a single predictor (use first predictor if multiple)
    if X.ndim == 1:
        sort_var = X
    else:
        sort_var = X[:, 0]
    sorted_idx = np.argsort(sort_var)

    # Exclude middle portion (typically 20%)
    drop_frac = 0.2
    drop_n = int(n * drop_frac)
    n_each = (n - drop_n) // 2
    low_idx = sorted_idx[:n_each]
    high_idx = sorted_idx[-n_each:]

    # Fit models to each subset
    X_low = X_const[low_idx]
    y_low = y[low_idx]
    X_high = X_const[high_idx]
    y_high = y[high_idx]
    model_low = sm.OLS(y_low, X_low).fit()
    model_high = sm.OLS(y_high, X_high).fit()
    rss_low = np.sum(model_low.resid ** 2)
    rss_high = np.sum(model_high.resid ** 2)
    p = X_const.shape[1]
    df_low = n_each - p
    df_high = n_each - p

    # F-statistic: ratio of variances (two-tailed, clamped to 1)
    if rss_high / df_high > rss_low / df_low:
        f_stat_gq = (rss_high / df_high) / (rss_low / df_low)
        p_value_gq = min(1.0, 2 * (1 - stats.f.cdf(f_stat_gq, df_high, df_low)))
    else:
        f_stat_gq = (rss_low / df_low) / (rss_high / df_high)
        p_value_gq = min(1.0, 2 * (1 - stats.f.cdf(f_stat_gq, df_low, df_high)))

    results['goldfeld_quandt'] = {
        'f_stat': f_stat_gq, 'p_value': p_value_gq,
        'rss_low': rss_low, 'rss_high': rss_high
    }
    gq_verdict = "✗ REJECT H₀" if p_value_gq < alpha else "✓ FAIL TO REJECT"
    print("Goldfeld-Quandt Test (sorted by x₁):")
    print(f"  F statistic: {f_stat_gq:.4f}")
    print(f"  p-value:     {p_value_gq:.4f} {gq_verdict}")
    print(f"  RSS (low):   {rss_low:.4f}")
    print(f"  RSS (high):  {rss_high:.4f}")

    # Summary
    print("\n" + "═" * 65)
    print("SUMMARY")
    print("═" * 65)
    n_reject = sum([
        bp_lm_pvalue < alpha,
        white_lm_pvalue < alpha,
        p_value_gq < alpha
    ])
    if n_reject == 0:
        print("No evidence of heteroscedasticity detected.")
        print("Standard OLS inference is appropriate.")
    elif n_reject <= 1:
        print("Weak evidence of heteroscedasticity.")
        print("Consider robust standard errors as precaution.")
    else:
        print("Strong evidence of heteroscedasticity.")
        print("Recommend: Robust SEs, WLS, or variance modeling.")
    print("═" * 65)

    return results

# Demonstration
np.random.seed(42)
n = 200

# Homoscedastic data
x = np.random.uniform(1, 10, n)
y_homo = 2 + 3 * x + np.random.randn(n) * 2

print("\n" + "▓" * 65)
print(" HOMOSCEDASTIC DATA ")
print("▓" * 65)
results_homo = comprehensive_heteroscedasticity_tests(x.reshape(-1, 1), y_homo)

# Heteroscedastic data
y_hetero = 2 + 3 * x + np.random.randn(n) * (0.5 * x)

print("\n" + "▓" * 65)
print(" HETEROSCEDASTIC DATA (variance ∝ x) ")
print("▓" * 65)
results_hetero = comprehensive_heteroscedasticity_tests(x.reshape(-1, 1), y_hetero)
```

When the variance function is known (or can be estimated), Weighted Least Squares provides an efficient estimator that properly accounts for heteroscedasticity.
If we know that $\text{Var}(\varepsilon_i) = \sigma^2 w_i^{-1}$ for known weights $w_i$, we can transform the model:
$$\sqrt{w_i} y_i = \sqrt{w_i} x_i^T \beta + \sqrt{w_i} \varepsilon_i$$
The transformed errors have constant variance $\sigma^2$, so OLS on the transformed model is efficient.
WLS Estimator: $$\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W y$$
where $W = \text{diag}(w_1, \ldots, w_n)$.
Interpretation: Observations with higher variance get lower weights—they contribute less to the fit because they provide less precise information about the regression function.
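The estimator can be computed directly from the formula. The sketch below (simulated data, with variance assumed proportional to $x$) also confirms that it matches OLS on the $\sqrt{w_i}$-transformed model:

```python
import numpy as np

# Direct computation of beta_WLS = (X' W X)^{-1} X' W y with w_i = 1/x_i.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, np.sqrt(x))   # Var(eps_i) proportional to x
X = np.column_stack([np.ones(n), x])
w = 1 / x                                    # inverse-variance weights

# Weighted normal equations (W applied by broadcasting; no n x n matrix built)
XtWX = X.T @ (w[:, None] * X)
XtWy = X.T @ (w * y)
beta_wls = np.linalg.solve(XtWX, XtWy)

# Equivalent route: OLS on the sqrt(w)-transformed model
Xs = np.sqrt(w)[:, None] * X
ys = np.sqrt(w) * y
beta_check = np.linalg.lstsq(Xs, ys, rcond=None)[0]

print(f"WLS estimate:    {beta_wls}")
print(f"Transformed OLS: {beta_check}")
```

The two routes agree to machine precision, which is exactly the point of the transformation argument above.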
The challenge is that the true variance function is rarely known. Common approaches:
1. Known Weights Sometimes domain knowledge specifies weights: if observation $i$ is an average of $n_i$ raw measurements, $\text{Var}(\varepsilon_i) \propto 1/n_i$, so $w_i = n_i$; if instruments have known precision, $w_i = 1/\sigma_i^2$.
2. Variance Proportional to Predictor If $\text{Var}(\varepsilon_i) \propto x_i$, use $w_i = 1/x_i$.
3. Iteratively Reweighted Least Squares (IRLS) Estimate weights from residuals:

- Fit OLS and compute residuals $\hat{e}_i$
- Model the variance function, e.g., regress $\log \hat{e}_i^2$ on $X$
- Set $w_i = 1/\hat{\sigma}_i^2$ from the fitted variance function and refit by WLS
- Repeat until the coefficient estimates stabilize
When weights are estimated from the data (rather than known), the method is called Feasible GLS (FGLS) or Estimated WLS (EWLS). FGLS is consistent but only fully efficient asymptotically—in finite samples, efficiency depends on how well weights are estimated.
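The FGLS/IRLS idea fits in a short loop. This is a sketch under the assumption that $\log \sigma_i^2$ is linear in the regressors (simulated data; two or three iterations usually suffice):

```python
import numpy as np

# Compact IRLS/FGLS loop: alternate between estimating beta and
# re-estimating the variance function via log(e^2) regressed on X.
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

w = np.ones(n)                               # start from OLS (all weights 1)
for _ in range(3):
    Xs, ys = np.sqrt(w)[:, None] * X, np.sqrt(w) * y
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    e = y - X @ beta
    # Variance model: log(e^2) ~ X, so var_hat = exp(fitted) stays positive
    gamma = np.linalg.lstsq(X, np.log(e ** 2 + 1e-10), rcond=None)[0]
    w = 1 / np.exp(X @ gamma)
    w = w / w.mean()                         # normalize for readability

print(f"IRLS estimate after 3 iterations: {beta}")
```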
```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

def weighted_least_squares_analysis(X, y, weight_type='estimated'):
    """
    Fit WLS and compare to OLS under heteroscedasticity.

    Parameters:
    -----------
    X : ndarray of shape (n,) or (n, p)
        Predictor(s)
    y : ndarray of shape (n,)
        Response variable
    weight_type : str
        'known' (variance ∝ X), 'estimated' (from residuals),
        or array of weights

    Returns:
    --------
    dict with OLS and WLS results
    """
    n = len(y)

    # Ensure X is 2D
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    X_const = sm.add_constant(X)

    # ==================
    # Fit OLS first
    # ==================
    model_ols = sm.OLS(y, X_const).fit()

    print("═" * 65)
    print("WEIGHTED LEAST SQUARES ANALYSIS")
    print("═" * 65)
    print("─── OLS Results ───")
    print(f"Coefficients: {model_ols.params}")
    print(f"Std Errors:   {model_ols.bse}")
    print(f"R-squared:    {model_ols.rsquared:.4f}")

    # ==================
    # Determine weights
    # ==================
    if weight_type == 'known':
        # Assume variance proportional to first predictor
        weights = 1 / X[:, 0]
        weights = weights / weights.mean()  # Normalize
        print("Weights: Proportional to 1/x (known variance structure)")
    elif weight_type == 'estimated':
        # FGLS approach: estimate variance function from residuals
        residuals = model_ols.resid
        # Regress log(e²) on predictors
        log_e2 = np.log(residuals ** 2 + 1e-10)  # Small constant for stability
        var_model = sm.OLS(log_e2, X_const).fit()
        # Predicted log-variance
        log_var_hat = var_model.fittedvalues
        var_hat = np.exp(log_var_hat)
        weights = 1 / var_hat
        weights = weights / weights.mean()  # Normalize
        print("Weights: Estimated from variance function (FGLS)")
        print(f"Variance model R²: {var_model.rsquared:.4f}")
    else:
        # User-provided weights
        weights = np.array(weight_type)
        weights = weights / weights.mean()
        print("Weights: User-provided")

    # ==================
    # Fit WLS
    # ==================
    model_wls = sm.WLS(y, X_const, weights=weights).fit()

    print("─── WLS Results ───")
    print(f"Coefficients: {model_wls.params}")
    print(f"Std Errors:   {model_wls.bse}")
    print(f"R-squared:    {model_wls.rsquared:.4f}")

    # ==================
    # Compare Standard Errors
    # ==================
    print("─── Standard Error Comparison ───")
    for i, name in enumerate(['Intercept'] + [f'β{j+1}' for j in range(X.shape[1])]):
        se_ols = model_ols.bse[i]
        se_wls = model_wls.bse[i]
        ratio = se_wls / se_ols
        print(f"{name:12s}: OLS={se_ols:.4f}, WLS={se_wls:.4f}, Ratio={ratio:.3f}")

    # ==================
    # Also compute robust SEs for OLS
    # ==================
    model_ols_robust = model_ols.get_robustcov_results(cov_type='HC3')
    print("─── OLS with Robust (HC3) Standard Errors ───")
    print(f"Coefficients: {model_ols_robust.params}")
    print(f"Robust SEs:   {model_ols_robust.bse}")

    return {
        'ols': model_ols,
        'wls': model_wls,
        'ols_robust': model_ols_robust,
        'weights': weights
    }

# Demonstration
np.random.seed(42)
n = 200
x = np.random.uniform(1, 10, n)

# Generate heteroscedastic data (variance ∝ x)
true_beta = [2, 3]
y = true_beta[0] + true_beta[1] * x + np.random.randn(n) * np.sqrt(x)

print("True model: y = 2 + 3x + ε, where Var(ε) ∝ x")
print(f"True coefficients: β₀ = {true_beta[0]}, β₁ = {true_beta[1]}")

# Analysis with known weights
print("\n" + "▓" * 65)
print(" ANALYSIS WITH KNOWN WEIGHTS ")
print("▓" * 65)
results_known = weighted_least_squares_analysis(x, y, weight_type='known')

# Analysis with estimated weights
print("\n" + "▓" * 65)
print(" ANALYSIS WITH ESTIMATED WEIGHTS (FGLS) ")
print("▓" * 65)
results_estimated = weighted_least_squares_analysis(x, y, weight_type='estimated')

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Plot 1: Data and fitted lines
ax1 = axes[0]
ax1.scatter(x, y, alpha=0.5, s=20)
x_plot = np.linspace(x.min(), x.max(), 100)
ax1.plot(x_plot, true_beta[0] + true_beta[1] * x_plot, 'k-', linewidth=2,
         label=f'True: y = {true_beta[0]} + {true_beta[1]}x')
ax1.plot(x_plot,
         results_known['ols'].params[0] + results_known['ols'].params[1] * x_plot,
         'b--', linewidth=2,
         label=f'OLS: y = {results_known["ols"].params[0]:.2f} + {results_known["ols"].params[1]:.2f}x')
ax1.plot(x_plot,
         results_known['wls'].params[0] + results_known['wls'].params[1] * x_plot,
         'r--', linewidth=2,
         label=f'WLS: y = {results_known["wls"].params[0]:.2f} + {results_known["wls"].params[1]:.2f}x')
ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('y', fontsize=12)
ax1.set_title('Data and Fitted Lines', fontweight='bold')
ax1.legend()

# Plot 2: Weight distribution
ax2 = axes[1]
ax2.scatter(x, results_known['weights'], alpha=0.6)
ax2.set_xlabel('x', fontsize=12)
ax2.set_ylabel('Weight (w)', fontsize=12)
ax2.set_title('WLS Weights (w = 1/x)', fontweight='bold')

plt.tight_layout()
plt.savefig('wls_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
```

When heteroscedasticity exists but its form is unknown, we can still obtain valid inference through robust standard errors (also called heteroscedasticity-consistent or HC standard errors, sandwich estimators, or Huber-White standard errors).
Recall that under heteroscedasticity, the true variance of $\hat{\beta}$ is: $$\text{Var}(\hat{\beta}) = (X^TX)^{-1} X^T \Omega X (X^TX)^{-1}$$
where $\Omega = \text{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.
This has a "sandwich" structure: the "bread" $(X^TX)^{-1}$ appears on both sides of the "meat" $X^T \Omega X$.
We estimate this by plugging in $\hat{e}_i^2$ for the unknown $\sigma_i^2$, with various corrections.
HC0 (White's original estimator): $$\hat{\Omega} = \text{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2)$$
No correction. Tends to underestimate variance in small samples.
HC1 (Degrees of freedom correction): $$\hat{\Omega} = \frac{n}{n-p} \cdot \text{diag}(\hat{e}_1^2, \ldots, \hat{e}_n^2)$$
Simple adjustment for the fact that residuals have mean zero.
HC2 (Leverage correction): $$\hat{\sigma}_i^2 = \frac{\hat{e}_i^2}{1 - h_{ii}}$$
Accounts for the fact that high-leverage observations have artificially small residuals.
HC3 (Jackknife-like correction): $$\hat{\sigma}_i^2 = \frac{\hat{e}_i^2}{(1 - h_{ii})^2}$$
More aggressive correction, often best in small samples. This is the recommended default in most applications.
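All four variants share the same sandwich; only the rescaling of $\hat{e}_i^2$ changes. A sketch on simulated data:

```python
import numpy as np

# The four HC variants differ only in how each squared residual is
# rescaled before entering the "meat" of the sandwich.
rng = np.random.default_rng(5)
n = 100
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 + 0.3 * x)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
H_diag = np.einsum('ij,jk,ik->i', X, XtX_inv, X)  # leverages h_ii
e = y - X @ (XtX_inv @ X.T @ y)                   # OLS residuals

scalings = {
    'HC0': e ** 2,
    'HC1': e ** 2 * n / (n - p),
    'HC2': e ** 2 / (1 - H_diag),
    'HC3': e ** 2 / (1 - H_diag) ** 2,
}
ses = {}
for name, s in scalings.items():
    meat = X.T @ (s[:, None] * X)
    cov = XtX_inv @ meat @ XtX_inv
    ses[name] = np.sqrt(cov[1, 1])
    print(f"{name}: slope SE = {ses[name]:.4f}")
```

Since each correction only inflates the residual scalings, the slope SEs are ordered HC0 ≤ HC2 ≤ HC3, with HC1 a uniform upward rescaling of HC0.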
A modern best practice is to always report robust standard errors, regardless of heteroscedasticity test results. Under homoscedasticity, robust SEs are similar to classical SEs. Under heteroscedasticity, they're correct while classical SEs are wrong. You lose little and gain protection.
| Estimator | Formula | Finite-Sample Performance | Recommendation |
|---|---|---|---|
| HC0 | $\hat{e}_i^2$ | Downward biased | Avoid in small samples |
| HC1 | $\frac{n}{n-p}\hat{e}_i^2$ | Slightly better | Minimal improvement over HC0 |
| HC2 | $\frac{\hat{e}_i^2}{1-h_{ii}}$ | Good | Accounts for leverage |
| HC3 | $\frac{\hat{e}_i^2}{(1-h_{ii})^2}$ | Best | Default recommendation |
```python
import numpy as np
import statsmodels.api as sm

def compare_standard_errors(X, y):
    """
    Compare classical and various robust standard error estimators.

    Parameters:
    -----------
    X : ndarray
        Design matrix (without constant)
    y : ndarray
        Response variable
    """
    n = len(y)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    X_const = sm.add_constant(X)
    p = X_const.shape[1]

    # Fit OLS
    model = sm.OLS(y, X_const).fit()

    # Get different robust SE estimators
    hc0 = model.get_robustcov_results(cov_type='HC0')
    hc1 = model.get_robustcov_results(cov_type='HC1')
    hc2 = model.get_robustcov_results(cov_type='HC2')
    hc3 = model.get_robustcov_results(cov_type='HC3')

    print("═" * 70)
    print("STANDARD ERROR COMPARISON")
    print("═" * 70)
    print(f"{'Coefficient':<12} {'Classical':>10} {'HC0':>10} {'HC1':>10} "
          f"{'HC2':>10} {'HC3':>10}")
    print("─" * 70)
    names = ['Intercept'] + [f'β{j+1}' for j in range(p - 1)]
    for i, name in enumerate(names):
        print(f"{name:<12} {model.bse[i]:>10.4f} {hc0.bse[i]:>10.4f} "
              f"{hc1.bse[i]:>10.4f} {hc2.bse[i]:>10.4f} {hc3.bse[i]:>10.4f}")
    print("─" * 70)

    # Compute ratios to classical SEs
    print("Ratio to Classical SE (values > 1 suggest heteroscedasticity)")
    print(f"{'Coefficient':<12} {'HC0':>10} {'HC1':>10} {'HC2':>10} {'HC3':>10}")
    print("─" * 70)
    for i, name in enumerate(names):
        print(f"{name:<12} {hc0.bse[i]/model.bse[i]:>10.3f} "
              f"{hc1.bse[i]/model.bse[i]:>10.3f} "
              f"{hc2.bse[i]/model.bse[i]:>10.3f} "
              f"{hc3.bse[i]/model.bse[i]:>10.3f}")
    print("═" * 70)

    # T-statistics comparison
    print("T-STATISTICS AND P-VALUES")
    print("═" * 70)
    print(f"{'Coefficient':<12} {'Classical':>15} {'HC3 (Robust)':>15}")
    print("─" * 70)
    for i, name in enumerate(names):
        t_class = model.tvalues[i]
        p_class = model.pvalues[i]
        t_hc3 = hc3.tvalues[i]
        p_hc3 = hc3.pvalues[i]
        class_sig = "**" if p_class < 0.05 else ""
        hc3_sig = "**" if p_hc3 < 0.05 else ""
        print(f"{name:<12} t={t_class:>6.3f} (p={p_class:.4f}){class_sig} "
              f"t={t_hc3:>6.3f} (p={p_hc3:.4f}){hc3_sig:2s}")
    print("═" * 70)
    print("** indicates significance at α=0.05")

    return {
        'model': model, 'hc0': hc0, 'hc1': hc1, 'hc2': hc2, 'hc3': hc3
    }

# Demonstration with heteroscedastic data
np.random.seed(42)
n = 100
x = np.random.uniform(1, 10, n)

# Moderate heteroscedasticity
y = 2 + 3 * x + np.random.randn(n) * (0.5 + 0.3 * x)

print("Data: y = 2 + 3x + ε, where σ(ε) = 0.5 + 0.3x")
print("Here classical SEs understate the slope's true sampling variability")

results = compare_standard_errors(x, y)

# Show practical implication: confidence intervals
print("\n" + "═" * 70)
print("95% CONFIDENCE INTERVALS FOR SLOPE")
print("═" * 70)
beta1 = results['model'].params[1]
se_class = results['model'].bse[1]
se_hc3 = results['hc3'].bse[1]

ci_class = (beta1 - 1.96 * se_class, beta1 + 1.96 * se_class)
ci_hc3 = (beta1 - 1.96 * se_hc3, beta1 + 1.96 * se_hc3)

print(f"Slope estimate: {beta1:.4f}")
print(f"Classical CI: [{ci_class[0]:.4f}, {ci_class[1]:.4f}]  "
      f"width = {ci_class[1]-ci_class[0]:.4f}")
print(f"Robust CI:    [{ci_hc3[0]:.4f}, {ci_hc3[1]:.4f}]  "
      f"width = {ci_hc3[1]-ci_hc3[0]:.4f}")
print(f"CI width ratio: {(ci_hc3[1]-ci_hc3[0]) / (ci_class[1]-ci_class[0]):.3f}")
```

Heteroscedasticity is ubiquitous in real data. Recognizing and properly handling it is essential for valid statistical inference.
You now have a comprehensive toolkit for detecting and managing heteroscedasticity. The next page examines influential observations—individual data points that disproportionately affect regression results—and the diagnostic measures (leverage, Cook's distance, DFBETAS) that identify them.