Among the assumptions of classical regression, normality of errors occupies a peculiar position. It is simultaneously one of the most frequently tested assumptions and one of the most forgiving when violated. Understanding this paradox is essential for making sound inferential decisions.
In the standard linear regression model: $$y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$$
The normality assumption specifically states that the errors $\varepsilon_i$ are drawn from a Gaussian distribution. But why does this matter? The answer depends heavily on what you want to do with your regression.
This page covers: (1) When normality actually matters for regression inference; (2) Visual methods for assessing normality including Q-Q plots; (3) Formal statistical tests—Shapiro-Wilk, Anderson-Darling, Jarque-Bera; (4) Interpretation guidelines that account for sample size; and (5) Remedies when normality is severely violated.
The importance of normality depends on your analytical goals and sample size. Let's be precise about what normality buys you:
Where Normality Is Not Required

1. OLS Coefficient Estimation
The Gauss-Markov theorem guarantees that OLS estimates are BLUE (Best Linear Unbiased Estimators) under the assumptions of linearity, exogeneity (zero-mean errors), homoscedasticity, and uncorrelated errors—with no normality requirement. Your $\hat{\beta}$ values are valid estimates regardless of the error distribution.
2. Asymptotic Inference (Large Samples)
By the Central Limit Theorem, the sampling distribution of $\hat{\beta}$ approaches normality as $n \to \infty$, regardless of the error distribution. For large samples, t-tests and F-tests remain approximately valid even with non-normal errors.
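To make the asymptotic claim concrete, here is a minimal simulation sketch (the centered-exponential error distribution, sample sizes, and replication count are illustrative choices, not from the original text). It estimates the empirical Type I error of the slope t-test when the true slope is zero; the rate should approach the nominal 5% as $n$ grows despite the heavily skewed errors.

```python
import numpy as np
from scipy import stats

def t_test_rejection_rate(n, n_sims=2000, alpha=0.05, seed=0):
    """Empirical Type I error of the slope t-test under heavily
    skewed (centered exponential) errors when the true slope is 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    for _ in range(n_sims):
        x = rng.normal(size=n)
        e = rng.exponential(scale=1.0, size=n) - 1.0  # skewed, mean-zero errors
        y = 1.0 + e                                   # true slope on x is zero
        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)              # residual variance estimate
        se_slope = np.sqrt(sigma2 * XtX_inv[1, 1])
        rejections += abs(beta[1] / se_slope) > t_crit
    return rejections / n_sims

for n in (10, 30, 100, 500):
    print(f"n = {n:4d}: empirical Type I error = {t_test_rejection_rate(n):.3f}")
```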
Where Normality Is Required

1. Exact Small-Sample Inference

In small samples ($n < 30$), the exact t and F distributions of the test statistics are derived under normal errors. Non-normality can lead to actual Type I error rates that differ from the nominal level, misleading p-values, and confidence intervals whose true coverage departs from the stated level.
2. Prediction Intervals
Prediction intervals for new observations rely on normality. Without it, the stated coverage probability (e.g., 95%) may be far from accurate.
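For reference, the standard normal-theory prediction interval for a new observation at $x_0$ (with $p$ estimated coefficients) is:

$$\hat{y}_0 \pm t_{n-p,\,1-\alpha/2}\;\hat{\sigma}\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}$$

Both the $t$ quantile and the form of the interval width come from the normal error model; with heavy-tailed errors, new observations fall outside this interval more often than the nominal rate suggests.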
3. Maximum Likelihood Equivalence
Under normality, OLS equals MLE. This has implications for information criteria (AIC, BIC) and likelihood ratio tests.
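To see the equivalence, write the Gaussian log-likelihood of the linear model:

$$\ell(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\lVert y - X\beta\rVert^2$$

For any fixed $\sigma^2$, maximizing over $\beta$ means minimizing $\lVert y - X\beta\rVert^2$, which is exactly the OLS criterion. When the errors are not normal, OLS still estimates $\beta$, but AIC, BIC, and likelihood ratio tests built from this Gaussian likelihood become approximations rather than exact likelihood-based quantities.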
| Sample Size | OLS Estimation | Hypothesis Testing | Confidence Intervals | Prediction Intervals |
|---|---|---|---|---|
| n < 15 | Unaffected | Potentially invalid | Potentially invalid | Potentially invalid |
| 15 ≤ n < 30 | Unaffected | Moderate concern | Moderate concern | Moderate concern |
| 30 ≤ n < 100 | Unaffected | Usually robust | Usually robust | May need adjustment |
| n ≥ 100 | Unaffected | Very robust (CLT) | Very robust | May need adjustment |
For moderately large samples (n > 30-50), mild to moderate non-normality is rarely a serious concern for inference. Severe departures—heavy tails, extreme skewness, multimodality—warrant attention regardless of sample size. Focus your diagnostic energy accordingly.
The quantile-quantile (Q-Q) plot is the most powerful visual tool for assessing normality. It directly compares the distribution of your residuals to the theoretical normal distribution.
If residuals are normally distributed, points should fall approximately on a straight line with slope $\sigma$ passing through $(0, 0)$.
Heavy Tails (Leptokurtic): points drift below the reference line at the left end and above it at the right end, so the extremes splay away from the line.

Light Tails (Platykurtic): the reverse pattern; extreme points bend toward the center, sitting above the line on the left and below it on the right.

Right Skewness: a convex (upward-curving) pattern, with both ends of the plot lying above the line.

Left Skewness: a concave (downward-curving) pattern, with both ends lying below the line.

Outliers: the bulk of the points track the line closely, but a few isolated points sit far from it at one or both extremes.
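For a quick look before the comprehensive implementation below, SciPy can draw a basic Q-Q plot in one call (a minimal sketch; `residuals` here is a stand-in for the residuals of your fitted model):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.random.default_rng(0).standard_normal(100)  # stand-in residuals

fig, ax = plt.subplots(figsize=(5, 5))
stats.probplot(residuals, dist="norm", plot=ax)  # points plus least-squares line
ax.set_title("Quick Q-Q Plot")
plt.show()
```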
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def comprehensive_qq_plot(residuals, title="Q-Q Plot", figsize=(10, 8)):
    """
    Create a detailed Q-Q plot with reference band and pattern annotations.

    Parameters
    ----------
    residuals : array-like
        Residuals from regression model
    title : str
        Plot title

    Returns
    -------
    dict with figure and normality statistics
    """
    e = np.array(residuals)
    n = len(e)

    # Sort residuals
    e_sorted = np.sort(e)

    # Compute theoretical quantiles (Blom formula)
    p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    theoretical_quantiles = stats.norm.ppf(p)

    # Standardize residuals for plotting
    e_standardized = (e_sorted - np.mean(e)) / np.std(e, ddof=1)

    # Fit line through Q1 and Q3 (robust to outliers in the tails)
    q1_idx, q3_idx = int(0.25 * n), int(0.75 * n)
    slope = ((e_standardized[q3_idx] - e_standardized[q1_idx]) /
             (theoretical_quantiles[q3_idx] - theoretical_quantiles[q1_idx]))
    intercept = e_standardized[q1_idx] - slope * theoretical_quantiles[q1_idx]

    # Create figure
    fig, axes = plt.subplots(2, 2, figsize=figsize)

    # Main Q-Q plot
    ax1 = axes[0, 0]
    ax1.scatter(theoretical_quantiles, e_standardized, alpha=0.6,
                edgecolors='k', linewidth=0.5, s=40)

    # Reference lines: identity (perfect normality) and robust Q1-Q3 fit
    x_line = np.array([theoretical_quantiles.min(), theoretical_quantiles.max()])
    ax1.plot(x_line, x_line, 'r-', linewidth=2, label='Perfect normality')
    ax1.plot(x_line, intercept + slope * x_line, 'b--', linewidth=1.5,
             label='Q1-Q3 robust fit')

    # Confidence band (approximate)
    se = 1 / (stats.norm.pdf(theoretical_quantiles) * np.sqrt(n))
    ax1.fill_between(theoretical_quantiles,
                     theoretical_quantiles - 1.96 * se,
                     theoretical_quantiles + 1.96 * se,
                     alpha=0.2, color='red', label='95% confidence band')

    ax1.set_xlabel('Theoretical Quantiles (Standard Normal)', fontsize=11)
    ax1.set_ylabel('Sample Quantiles (Standardized)', fontsize=11)
    ax1.set_title('Q-Q Plot', fontsize=12, fontweight='bold')
    ax1.legend(loc='upper left')
    ax1.grid(True, alpha=0.3)

    # Histogram with normal overlay
    ax2 = axes[0, 1]
    ax2.hist(e_standardized, bins='auto', density=True, alpha=0.7,
             edgecolor='black', label='Residuals')
    x_norm = np.linspace(e_standardized.min(), e_standardized.max(), 100)
    ax2.plot(x_norm, stats.norm.pdf(x_norm), 'r-', linewidth=2,
             label='Standard Normal')
    ax2.set_xlabel('Standardized Residuals', fontsize=11)
    ax2.set_ylabel('Density', fontsize=11)
    ax2.set_title('Histogram vs Normal PDF', fontsize=12, fontweight='bold')
    ax2.legend()

    # Detrended Q-Q plot (deviations from line)
    ax3 = axes[1, 0]
    deviations = e_standardized - theoretical_quantiles
    ax3.scatter(theoretical_quantiles, deviations, alpha=0.6,
                edgecolors='k', linewidth=0.5, s=40)
    ax3.axhline(0, color='red', linestyle='--', linewidth=1.5)
    ax3.fill_between(theoretical_quantiles, -1.96 * se, 1.96 * se,
                     alpha=0.2, color='red')
    ax3.set_xlabel('Theoretical Quantiles', fontsize=11)
    ax3.set_ylabel('Deviation from Normal', fontsize=11)
    ax3.set_title('Detrended Q-Q Plot', fontsize=12, fontweight='bold')
    ax3.grid(True, alpha=0.3)

    # Summary statistics panel
    ax4 = axes[1, 1]
    ax4.axis('off')

    # Compute tests
    shapiro_stat, shapiro_p = stats.shapiro(e)
    skewness = stats.skew(e)
    kurtosis = stats.kurtosis(e)  # Excess kurtosis

    summary_text = f"""
    NORMALITY ASSESSMENT SUMMARY
    ═══════════════════════════════════
    Sample Size: {n}

    Descriptive Statistics:
    ───────────────────────────────────
    Skewness:        {skewness:8.4f} (0 = symmetric)
    Excess Kurtosis: {kurtosis:8.4f} (0 = normal tails)

    Formal Tests:
    ───────────────────────────────────
    Shapiro-Wilk W:  {shapiro_stat:8.4f}
    Shapiro-Wilk p:  {shapiro_p:8.4f}

    Interpretation:
    ───────────────────────────────────
    {"✓ No evidence against normality (p > 0.05)" if shapiro_p > 0.05 else "⚠ Evidence of non-normality (p ≤ 0.05)"}
    {"✓ Skewness acceptable (|skew| < 0.5)" if abs(skewness) < 0.5 else "⚠ Notable skewness" if abs(skewness) < 1 else "⚠ Severe skewness"}
    {"✓ Kurtosis acceptable (|kurt| < 1)" if abs(kurtosis) < 1 else "⚠ Notable kurtosis" if abs(kurtosis) < 2 else "⚠ Severe kurtosis"}
    """

    ax4.text(0.1, 0.95, summary_text, transform=ax4.transAxes,
             fontsize=10, verticalalignment='top', fontfamily='monospace',
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

    plt.suptitle(title, fontsize=14, fontweight='bold', y=1.02)
    plt.tight_layout()

    return {
        'figure': fig,
        'shapiro_stat': shapiro_stat,
        'shapiro_p': shapiro_p,
        'skewness': skewness,
        'kurtosis': kurtosis
    }

# Demonstrate with different distribution types
np.random.seed(42)
n = 100

# Normal residuals
e_normal = np.random.randn(n)
result = comprehensive_qq_plot(e_normal, "Normal Residuals")
plt.savefig('qq_normal.png', dpi=150, bbox_inches='tight')
plt.show()

# Heavy-tailed (t-distribution)
e_heavy = stats.t.rvs(df=3, size=n)
result = comprehensive_qq_plot(e_heavy, "Heavy-Tailed Residuals (t, df=3)")
plt.savefig('qq_heavy_tailed.png', dpi=150, bbox_inches='tight')
plt.show()

# Right-skewed
e_skewed = stats.lognorm.rvs(s=0.5, size=n)
e_skewed = e_skewed - np.mean(e_skewed)
result = comprehensive_qq_plot(e_skewed, "Right-Skewed Residuals")
plt.savefig('qq_skewed.png', dpi=150, bbox_inches='tight')
plt.show()
```

The Shapiro-Wilk test is widely considered the most powerful omnibus normality test for small to moderate sample sizes. It should be your default formal test for normality assessment.
The Shapiro-Wilk statistic is defined as: $$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $x_{(i)}$ is the $i$-th order statistic (the $i$-th smallest value), $\bar{x}$ is the sample mean, and the coefficients $a_i$ are derived from the means and covariance matrix of the order statistics of a standard normal sample.

W statistic range: $0 < W \leq 1$, with values near 1 indicating close agreement with normality.

Hypothesis test: $H_0$: the residuals are drawn from a normal distribution; $H_1$: they are not.

Decision rule: Reject normality if p-value < $\alpha$ (typically 0.05)

Sample size limits: the test requires $n \geq 3$; SciPy's implementation notes that for $n > 5000$ the W statistic remains accurate but the p-value may not be, and at such sizes the test flags trivial departures anyway.

Power characteristics: simulation studies consistently place Shapiro-Wilk at or near the top among omnibus normality tests against skewed, heavy-tailed, and contaminated alternatives, especially in small to moderate samples.
For large samples (n > 1000), formal normality tests become problematic. They detect trivial departures that have no practical impact on inference. In these cases, rely more heavily on visual assessment (Q-Q plots) and effect sizes (skewness, kurtosis) rather than p-values.
```python
import numpy as np
from scipy import stats

def shapiro_wilk_analysis(residuals, alpha=0.05):
    """
    Comprehensive Shapiro-Wilk normality test with interpretation.

    Parameters
    ----------
    residuals : array-like
        Residuals from regression model
    alpha : float
        Significance level

    Returns
    -------
    dict with test results and interpretation
    """
    e = np.array(residuals)
    n = len(e)

    # Check sample size constraints
    if n < 3:
        raise ValueError("Shapiro-Wilk requires at least 3 observations")
    if n > 5000:
        print("Warning: n > 5000. Shapiro-Wilk may be overly sensitive.")

    # Perform test
    statistic, p_value = stats.shapiro(e)

    # Effect size measures
    skewness = stats.skew(e)
    kurtosis = stats.kurtosis(e)  # Excess kurtosis

    # Interpretation
    if p_value > alpha:
        decision = "FAIL TO REJECT H₀"
        conclusion = "No significant evidence against normality."
    else:
        decision = "REJECT H₀"
        conclusion = "Significant evidence against normality."

    # Severity assessment based on effect sizes
    if abs(skewness) < 0.5 and abs(kurtosis) < 1:
        severity = "Negligible: Minor departures unlikely to affect inference"
    elif abs(skewness) < 1 and abs(kurtosis) < 2:
        severity = "Moderate: May affect small-sample inference"
    else:
        severity = "Severe: Consider transformations or robust methods"

    # Large-sample warning
    if n > 100 and p_value < alpha:
        large_sample_note = ("⚠ With n > 100, statistical significance may not "
                             "indicate practical significance. Check effect sizes.")
    else:
        large_sample_note = ""

    results = {
        'n': n,
        'statistic': statistic,
        'p_value': p_value,
        'alpha': alpha,
        'decision': decision,
        'conclusion': conclusion,
        'skewness': skewness,
        'kurtosis': kurtosis,
        'severity': severity,
        'large_sample_note': large_sample_note
    }

    # Print report
    print("═" * 50)
    print("SHAPIRO-WILK NORMALITY TEST")
    print("═" * 50)
    print(f"Sample size (n): {n}")
    print(f"W statistic: {statistic:.6f}")
    print(f"P-value: {p_value:.6f}")
    print(f"Significance (α): {alpha}")
    print("─" * 50)
    print(f"Decision: {decision}")
    print(f"Conclusion: {conclusion}")
    print("─" * 50)
    print("Effect Sizes:")
    print(f"  Skewness: {skewness:.4f}")
    print(f"  Excess Kurtosis: {kurtosis:.4f}")
    print(f"  Assessment: {severity}")
    if large_sample_note:
        print("─" * 50)
        print(large_sample_note)
    print("═" * 50)

    return results

# Demonstration
np.random.seed(42)

# Test 1: Normal residuals
print("\n" + "=" * 60)
print("EXAMPLE 1: True Normal Residuals")
print("=" * 60)
e_normal = np.random.randn(50)
results1 = shapiro_wilk_analysis(e_normal)

# Test 2: Mildly skewed (practical concern: usually OK)
print("\n" + "=" * 60)
print("EXAMPLE 2: Mildly Skewed Residuals")
print("=" * 60)
e_mild = stats.skewnorm.rvs(a=2, size=50)  # Mild skewness
e_mild = e_mild - np.mean(e_mild)
results2 = shapiro_wilk_analysis(e_mild)

# Test 3: Heavily skewed (practical concern: may need action)
print("\n" + "=" * 60)
print("EXAMPLE 3: Heavily Skewed Residuals")
print("=" * 60)
e_heavy = stats.lognorm.rvs(s=1, size=50)
e_heavy = e_heavy - np.mean(e_heavy)
results3 = shapiro_wilk_analysis(e_heavy)
```

While Shapiro-Wilk is generally recommended, other normality tests serve specific purposes and provide complementary information.
The Anderson-Darling test is an empirical distribution function (EDF) test that places more weight on the tails than the Kolmogorov-Smirnov test.
Statistic: $$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)[\ln F(x_{(i)}) + \ln(1-F(x_{(n+1-i)}))]$$
where $F$ is the hypothesized CDF (standard normal with estimated parameters).
Strengths: weights the tails more heavily than Kolmogorov-Smirnov, so it catches heavy-tailed departures earlier; the critical values used by SciPy account for the mean and variance being estimated from the data; works well over a wide range of sample sizes.
When to use: When tail behavior is of particular concern (e.g., financial data, extreme value analysis).
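A minimal usage sketch with SciPy (note that `scipy.stats.anderson` reports critical values at fixed significance levels instead of a p-value; the heavy-tailed sample here is illustrative):

```python
import numpy as np
from scipy import stats

residuals = stats.t.rvs(df=3, size=200, random_state=0)  # illustrative heavy tails

result = stats.anderson(residuals, dist='norm')
print(f"A² = {result.statistic:.4f}")
for level, crit in zip(result.significance_level, result.critical_values):
    verdict = "reject normality" if result.statistic > crit else "consistent with normality"
    print(f"  {level:4.1f}% level: critical value {crit:.3f} -> {verdict}")
```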
The Jarque-Bera test directly tests whether skewness and kurtosis match normal distribution values.
Statistic: $$JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$$
where $S$ is sample skewness and $K$ is sample kurtosis. Under $H_0$, $JB \sim \chi^2_2$ (approximately, for large n).
Strengths: computationally trivial; directly interpretable, since it is built from the skewness and kurtosis themselves; well suited to large samples and a long-standing default in econometrics.

Weaknesses: the $\chi^2_2$ approximation is unreliable in small samples ($n < 50$), where the test tends to be conservative; it is blind to departures from normality that leave the third and fourth moments approximately normal.
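Because the statistic depends only on sample moments, it is easy to verify by hand. The sketch below (illustrative centered lognormal data) computes JB from the skewness and kurtosis and checks it against SciPy's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
e = rng.lognormal(sigma=0.5, size=500)
e = e - e.mean()

n = len(e)
S = stats.skew(e)            # sample skewness
K = stats.kurtosis(e) + 3    # raw kurtosis (scipy returns excess by default)
jb_manual = n / 6 * (S**2 + (K - 3)**2 / 4)
p_manual = stats.chi2.sf(jb_manual, df=2)

jb_scipy, p_scipy = stats.jarque_bera(e)
print(f"Manual: JB = {jb_manual:.4f}, p = {p_manual:.4g}")
print(f"SciPy:  JB = {jb_scipy:.4f}, p = {p_scipy:.4g}")
```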
The D'Agostino-Pearson $K^2$ test combines standardized skewness and kurtosis statistics: $$K^2 = Z_s^2 + Z_k^2$$
where $Z_s$ and $Z_k$ are transformed skewness and kurtosis with approximately standard normal distributions. Under $H_0$, $K^2 \sim \chi^2_2$.
Strengths: an omnibus test whose components are interpretable, since $Z_s$ and $Z_k$ can be inspected separately to see whether skewness or kurtosis drives a rejection; its normalizing transformations make the $\chi^2_2$ approximation reliable from roughly $n \geq 20$.
| Test | Best Sample Size | Special Sensitivity | Recommended Use |
|---|---|---|---|
| Shapiro-Wilk | n < 50 | Overall departures | Default choice for small samples |
| Anderson-Darling | n < 5000 | Tail behavior | When tails are critical |
| Jarque-Bera | n > 50 | Skewness & kurtosis | Large samples, econometrics |
| D'Agostino-Pearson | n > 20 | Skewness & kurtosis | Medium samples, omnibus test |
| Kolmogorov-Smirnov | Any | Central distribution | Generally less powerful—avoid |
```python
import numpy as np
from scipy import stats

def comprehensive_normality_battery(residuals, alpha=0.05):
    """
    Run multiple normality tests for comprehensive assessment.

    Parameters
    ----------
    residuals : array-like
        Residuals from regression model
    alpha : float
        Significance level

    Returns
    -------
    dict with all test results
    """
    e = np.array(residuals)
    n = len(e)

    results = {'n': n, 'alpha': alpha}

    print("═" * 60)
    print("COMPREHENSIVE NORMALITY TEST BATTERY")
    print("═" * 60)
    print(f"Sample size: {n}")
    print(f"Significance level: {alpha}")
    print("═" * 60)

    # 1. Shapiro-Wilk
    if n <= 5000:
        sw_stat, sw_p = stats.shapiro(e)
        results['shapiro_wilk'] = {'statistic': sw_stat, 'p_value': sw_p}
        verdict = "✓ PASS" if sw_p > alpha else "✗ FAIL"
        print(f"Shapiro-Wilk:       W  = {sw_stat:.4f}, p = {sw_p:.4f}  {verdict}")
    else:
        print("Shapiro-Wilk:       Skipped (n > 5000)")

    # 2. Anderson-Darling
    ad_result = stats.anderson(e, dist='norm')
    # Anderson-Darling returns critical values, not a p-value directly.
    # We compare the statistic to the critical value at the 5% level.
    ad_stat = ad_result.statistic
    ad_critical_5 = ad_result.critical_values[2]  # Index 2 is the 5% level
    ad_pass = ad_stat < ad_critical_5
    results['anderson_darling'] = {
        'statistic': ad_stat,
        'critical_5pct': ad_critical_5
    }
    verdict = "✓ PASS" if ad_pass else "✗ FAIL"
    print(f"Anderson-Darling:   A² = {ad_stat:.4f}, crit(5%) = {ad_critical_5:.4f}  {verdict}")

    # 3. D'Agostino-Pearson (K² test)
    if n >= 20:
        dp_stat, dp_p = stats.normaltest(e)
        results['dagostino_pearson'] = {'statistic': dp_stat, 'p_value': dp_p}
        verdict = "✓ PASS" if dp_p > alpha else "✗ FAIL"
        print(f"D'Agostino-Pearson: K² = {dp_stat:.4f}, p = {dp_p:.4f}  {verdict}")
    else:
        print("D'Agostino-Pearson: Skipped (n < 20)")

    # 4. Jarque-Bera
    jb_stat, jb_p = stats.jarque_bera(e)
    results['jarque_bera'] = {'statistic': jb_stat, 'p_value': jb_p}
    verdict = "✓ PASS" if jb_p > alpha else "✗ FAIL"
    if n < 50:
        print(f"Jarque-Bera:        JB = {jb_stat:.4f}, p = {jb_p:.4f}  {verdict} (⚠ n<50)")
    else:
        print(f"Jarque-Bera:        JB = {jb_stat:.4f}, p = {jb_p:.4f}  {verdict}")

    # Effect sizes
    skew = stats.skew(e)
    kurt = stats.kurtosis(e)
    results['skewness'] = skew
    results['kurtosis'] = kurt
    print("─" * 60)
    print(f"Effect sizes: Skewness = {skew:.4f}, Excess Kurtosis = {kurt:.4f}")

    # Overall assessment: count failures among the tests actually run
    print("═" * 60)
    n_tests = 0
    n_failures = 0
    if 'shapiro_wilk' in results:
        n_tests += 1
        if results['shapiro_wilk']['p_value'] <= alpha:
            n_failures += 1
    n_tests += 1  # Anderson-Darling always runs
    if not ad_pass:
        n_failures += 1
    if 'dagostino_pearson' in results:
        n_tests += 1
        if results['dagostino_pearson']['p_value'] <= alpha:
            n_failures += 1
    if n >= 50:  # Count Jarque-Bera only where its approximation is trustworthy
        n_tests += 1
        if jb_p <= alpha:
            n_failures += 1

    print(f"Tests failed: {n_failures}/{n_tests}")
    if n_failures == 0:
        print("OVERALL: No evidence against normality")
    elif n_failures <= n_tests // 2:
        print("OVERALL: Mixed evidence—inspect Q-Q plot carefully")
    else:
        print("OVERALL: Substantial evidence against normality")
    print("═" * 60)

    return results

# Demonstration
np.random.seed(42)

print("\n" + "▓" * 60)
print("  NORMAL DISTRIBUTION  ")
print("▓" * 60)
e_normal = np.random.randn(100)
r1 = comprehensive_normality_battery(e_normal)

print("\n" + "▓" * 60)
print("  HEAVY-TAILED (t, df=4)  ")
print("▓" * 60)
e_heavy = stats.t.rvs(df=4, size=100)
r2 = comprehensive_normality_battery(e_heavy)

print("\n" + "▓" * 60)
print("  RIGHT-SKEWED (lognormal)  ")
print("▓" * 60)
e_skewed = np.exp(np.random.randn(100) * 0.5)
e_skewed = e_skewed - np.mean(e_skewed)
r3 = comprehensive_normality_battery(e_skewed)
```

When normality is seriously violated and inference is affected, several remedial strategies are available. The choice depends on the nature of the non-normality and your analytical goals.
Applying a monotonic transformation to $y$ can often normalize residuals:
Box-Cox Transformation: $$y^{(\lambda)} = \begin{cases} \dfrac{y^\lambda - 1}{\lambda} & \lambda \neq 0 \\ \ln(y) & \lambda = 0 \end{cases}$$
The optimal $\lambda$ is chosen to maximize the (profiled) log-likelihood. Common special cases: $\lambda = 1$ (no transformation), $\lambda = 0.5$ (square root), $\lambda = 0$ (log), $\lambda = -1$ (inverse).
Limitations: requires $y > 0$ (or a shift before transforming); coefficients, predictions, and intervals now refer to the transformed scale, complicating interpretation; a single $\lambda$ may not fix skewness and heteroscedasticity simultaneously.
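In practice there is no need to hand-roll the $\lambda$ search: `scipy.stats.boxcox` transforms the data and estimates $\lambda$ by maximum likelihood in one call (a minimal sketch on illustrative positive, right-skewed data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.lognormal(mean=1.0, sigma=0.6, size=200)  # positive, right-skewed

y_transformed, lmbda = stats.boxcox(y)  # lmbda=None -> MLE for lambda
print(f"Estimated lambda: {lmbda:.3f}")
print(f"Shapiro-Wilk p before: {stats.shapiro(y)[1]:.4g}")
print(f"Shapiro-Wilk p after:  {stats.shapiro(y_transformed)[1]:.4g}")
```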
When outliers or heavy tails drive non-normality, robust regression methods downweight observations with large residuals:
M-estimation: Minimizes $\sum \rho(e_i)$ where $\rho$ is a robust loss function (Huber, bisquare/Tukey)
Iteratively Reweighted Least Squares (IRLS): Implements M-estimation by repeatedly fitting weighted OLS
Advantages: resistant to outliers and heavy-tailed errors; keeps the response on its original scale, so coefficients stay directly interpretable; with the Huber loss, little efficiency is lost when the errors really are normal. A minimal implementation sketch follows below.
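Below is a minimal NumPy sketch of Huber M-estimation via IRLS, assuming a design matrix that already includes an intercept column; the tuning constant c = 1.345 is the conventional choice giving about 95% efficiency under normal errors. (For production work, `statsmodels`' `RLM` provides a full implementation.)

```python
import numpy as np

def huber_irls(X, y, c=1.345, max_iter=50, tol=1e-8):
    """M-estimation with Huber loss via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale estimate: MAD rescaled to be consistent for sigma
        s = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = resid / s
        # Huber weights: 1 inside [-c, c], downweighted outside
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))
        # Weighted least squares step using sqrt-weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Demo: OLS vs Huber on data with a few gross outliers
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:5] += 15  # contaminate five observations
X = np.column_stack([np.ones(n), x])

print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber:", huber_irls(X, y))
```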
When distributional assumptions are suspect, the bootstrap provides nonparametric inference:
Residual Bootstrap: (1) fit OLS and save the fitted values $\hat{y}$ and residuals; (2) resample the residuals with replacement to obtain $e^*$; (3) form the bootstrap response $y^* = \hat{y} + e^*$; (4) refit the model to $(X, y^*)$; (5) repeat many times and use the empirical distribution of the bootstrap coefficients for standard errors and percentile confidence intervals.
Advantage: Makes no parametric assumption about the error distribution, though it still treats the errors as independent and identically distributed.
When to use: Small to moderate samples where the CLT cannot be relied upon
Before choosing a remedy, diagnose the source of non-normality. If it's a few outliers, robust regression may suffice. If it's systematic skewness, transformation addresses the root cause. If it's complex non-normality, bootstrap may be the safest approach.
```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar
import matplotlib.pyplot as plt

def box_cox_transform(y, lmbda):
    """Apply Box-Cox transformation."""
    if lmbda == 0:
        return np.log(y)
    else:
        return (y**lmbda - 1) / lmbda

def optimal_box_cox(y, X=None):
    """
    Find optimal Box-Cox lambda using profile likelihood.

    Parameters
    ----------
    y : array-like
        Positive response values
    X : array-like, optional
        Design matrix (for model-based optimization)

    Returns
    -------
    dict with optimal lambda and transformed values
    """
    y = np.array(y)

    if np.any(y <= 0):
        shift = np.abs(y.min()) + 1
        y = y + shift
        print(f"Warning: y shifted by {shift:.2f} to make all values positive")
    else:
        shift = 0

    n = len(y)

    def neg_log_likelihood(lmbda):
        """Negative profile log-likelihood for Box-Cox."""
        y_transformed = box_cox_transform(y, lmbda)
        # Variance of transformed data
        var_t = np.var(y_transformed, ddof=1)
        # Log-likelihood (up to constants)
        ll = -0.5 * n * np.log(var_t) + (lmbda - 1) * np.sum(np.log(y))
        return -ll

    # Find optimal lambda
    result = minimize_scalar(neg_log_likelihood, bounds=(-2, 2), method='bounded')
    optimal_lmbda = result.x

    # Transform with optimal lambda
    y_transformed = box_cox_transform(y, optimal_lmbda)

    # Test normality of transformed values
    sw_stat, sw_p = stats.shapiro(y_transformed)

    # Common named transformations
    if abs(optimal_lmbda) < 0.05:
        transform_name = "Log"
    elif abs(optimal_lmbda - 0.5) < 0.1:
        transform_name = "Square Root"
    elif abs(optimal_lmbda - 1) < 0.1:
        transform_name = "None (linear)"
    elif abs(optimal_lmbda + 1) < 0.1:
        transform_name = "Inverse"
    else:
        transform_name = f"Power {optimal_lmbda:.2f}"

    return {
        'optimal_lambda': optimal_lmbda,
        'transform_name': transform_name,
        'y_transformed': y_transformed,
        'shift_applied': shift,
        'shapiro_w': sw_stat,
        'shapiro_p': sw_p
    }

def residual_bootstrap_ci(X, y, n_bootstrap=1000, alpha=0.05, seed=42):
    """
    Compute bootstrap confidence intervals for regression coefficients.

    Parameters
    ----------
    X : ndarray of shape (n, p)
        Design matrix (without intercept)
    y : ndarray of shape (n,)
        Response vector
    n_bootstrap : int
        Number of bootstrap samples
    alpha : float
        Significance level for CI

    Returns
    -------
    dict with coefficients and confidence intervals
    """
    np.random.seed(seed)
    n = len(y)

    # Add intercept
    X_full = np.column_stack([np.ones(n), X])
    p = X_full.shape[1]

    # Fit original model
    XtX_inv = np.linalg.inv(X_full.T @ X_full)
    beta_hat = XtX_inv @ X_full.T @ y
    y_hat = X_full @ beta_hat
    residuals = y - y_hat

    # Bootstrap
    beta_boot = np.zeros((n_bootstrap, p))
    for b in range(n_bootstrap):
        # Resample residuals with replacement
        e_star = residuals[np.random.choice(n, n, replace=True)]
        # Create bootstrap response
        y_star = y_hat + e_star
        # Fit to bootstrap sample
        beta_boot[b] = XtX_inv @ X_full.T @ y_star

    # Compute percentile CIs
    lower = np.percentile(beta_boot, 100 * alpha / 2, axis=0)
    upper = np.percentile(beta_boot, 100 * (1 - alpha / 2), axis=0)
    se_boot = np.std(beta_boot, axis=0, ddof=1)

    return {
        'coefficients': beta_hat,
        'se_bootstrap': se_boot,
        'ci_lower': lower,
        'ci_upper': upper,
        'beta_boot': beta_boot
    }

# Demonstration
np.random.seed(42)
n = 100

# Create data with skewed response
X = np.random.randn(n, 2)
y_latent = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + np.random.randn(n) * 0.5
y = np.exp(y_latent)  # Response is lognormal

print("=" * 60)
print("BOX-COX TRANSFORMATION ANALYSIS")
print("=" * 60)

# Original residuals
from sklearn.linear_model import LinearRegression
model_orig = LinearRegression()
model_orig.fit(X, y)
resid_orig = y - model_orig.predict(X)
sw_orig, p_orig = stats.shapiro(resid_orig)
print(f"\nOriginal residuals: Shapiro-Wilk p = {p_orig:.4f}")

# Find optimal Box-Cox
bc_result = optimal_box_cox(y)
print(f"\nOptimal λ: {bc_result['optimal_lambda']:.4f}")
print(f"Suggested transformation: {bc_result['transform_name']}")
print(f"Transformed residuals: Shapiro-Wilk p = {bc_result['shapiro_p']:.4f}")

print("\n" + "=" * 60)
print("RESIDUAL BOOTSTRAP INFERENCE")
print("=" * 60)

# Apply log transformation and bootstrap
y_log = np.log(y)
boot_results = residual_bootstrap_ci(X, y_log, n_bootstrap=2000)

print("\nCoefficients (log scale):")
for i, (coef, lower, upper, se) in enumerate(zip(
        boot_results['coefficients'], boot_results['ci_lower'],
        boot_results['ci_upper'], boot_results['se_bootstrap'])):
    name = 'Intercept' if i == 0 else f'β{i}'
    print(f"  {name}: {coef:.4f} (SE: {se:.4f}), 95% CI: [{lower:.4f}, {upper:.4f}]")
```

Normality testing occupies a nuanced position in regression diagnostics. It's important to understand both when to worry and when not to.
You now understand when normality matters, how to assess it visually and formally, and what to do when it fails. The next page tackles heteroscedasticity—the violation of constant variance that can fundamentally undermine both estimation efficiency and inference validity.