Every regression model tells two stories: the signal story (what the model captures about the relationship between predictors and response) and the residual story (what the model fails to explain). A fitted regression equation might look impressive on paper, but without examining residuals—the differences between observed and predicted values—we're flying blind.
Residual analysis is the diagnostic lens through which we scrutinize our models. It reveals whether our assumptions hold, whether our predictions are reliable, and whether hidden patterns lurk in the data that our model missed. In this sense, residual analysis isn't just a validation step—it's detective work.
This page will equip you with the mathematical foundations, visual diagnostic tools, and practical intuition needed to conduct rigorous residual analysis. You'll learn to read residual plots like a radiologist reads an X-ray—identifying subtle anomalies that reveal deeper structural issues.
By the end of this page, you will understand how to define, compute, and interpret residuals; distinguish between different residual types (raw, standardized, studentized); construct and analyze key diagnostic plots; and recognize the specific assumption violations each plot reveals. This knowledge is essential for any production ML system built on regression.
Let's establish the precise mathematical foundation. In a regression setting with response variable $y$ and fitted values $\hat{y}$, the residual for observation $i$ is defined as:
$$e_i = y_i - \hat{y}_i$$
This deceptively simple quantity encapsulates everything the model cannot explain. If our model were perfect and the true relationship were deterministic, all residuals would be exactly zero. In practice, residuals arise from:

- Irreducible random noise in the data-generating process
- Model misspecification (wrong functional form, omitted predictors)
- Measurement error in the response
- Estimation error in the fitted coefficients
The goal of residual analysis is to distinguish between these sources—determining whether observed residual patterns are consistent with random noise or indicative of systematic problems.
Residuals (e_i) are observable estimates of the true errors (ε_i). The true error ε_i = y_i − E[y_i|x_i] involves the unknown population regression function, while the residual e_i = y_i − ŷ_i uses our estimated function. This distinction matters: residuals are correlated with each other and with fitted values in ways that true errors are not. Understanding this distinction is essential for proper inference.
Under ordinary least squares (OLS) estimation, residuals possess important mathematical properties that arise directly from the normal equations. For a model $y = X\beta + \varepsilon$, the OLS residual vector is:
$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - X\hat{\beta} = (I - H)\mathbf{y}$$
where $H = X(X^TX)^{-1}X^T$ is the hat matrix (also called the projection matrix). Several key properties follow:
Property 1: Zero Sum $$\sum_{i=1}^{n} e_i = 0$$ Proof: The first normal equation states $X^T e = 0$. If the model includes an intercept, the first column of $X$ is $\mathbf{1}$ (all ones), so $\mathbf{1}^T e = \sum e_i = 0$.
Property 2: Orthogonality to Predictors $$X^T \mathbf{e} = 0$$ Meaning: Residuals are uncorrelated with each predictor column. This is exactly the condition that defines OLS—minimizing squared residuals yields predictions orthogonal to the residual space.
Property 3: Orthogonality to Fitted Values $$\hat{\mathbf{y}}^T \mathbf{e} = 0$$ Meaning: Residuals are uncorrelated with fitted values—a consequence of the previous property since $\hat{y}$ is a linear combination of predictor columns.
| Property | Mathematical Statement | Interpretation |
|---|---|---|
| Zero sum | $\sum e_i = 0$ | Intercept absorbs mean offset |
| Orthogonal to X | $X^T e = 0$ | No remaining linear signal in residuals |
| Orthogonal to ŷ | $\hat{y}^T e = 0$ | Predictions and errors are uncorrelated |
| Expected value zero | $E[e_i] = 0$ | Unbiased on average |
| Variance relationship | $\text{Var}(e_i) = \sigma^2(1-h_{ii})$ | Variance depends on leverage |
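These properties are easy to confirm numerically. The sketch below (synthetic data; all names are illustrative) fits OLS with `np.linalg.lstsq` and checks the zero-sum and both orthogonality properties:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Design matrix with intercept column plus two predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat  # raw residuals

print(abs(e.sum()))             # Property 1: zero sum (up to rounding)
print(np.abs(X.T @ e).max())    # Property 2: orthogonal to every predictor column
print(abs((X @ beta_hat) @ e))  # Property 3: orthogonal to fitted values
```

All three printed values are numerically zero, which is exactly what the normal equations guarantee.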
Not all residuals are created equal. For diagnostic purposes, we often transform raw residuals to account for the non-constant variance inherent in OLS residual structure. Understanding these transformations is critical for proper interpretation.
The simplest form, raw residuals are just the observed minus predicted: $$e_i = y_i - \hat{y}_i$$
Limitation: Raw residuals have unequal variances. Specifically, $\text{Var}(e_i) = \sigma^2(1 - h_{ii})$ where $h_{ii}$ is the $i$-th diagonal element of the hat matrix. Points with high leverage (large $h_{ii}$) have artificially smaller residuals, making comparison across observations misleading.
To enable comparison, we standardize by the estimated standard deviation: $$r_i = \frac{e_i}{\hat{\sigma}}$$
where $\hat{\sigma} = \sqrt{\frac{1}{n-p}\sum_{i=1}^{n}e_i^2}$ is the residual standard error (the square root of RSS divided by its $n-p$ degrees of freedom). This gives residuals on a consistent scale—approximately unit variance under the model assumptions.
Limitation: This still doesn't account for the fact that different observations have different residual variances due to leverage differences.
The diagonal element h_ii measures the 'leverage' of observation i—how much influence its x-values have on the regression fit. Values range from 1/n (for a point at the predictor centroid) to 1 (for perfectly high-leverage points). We'll explore leverage in depth when we cover influential points.
To properly account for leverage, we divide by the estimated standard deviation of each residual: $$r_i^{\text{int}} = \frac{e_i}{\hat{\sigma}\sqrt{1-h_{ii}}}$$
These are sometimes called standardized residuals in software (causing confusion with our earlier definition). Under normal errors, internally studentized residuals approximately follow a standard normal distribution.
The most sophisticated form uses a leave-one-out approach. For each observation $i$, we compute: $$t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1-h_{ii}}}$$
where $\hat{\sigma}_{(i)}$ is the residual standard error computed from a model fitted without observation $i$. This removes the self-influence problem—the residual is being standardized by variance estimated from independent data.
Computational efficiency: Remarkably, we don't need to refit $n$ separate models. The externally studentized residual can be computed as: $$t_i = r_i^{\text{int}} \sqrt{\frac{n-p-1}{n-p-(r_i^{\text{int}})^2}}$$
Under normal errors, $t_i$ follows a $t$-distribution with $n-p-1$ degrees of freedom, enabling formal outlier testing.
| Residual Type | Formula | Distribution (under H₀) | Primary Use |
|---|---|---|---|
| Raw | $e_i = y_i - \hat{y}_i$ | Non-standard | Basic diagnostics, RSS |
| Standardized | $e_i / \hat{\sigma}$ | Approx. N(0,1) | Scale-free comparison |
| Internally Studentized | $e_i / (\hat{\sigma}\sqrt{1-h_{ii}})$ | Approx. N(0,1) | Leverage-adjusted comparison |
| Externally Studentized | $e_i / (\hat{\sigma}_{(i)}\sqrt{1-h_{ii}})$ | t(n-p-1) | Formal outlier testing |
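To see that the efficient formula really matches the brute-force leave-one-out computation, here is a small numerical check (synthetic data with a planted outlier; all names are illustrative). It also sketches the formal outlier test using Bonferroni-corrected $t$ critical values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
y[0] += 4.0  # plant an outlier
p = X.shape[1]

# Full-model quantities: hat matrix diagonal, residuals, internal studentization
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
e = y - H @ y
sigma_hat = np.sqrt(e @ e / (n - p))
r_int = e / (sigma_hat * np.sqrt(1 - h))

# Efficient external studentization (no refitting)
t_short = r_int * np.sqrt((n - p - 1) / (n - p - r_int**2))

# Brute force: refit the model without each observation i
t_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    e_i = y[mask] - X[mask] @ b_i
    sigma_i = np.sqrt(e_i @ e_i / (n - 1 - p))  # sigma_hat_(i)
    t_brute[i] = e[i] / (sigma_i * np.sqrt(1 - h[i]))

print(np.max(np.abs(t_short - t_brute)))  # should be ~0: the two agree exactly

# Bonferroni-corrected outlier test at overall level 0.05
crit = stats.t.ppf(1 - 0.05 / (2 * n), df=n - p - 1)
print(np.where(np.abs(t_short) > crit)[0])  # observations flagged as outliers
```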
```python
import numpy as np

def compute_residual_types(X, y, include_intercept=True):
    """
    Compute all four types of residuals for regression diagnostics.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix (without intercept column if include_intercept=True)
    y : ndarray of shape (n,)
        Response vector
    include_intercept : bool
        Whether to add intercept column to X

    Returns:
    --------
    dict with keys: 'raw', 'standardized', 'internally_studentized',
    'externally_studentized', 'leverage', 'sigma_hat'
    """
    n = len(y)

    # Add intercept if needed
    if include_intercept:
        X = np.column_stack([np.ones(n), X])
    p = X.shape[1]  # Number of parameters including intercept

    # Compute OLS estimates
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    y_hat = X @ beta_hat

    # Raw residuals
    e = y - y_hat

    # Hat matrix diagonal (leverage values)
    H = X @ XtX_inv @ X.T
    h = np.diag(H)

    # Residual standard error
    RSS = np.sum(e**2)
    sigma_hat = np.sqrt(RSS / (n - p))

    # Standardized residuals
    r_std = e / sigma_hat

    # Internally studentized residuals
    r_int = e / (sigma_hat * np.sqrt(1 - h))

    # Externally studentized residuals (efficient formula)
    r_ext = r_int * np.sqrt((n - p - 1) / (n - p - r_int**2))

    return {
        'raw': e,
        'standardized': r_std,
        'internally_studentized': r_int,
        'externally_studentized': r_ext,
        'leverage': h,
        'sigma_hat': sigma_hat,
        'n': n,
        'p': p
    }

# Example usage with synthetic data
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
true_beta = np.array([2, -1, 0.5])  # intercept, beta1, beta2
y = true_beta[0] + X @ true_beta[1:] + np.random.randn(n) * 0.5

# Add an outlier
y[0] = y[0] + 5

results = compute_residual_types(X, y)
print("Observation 1 (outlier) residuals:")
print(f"  Raw: {results['raw'][0]:.4f}")
print(f"  Standardized: {results['standardized'][0]:.4f}")
print(f"  Internally Studentized: {results['internally_studentized'][0]:.4f}")
print(f"  Externally Studentized: {results['externally_studentized'][0]:.4f}")
print(f"  Leverage (h_11): {results['leverage'][0]:.4f}")
```

The residual vs. fitted values plot is the workhorse of regression diagnostics. It reveals multiple assumption violations simultaneously and should be your first stop in any analysis.
Plot residuals $e_i$ (or standardized/studentized versions) on the y-axis against fitted values $\hat{y}_i$ on the x-axis. Each point represents one observation.
Under correct model specification and valid assumptions, this plot should show:

- Random scatter centered on the horizontal zero line
- Roughly constant vertical spread across the range of fitted values
- No curvature, trends, clusters, or isolated extreme points
The null pattern is a formless cloud—any structure reveals a problem.
The residual vs. fitted plot can reveal: (1) non-linearity (curved patterns), (2) heteroscedasticity (funnel shapes), (3) outliers (isolated points), and (4) missing predictors (systematic patterns). It cannot easily distinguish between these—additional plots are needed—but it alerts you that something is wrong.
Pattern 1: Curvature (U-shape or inverted U)
Diagnosis: The relationship between predictors and response is nonlinear. A linear model is missing quadratic or higher-order terms.
Example: Fitting a linear model to $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ without the $x^2$ term produces residuals that are negative for extreme $\hat{y}$ values and positive in the middle (or vice versa).
Remedy: Add polynomial terms, apply transformations (log, sqrt), or consider nonlinear models.
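A quick numerical version of this remedy (synthetic data; names are illustrative): fit the linear model to data with a true quadratic term, confirm the curvature by correlating the residuals with $x^2$, then add the quadratic term and watch RSS collapse.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-3, 3, n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)  # true model is quadratic

def ols_rss(X, y):
    """Fit OLS and return (residuals, RSS)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e, e @ e

X_lin = np.column_stack([np.ones(n), x])          # misspecified: missing x^2
X_quad = np.column_stack([np.ones(n), x, x**2])   # remedied model

e_lin, rss_lin = ols_rss(X_lin, y)
e_quad, rss_quad = ols_rss(X_quad, y)

# Curvature in the residual plot shows up as strong correlation with x^2
print(abs(np.corrcoef(e_lin, x**2)[0, 1]))  # large for the misspecified fit
print(rss_quad < rss_lin)                   # adding the missing term shrinks RSS
```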
Pattern 2: Funnel Shape (Heteroscedasticity)
Diagnosis: Residual variance increases (or decreases) with fitted values. The constant variance assumption is violated.
Example: In income prediction, high-income individuals show more variability than low-income individuals. Residuals fan out as $\hat{y}$ increases.
Remedy: Log-transform the response, use weighted least squares, or fit a model that accounts for heteroscedasticity (e.g., generalized least squares).
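A minimal weighted least squares sketch of the last remedy, under the (strong) assumption that the error standard deviation is proportional to a known quantity—here $1 + 2|x|$, matching the funnel simulation used later on this page:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
sd = 1 + 2 * np.abs(x)                    # assumed-known variance structure
y = 2 + 3 * x + rng.normal(size=n) * sd   # heteroscedastic errors

X = np.column_stack([np.ones(n), x])

# OLS: solve the normal equations X'X b = X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# WLS: weight each observation by the inverse of its error variance
w = 1 / sd**2
b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print(b_ols, b_wls)  # both near the true (2, 3)
```

Both estimators are unbiased here; the gain from WLS is efficiency—its standard errors are valid and smaller, because noisy observations no longer get equal say.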
Pattern 3: Horizontal Band with Isolated Points
Diagnosis: General structure is acceptable but individual outliers exist. These may be data errors, special cases, or genuine unusual observations.
Remedy: Investigate outliers individually. Consider robust regression if outliers are genuine but shouldn't dominate the fit.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.nonparametric.smoothers_lowess import lowess

def create_diagnostic_plots(y, y_hat, residuals, leverage=None, figsize=(14, 10)):
    """
    Create a comprehensive 2x2 grid of regression diagnostic plots.

    Parameters:
    -----------
    y : array-like
        Observed response values
    y_hat : array-like
        Fitted/predicted values
    residuals : dict
        Dictionary with residual types from compute_residual_types()
    leverage : array-like, optional
        Leverage values (h_ii)
    """
    fig, axes = plt.subplots(2, 2, figsize=figsize)

    # Plot 1: Residuals vs Fitted
    ax1 = axes[0, 0]
    ax1.scatter(y_hat, residuals['raw'], alpha=0.6, edgecolors='k', linewidth=0.5)
    ax1.axhline(y=0, color='red', linestyle='--', linewidth=2)
    ax1.set_xlabel('Fitted Values', fontsize=12)
    ax1.set_ylabel('Residuals', fontsize=12)
    ax1.set_title('Residuals vs Fitted', fontsize=14, fontweight='bold')

    # Add LOWESS smoother for pattern detection
    smoothed = lowess(residuals['raw'], y_hat, frac=0.5)
    ax1.plot(smoothed[:, 0], smoothed[:, 1], color='blue', linewidth=2,
             label='LOWESS smoother')
    ax1.legend()

    # Plot 2: Q-Q Plot
    ax2 = axes[0, 1]
    stats.probplot(residuals['internally_studentized'], dist="norm", plot=ax2)
    ax2.set_title('Normal Q-Q', fontsize=14, fontweight='bold')
    ax2.get_lines()[0].set_markersize(5)
    ax2.get_lines()[0].set_alpha(0.6)

    # Plot 3: Scale-Location Plot
    ax3 = axes[1, 0]
    sqrt_abs_std_resid = np.sqrt(np.abs(residuals['internally_studentized']))
    ax3.scatter(y_hat, sqrt_abs_std_resid, alpha=0.6, edgecolors='k', linewidth=0.5)
    ax3.set_xlabel('Fitted Values', fontsize=12)
    ax3.set_ylabel('√|Standardized Residuals|', fontsize=12)
    ax3.set_title('Scale-Location', fontsize=14, fontweight='bold')
    smoothed = lowess(sqrt_abs_std_resid, y_hat, frac=0.5)
    ax3.plot(smoothed[:, 0], smoothed[:, 1], color='red', linewidth=2)

    # Plot 4: Residuals vs Leverage
    if leverage is not None:
        ax4 = axes[1, 1]
        ax4.scatter(leverage, residuals['internally_studentized'], alpha=0.6,
                    edgecolors='k', linewidth=0.5)
        ax4.axhline(y=0, color='gray', linestyle='--')
        ax4.set_xlabel('Leverage (h_ii)', fontsize=12)
        ax4.set_ylabel('Standardized Residuals', fontsize=12)
        ax4.set_title('Residuals vs Leverage', fontsize=14, fontweight='bold')

        # Add Cook's distance contours
        p = residuals['p']  # number of model parameters
        for cook_d in [0.5, 1.0]:
            h_range = np.linspace(0.001, max(leverage) * 1.1, 100)
            y_cook = np.sqrt(cook_d * p * (1 - h_range) / h_range)
            ax4.plot(h_range, y_cook, 'r--', alpha=0.4)
            ax4.plot(h_range, -y_cook, 'r--', alpha=0.4)

    plt.tight_layout()
    return fig

# Demonstrate with examples showing different patterns
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Example 1: Good residual pattern
np.random.seed(42)
n = 100
x = np.random.randn(n)
y_good = 2 + 3*x + np.random.randn(n)
y_hat_good = 2 + 3*x
e_good = y_good - y_hat_good

axes[0].scatter(y_hat_good, e_good, alpha=0.6)
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_xlabel('Fitted Values')
axes[0].set_ylabel('Residuals')
axes[0].set_title('Good Pattern\n(Random Scatter)', fontweight='bold')

# Example 2: Nonlinearity (quadratic term missing)
y_quad = 2 + 3*x + 2*x**2 + np.random.randn(n)
y_hat_linear = 2 + 3*x  # Wrong model (missing x²)
e_nonlin = y_quad - y_hat_linear

axes[1].scatter(y_hat_linear, e_nonlin, alpha=0.6)
axes[1].axhline(0, color='red', linestyle='--')
axes[1].set_xlabel('Fitted Values')
axes[1].set_ylabel('Residuals')
axes[1].set_title('Nonlinearity\n(U-shaped Pattern)', fontweight='bold')

# Example 3: Heteroscedasticity (funnel)
y_hetero = 2 + 3*x + np.random.randn(n) * (1 + 2*np.abs(x))
y_hat_hetero = 2 + 3*x
e_hetero = y_hetero - y_hat_hetero

axes[2].scatter(y_hat_hetero, e_hetero, alpha=0.6)
axes[2].axhline(0, color='red', linestyle='--')
axes[2].set_xlabel('Fitted Values')
axes[2].set_ylabel('Residuals')
axes[2].set_title('Heteroscedasticity\n(Funnel Shape)', fontweight='bold')

plt.tight_layout()
plt.savefig('residual_patterns.png', dpi=150, bbox_inches='tight')
plt.show()
```

While residual vs. fitted plots reveal overall model issues, partial residual plots (also called component-plus-residual plots) let us examine the functional form of individual predictors. This is crucial for detecting nonlinear relationships in specific variables.
For predictor $x_j$, the partial residual is: $$e_i^{(j)} = e_i + \hat{\beta}_j x_{ij}$$
We then plot $e_i^{(j)}$ against $x_{ij}$. The resulting scatter should approximate the partial relationship between $y$ and $x_j$ while controlling for other predictors.
If the true relationship between $y$ and $x_j$ is linear, the partial residual plot will show a linear pattern with slope approximately $\hat{\beta}_j$. Curvature in the plot suggests the linear term is inadequate—we might need polynomial terms, splines, or transformations.
Consider the true model: $E[y|x] = \alpha + \beta_j x_j + g(x_{-j})$ where $g(x_{-j})$ captures all other predictor effects. The OLS residual is approximately: $$e \approx \varepsilon + (\text{nonlinearity in } x_j)$$
Adding back $\hat{\beta}_j x_j$ recovers the total contribution of $x_j$: $$e + \hat{\beta}_j x_j \approx \beta_j x_j + (\text{nonlinearity in } x_j) + \varepsilon$$
Plotting this against $x_j$ reveals the true functional form.
Component-plus-Residual plots assume other predictors have correct functional forms. If this is violated, patterns can be misleading. CERES (Combining conditional Expectations and RESiduals) plots address this by using smoother estimates instead of linear terms for other variables. Use car::ceresPlots() in R for this enhanced diagnostic.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from statsmodels.nonparametric.smoothers_lowess import lowess

def partial_residual_plots(X, y, feature_names=None, figsize=(14, 5)):
    """
    Create partial residual (component-plus-residual) plots for each predictor.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix of predictors
    y : ndarray of shape (n,)
        Response variable
    feature_names : list of str, optional
        Names for each predictor column
    """
    n, p = X.shape
    if feature_names is None:
        feature_names = [f'X{i+1}' for i in range(p)]

    # Fit the model
    model = LinearRegression()
    model.fit(X, y)
    y_hat = model.predict(X)
    residuals = y - y_hat
    betas = model.coef_

    # Create subplot grid
    fig, axes = plt.subplots(1, p, figsize=figsize)
    if p == 1:
        axes = [axes]

    for j, (ax, name, beta) in enumerate(zip(axes, feature_names, betas)):
        # Compute partial residuals
        partial_resid = residuals + beta * X[:, j]

        # Plot
        ax.scatter(X[:, j], partial_resid, alpha=0.6, edgecolors='k', linewidth=0.5)

        # Add the fitted component line (slope = beta)
        x_range = np.linspace(X[:, j].min(), X[:, j].max(), 100)
        ax.plot(x_range, beta * x_range, 'r-', linewidth=2,
                label=f'Linear fit (β={beta:.3f})')

        # Add LOWESS smoother to detect nonlinearity
        smoothed = lowess(partial_resid, X[:, j], frac=0.5)
        ax.plot(smoothed[:, 0], smoothed[:, 1], 'b--', linewidth=2,
                label='LOWESS smoother')

        ax.set_xlabel(name, fontsize=12)
        ax.set_ylabel(f'e + β_{name}·{name}', fontsize=10)
        ax.set_title(f'Partial Residual: {name}', fontweight='bold')
        ax.legend(fontsize=8)

    plt.tight_layout()
    return fig

# Demonstration: detecting need for quadratic term
np.random.seed(42)
n = 200

# X1 has a linear effect, X2 has a quadratic effect (which we'll miss initially)
X1 = np.random.uniform(-3, 3, n)
X2 = np.random.uniform(-3, 3, n)
y = 2 + 3*X1 + 1.5*X2 + 0.8*X2**2 + np.random.randn(n)

X = np.column_stack([X1, X2])

# Create partial residual plots
fig = partial_residual_plots(X, y, feature_names=['X1 (linear)', 'X2 (quadratic ignored)'])
plt.suptitle('Partial Residual Plots Reveal Missing Quadratic Term in X2',
             fontsize=14, fontweight='bold', y=1.02)
plt.savefig('partial_residual_demo.png', dpi=150, bbox_inches='tight')
plt.show()

# The plot for X2 will show clear curvature, indicating we need an X2² term
```

Added variable plots (AVPs), also called partial regression plots, serve a different purpose from partial residual plots. They visualize the unique contribution of a predictor after controlling for other predictors, revealing the marginal relationship between $x_j$ and $y$.
For predictor $x_j$:

1. Regress $y$ on all predictors except $x_j$; keep the residuals $\tilde{y}$
2. Regress $x_j$ on all the other predictors; keep the residuals $\tilde{x}_j$
3. Plot $\tilde{y}$ against $\tilde{x}_j$
The slope of the regression line through this scatter equals $\hat{\beta}_j$ from the full model (Frisch-Waugh-Lovell theorem).
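The Frisch-Waugh-Lovell equivalence can be verified in a few lines (synthetic data; names are illustrative): the slope through the residualized scatter matches the full-model coefficient to machine precision.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 3))
X[:, 1] += 0.8 * X[:, 0]  # make predictors correlated
y = 1 + 2*X[:, 0] - X[:, 1] + 0.5*X[:, 2] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])                # add intercept column
beta_full = np.linalg.lstsq(Xd, y, rcond=None)[0]   # full-model coefficients

# Residualize y and one predictor on everything else
j = 2  # column of Xd holding the second original predictor
others = [c for c in range(Xd.shape[1]) if c != j]
Z = Xd[:, others]
y_tilde = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
x_tilde = Xd[:, j] - Z @ np.linalg.lstsq(Z, Xd[:, j], rcond=None)[0]

slope_avp = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)  # simple regression slope
print(abs(slope_avp - beta_full[j]))  # essentially zero
```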
AVPs show the marginal relationship between $x_j$ and $y$—what $x_j$ explains that no other predictor can. This is crucial because:

- With correlated predictors, a simple scatter of $y$ against $x_j$ conflates shared and unique signal; the AVP isolates the unique part
- The slope through the AVP equals the full-model coefficient $\hat{\beta}_j$ exactly, so the plot shows what that coefficient actually measures
- Points extreme in the horizontal direction are precisely those with high influence on $\hat{\beta}_j$
| Feature | Partial Residual Plot | Added Variable Plot |
|---|---|---|
| Purpose | Detect functional form (nonlinearity) | Visualize marginal/unique effect |
| X-axis | Original predictor values | Residualized predictor values |
| Y-axis | Residual + β·x | Residualized response |
| Slope meaning | Approximate marginal effect | Exact coefficient β_j |
| Reveals | Need for transformations | Unique contribution after controlling |
| Outlier detection | Y-direction outliers | High leverage points for x_j |
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

def added_variable_plots(X, y, feature_names=None, figsize=(14, 5)):
    """
    Create added variable (partial regression) plots.

    For each predictor x_j:
    1. Regress y on all X except x_j -> residuals = y_tilde
    2. Regress x_j on all X except x_j -> residuals = x_tilde
    3. Plot y_tilde vs x_tilde

    The slope of this plot equals the coefficient β_j from the full model.
    """
    n, p = X.shape
    if feature_names is None:
        feature_names = [f'X{i+1}' for i in range(p)]

    # Fit full model to get coefficients for comparison
    full_model = LinearRegression()
    full_model.fit(X, y)
    full_betas = full_model.coef_

    # Create subplot grid
    fig, axes = plt.subplots(1, p, figsize=figsize)
    if p == 1:
        axes = [axes]

    for j, (ax, name, full_beta) in enumerate(zip(axes, feature_names, full_betas)):
        # Get indices of other predictors
        other_idx = [i for i in range(p) if i != j]
        X_other = X[:, other_idx]

        # Regress y on other predictors
        model_y = LinearRegression()
        model_y.fit(X_other, y)
        y_tilde = y - model_y.predict(X_other)

        # Regress x_j on other predictors
        model_x = LinearRegression()
        model_x.fit(X_other, X[:, j])
        x_tilde = X[:, j] - model_x.predict(X_other)

        # Plot the added variable plot
        ax.scatter(x_tilde, y_tilde, alpha=0.6, edgecolors='k', linewidth=0.5)

        # Fit line through the AVP (slope should equal full_beta);
        # use ddof=1 so the variance matches np.cov's default
        avp_slope = np.cov(x_tilde, y_tilde)[0, 1] / np.var(x_tilde, ddof=1)
        x_range = np.linspace(x_tilde.min(), x_tilde.max(), 100)
        ax.plot(x_range, avp_slope * x_range, 'r-', linewidth=2,
                label=f'AVP slope: {avp_slope:.4f}')

        ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
        ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
        ax.set_xlabel(f'{name} | Others', fontsize=10)
        ax.set_ylabel('Y | Others', fontsize=10)
        ax.set_title(f'Added Variable: {name}\n(Full model β={full_beta:.4f})',
                     fontweight='bold')
        ax.legend(fontsize=8)

    plt.tight_layout()
    return fig

# Demonstration: multicollinearity impact
np.random.seed(42)
n = 200

# Create correlated predictors
X1 = np.random.randn(n)
X2 = 0.9 * X1 + 0.1 * np.random.randn(n)  # X2 highly correlated with X1
X3 = np.random.randn(n)                   # X3 independent

# True model: y depends on X1 and X3, not directly on X2
y = 2 + 3*X1 + 1.5*X3 + np.random.randn(n)

X = np.column_stack([X1, X2, X3])

# Create added variable plots
fig = added_variable_plots(X, y, feature_names=['X1', 'X2 (corr with X1)', 'X3'])
plt.suptitle('Added Variable Plots: X2 Shows Weak Marginal Effect (As Expected)',
             fontsize=12, fontweight='bold', y=1.05)
plt.savefig('added_variable_demo.png', dpi=150, bbox_inches='tight')
plt.show()
```

In time series or spatially ordered data, residuals may exhibit autocorrelation—correlation between residuals at different time points or locations. This violates the independence assumption and has serious consequences for inference.
When residuals are autocorrelated:

- Coefficient standard errors are underestimated (under positive autocorrelation), so t-statistics are inflated
- Confidence intervals are too narrow and p-values are too small
- $R^2$ and other goodness-of-fit measures overstate predictive ability
This is a severe problem: your model appears to work better than it actually does.
Visual Method: Residual Time Series Plot
Plot residuals in their natural order (time index, spatial index, etc.). Look for:

- Long runs of residuals with the same sign (positive autocorrelation)
- Rapid alternation between positive and negative values (negative autocorrelation)
- Trends, cycles, or seasonal patterns over the ordering
Formal Test: Durbin-Watson Statistic
The Durbin-Watson (DW) test detects first-order autocorrelation: $$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}$$
Interpretation:

- $d \approx 2$: no first-order autocorrelation
- $d \to 0$: strong positive autocorrelation
- $d \to 4$: strong negative autocorrelation

A useful approximation is $d \approx 2(1 - \hat{\rho}_1)$, where $\hat{\rho}_1$ is the sample first-order autocorrelation of the residuals.
The test has complex critical values depending on sample size and number of predictors, but a rule of thumb is: $1.5 < d < 2.5$ suggests acceptable levels.
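The statistic itself is simple to compute directly. A sketch (synthetic residual series; names are illustrative) using an AR(1) simulation in the same style as the demonstration code later on this page:

```python
import numpy as np

def durbin_watson_stat(e):
    """Durbin-Watson: sum of squared successive differences divided by RSS."""
    e = np.asarray(e)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

rng = np.random.default_rng(5)
n = 2000

# Independent residuals: d should sit near 2
white = rng.normal(size=n)

# Positively autocorrelated residuals, AR(1) with rho = 0.8:
# d should sit near 2*(1 - 0.8) = 0.4
ar1 = np.empty(n)
ar1[0] = rng.normal()
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t-1] + np.sqrt(1 - 0.8**2) * rng.normal()

print(durbin_watson_stat(white))  # ~2
print(durbin_watson_stat(ar1))    # ~0.4
```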
The DW test only detects first-order autocorrelation (correlation with immediately adjacent residuals). Higher-order autocorrelation requires other tests like the Breusch-Godfrey test. Also, DW is invalid if the model includes lagged dependent variables as predictors.
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.stattools import durbin_watson
from statsmodels.graphics.tsaplots import plot_acf

def autocorrelation_diagnostics(residuals, max_lags=20, figsize=(14, 4)):
    """
    Comprehensive autocorrelation diagnostics for residuals.

    Parameters:
    -----------
    residuals : array-like
        Residuals from regression (in time order)
    max_lags : int
        Maximum number of lags for ACF plot
    """
    e = np.array(residuals)
    n = len(e)

    # Compute Durbin-Watson statistic
    dw = durbin_watson(e)

    # Compute first-order autocorrelation coefficient
    rho1 = np.corrcoef(e[:-1], e[1:])[0, 1]

    # Create diagnostic plots
    fig, axes = plt.subplots(1, 3, figsize=figsize)

    # Plot 1: Residuals over time
    ax1 = axes[0]
    ax1.plot(range(n), e, 'b-', alpha=0.7, linewidth=1)
    ax1.scatter(range(n), e, s=20, alpha=0.6)
    ax1.axhline(0, color='red', linestyle='--', linewidth=1.5)
    ax1.fill_between(range(n), e, 0, alpha=0.2)
    ax1.set_xlabel('Observation Order (Time)', fontsize=11)
    ax1.set_ylabel('Residual', fontsize=11)
    ax1.set_title('Residuals vs. Order', fontweight='bold')

    # Plot 2: Lag-1 scatter plot
    ax2 = axes[1]
    ax2.scatter(e[:-1], e[1:], alpha=0.6, edgecolors='k', linewidth=0.5)
    ax2.axhline(0, color='gray', linestyle='--', alpha=0.5)
    ax2.axvline(0, color='gray', linestyle='--', alpha=0.5)

    # Add regression line
    slope = rho1
    x_range = np.linspace(e.min(), e.max(), 100)
    ax2.plot(x_range, slope * x_range, 'r-', linewidth=2,
             label=f'ρ₁ = {rho1:.3f}')
    ax2.set_xlabel('e(t-1)', fontsize=11)
    ax2.set_ylabel('e(t)', fontsize=11)
    ax2.set_title('Lag-1 Scatter Plot', fontweight='bold')
    ax2.legend()

    # Plot 3: Autocorrelation function
    ax3 = axes[2]
    plot_acf(e, ax=ax3, lags=max_lags, alpha=0.05)
    ax3.set_xlabel('Lag', fontsize=11)
    ax3.set_title('Autocorrelation Function (ACF)', fontweight='bold')

    plt.suptitle(f'Durbin-Watson: {dw:.3f} | First-order autocorr: {rho1:.3f}',
                 fontsize=12, fontweight='bold', y=1.02)
    plt.tight_layout()

    return {
        'durbin_watson': dw,
        'rho1': rho1,
        'figure': fig
    }

# Demonstration: compare independent vs autocorrelated residuals
np.random.seed(42)
n = 100

# Case 1: No autocorrelation
e_independent = np.random.randn(n)

# Case 2: Positive autocorrelation (AR(1) process)
rho = 0.8
e_positive = np.zeros(n)
e_positive[0] = np.random.randn()
for i in range(1, n):
    e_positive[i] = rho * e_positive[i-1] + np.sqrt(1 - rho**2) * np.random.randn()

# Case 3: Negative autocorrelation
e_negative = np.zeros(n)
e_negative[0] = np.random.randn()
for i in range(1, n):
    e_negative[i] = -0.6 * e_negative[i-1] + np.sqrt(1 - 0.36) * np.random.randn()

# Run diagnostics on each
print("=== Independent Residuals ===")
results_ind = autocorrelation_diagnostics(e_independent)
print(f"D-W: {results_ind['durbin_watson']:.3f}, ρ₁: {results_ind['rho1']:.3f}")
plt.savefig('autocorr_independent.png', dpi=150, bbox_inches='tight')
plt.show()

print("=== Positive Autocorrelation ===")
results_pos = autocorrelation_diagnostics(e_positive)
print(f"D-W: {results_pos['durbin_watson']:.3f}, ρ₁: {results_pos['rho1']:.3f}")
plt.savefig('autocorr_positive.png', dpi=150, bbox_inches='tight')
plt.show()
```

Residual analysis is the first line of defense against flawed regression models. A model that looks great on summary statistics can harbor deep structural problems that only residual plots reveal.
You now have the foundational toolkit for residual analysis. In the next page, we'll dive deep into normality testing—examining why the normality assumption matters, which tests to use, and how to interpret departures from normality in your regression residuals.