Throughout our treatment of linear regression, we've implicitly assumed that all observations are equally reliable: every squared residual (yᵢ - xᵢᵀβ)² contributes equally to the loss function.
But this assumption often fails in practice; the table below collects common situations where error variances differ systematically across observations.
Weighted Least Squares (WLS) generalizes OLS by allowing each observation to have a different weight in the objective function.
By the end of this page, you will: (1) understand when and why to use weighted least squares, (2) derive the WLS estimator and its properties, (3) implement WLS efficiently, and (4) connect WLS to generalized least squares and iteratively reweighted methods.
Recall the OLS (Gauss-Markov) assumptions: (1) the relationship is linear in the parameters, (2) the errors have zero mean, (3) the errors are homoscedastic (constant variance σ²) and uncorrelated, and (4) the design matrix has full column rank.
When assumption 3 (homoscedasticity) fails, we have heteroscedasticity:
$$\text{Var}(\varepsilon_i) = \sigma_i^2$$
where σᵢ² varies across observations.
Consequences of Ignoring Heteroscedasticity:
- β̂_OLS remains unbiased, but it is no longer efficient (no longer BLUE).
- The usual OLS standard errors are wrong, so confidence intervals and tests based on them are unreliable.
The Core Insight:
If observation i has variance σᵢ², it should receive weight wᵢ = 1/σᵢ². High-variance (unreliable) observations get low weight; low-variance (precise) observations get high weight.
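As a toy illustration of this principle (a hypothetical two-sensor example, not part of the datasets used later in this page): combining a precise and a noisy measurement of the same quantity with weights 1/σᵢ² gives a lower-variance estimate than a plain average.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 10.0                       # true quantity being measured
sigma = np.array([0.5, 3.0])    # sensor 1 is precise, sensor 2 is noisy

# Simulate many paired measurements
y = mu + sigma * rng.standard_normal((100_000, 2))

w = 1.0 / sigma**2              # inverse-variance weights
plain_mean = y.mean(axis=1)
weighted_mean = (y * w).sum(axis=1) / w.sum()

print(f"Var(plain average):    {plain_mean.var():.3f}")     # ≈ (0.25 + 9)/4 ≈ 2.31
print(f"Var(weighted average): {weighted_mean.var():.3f}")  # ≈ 1/(1/0.25 + 1/9) ≈ 0.24
```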
| Domain | Pattern | Cause |
|---|---|---|
| Financial data | Volatility clustering | Variance increases during market stress |
| Survey data | Group size effects | Aggregate means from different sample sizes |
| Sensor data | Calibration variation | Different sensors have different precision |
| Time series | Level-dependent variance | Larger values have proportionally larger errors |
| Cross-sectional | Scale effects | Large companies have larger absolute variations |
WLS handles cases where we know the relative variances. More generally, Generalized Least Squares (GLS) handles both heteroscedasticity AND correlated errors (e.g., time series autocorrelation). WLS is the special case of GLS with diagonal covariance matrix.
The WLS Objective:
Instead of minimizing the sum of squared residuals, WLS minimizes the weighted sum:
$$\mathcal{L}_{WLS}(\beta) = \sum_{i=1}^{n} w_i (y_i - x_i^\top \beta)^2 = (y - X\beta)^\top W (y - X\beta)$$
where W = diag(w₁, w₂, ..., wₙ) is the diagonal weight matrix.
Matrix Form:
Let W^(1/2) = diag(√w₁, ..., √wₙ). Then we can write:
$$\mathcal{L}_{WLS}(\beta) = \|W^{1/2}(y - X\beta)\|^2 = \|\tilde{y} - \tilde{X}\beta\|^2$$
where ỹ = W^(1/2)y and X̃ = W^(1/2)X are weighted versions of the data.
This is just OLS on transformed data!
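To see why the transformed problem satisfies the OLS assumptions, note that when wᵢ = 1/σᵢ² the rescaled errors all have unit variance:

$$\tilde{y}_i = \sqrt{w_i}\,y_i = \sqrt{w_i}\,x_i^\top\beta + \sqrt{w_i}\,\varepsilon_i, \qquad \text{Var}(\sqrt{w_i}\,\varepsilon_i) = w_i\,\sigma_i^2 = \frac{\sigma_i^2}{\sigma_i^2} = 1$$

W^(1/2) simply rescales each observation by the reciprocal of its error standard deviation, so homoscedastic OLS theory applies to (X̃, ỹ).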
Derivation of the WLS Solution:
Taking the gradient and setting to zero:
$$\nabla_\beta \mathcal{L}_{WLS} = -2X^\top W(y - X\beta) = 0$$
$$X^\top W X \beta = X^\top W y$$
The WLS estimator:
$$\boxed{\hat{\beta}_{WLS} = (X^\top W X)^{-1} X^\top W y}$$
These are the normal equations with W inserted. Compare with OLS: β̂_OLS = (XᵀX)⁻¹Xᵀy.
Equivalently via transformation:
$$\hat{\beta}_{WLS} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{y} = \hat{\beta}_{OLS}(\tilde{X}, \tilde{y})$$
To solve WLS, you can either: (1) modify your solver to handle weights, or (2) transform data by √wᵢ and use standard OLS. The second approach lets you reuse all your OLS machinery—QR decomposition, regularization, etc.—without modification.
```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def weighted_least_squares(X, y, weights):
    """
    Solve weighted least squares via normal equations.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix
    y : ndarray of shape (n,)
        Target vector
    weights : ndarray of shape (n,)
        Observation weights (must be positive)

    Returns:
    --------
    beta : ndarray of shape (p,)
        WLS coefficient estimates
    """
    n, p = X.shape

    # Construct weight matrix W (diagonal)
    W = np.diag(weights)

    # Normal equations: (X^T W X) beta = X^T W y
    XtW = X.T @ W       # (p, n)
    XtWX = XtW @ X      # (p, p)
    XtWy = XtW @ y      # (p,)

    # Solve via Cholesky (since XtWX is symmetric positive definite)
    try:
        L = cholesky(XtWX, lower=True)
        z = solve_triangular(L, XtWy, lower=True)
        beta = solve_triangular(L.T, z, lower=False)
    except np.linalg.LinAlgError:
        # Fall back to lstsq if singular
        beta = np.linalg.lstsq(XtWX, XtWy, rcond=None)[0]

    return beta


def wls_via_transformation(X, y, weights):
    """
    Solve WLS by transforming to standard OLS.

    Transform: X̃ = W^{1/2} X, ỹ = W^{1/2} y
    Then solve: X̃.T @ X̃ @ beta = X̃.T @ ỹ

    This approach lets us use standard OLS machinery (QR, SVD, etc.)
    """
    sqrt_w = np.sqrt(weights)

    # Transform data
    X_tilde = X * sqrt_w[:, np.newaxis]  # Multiply each row by sqrt(w_i)
    y_tilde = y * sqrt_w

    # Standard OLS on transformed data
    beta = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]

    return beta


def demonstrate_wls():
    """
    Demonstrate WLS on heteroscedastic data.
    """
    np.random.seed(42)
    n, p = 200, 3
    X = np.column_stack([np.ones(n), np.random.randn(n, p-1)])
    beta_true = np.array([2.0, -1.0, 0.5])

    # Heteroscedastic errors: variance increases with X[:, 1]
    sigma = 0.5 + 2.0 * np.abs(X[:, 1])  # Standard deviation
    epsilon = sigma * np.random.randn(n)
    y = X @ beta_true + epsilon

    # True weights (inverse variance)
    true_weights = 1.0 / sigma**2

    # OLS (ignoring heteroscedasticity)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # WLS with true weights
    beta_wls = weighted_least_squares(X, y, true_weights)

    # WLS via transformation (should match)
    beta_wls2 = wls_via_transformation(X, y, true_weights)

    print("=== WLS vs OLS on Heteroscedastic Data ===")
    print(f"True β: {beta_true}")
    print(f"OLS β: {np.round(beta_ols, 4)}")
    print(f"WLS β: {np.round(beta_wls, 4)}")
    print(f"WLS (transform) β: {np.round(beta_wls2, 4)}")
    print()
    print(f"OLS error: {np.linalg.norm(beta_ols - beta_true):.4f}")
    print(f"WLS error: {np.linalg.norm(beta_wls - beta_true):.4f}")

    # Simulate many times to compare efficiency
    n_sim = 1000
    ols_estimates = []
    wls_estimates = []
    for _ in range(n_sim):
        epsilon = sigma * np.random.randn(n)
        y_sim = X @ beta_true + epsilon
        ols_estimates.append(np.linalg.lstsq(X, y_sim, rcond=None)[0])
        wls_estimates.append(weighted_least_squares(X, y_sim, true_weights))

    ols_estimates = np.array(ols_estimates)
    wls_estimates = np.array(wls_estimates)

    print(f"\nVariance of estimates (avg over {n_sim} simulations):")
    print(f"  OLS variance: {np.var(ols_estimates, axis=0).sum():.6f}")
    print(f"  WLS variance: {np.var(wls_estimates, axis=0).sum():.6f}")
    print(f"  Efficiency gain: {np.var(ols_estimates, axis=0).sum() / np.var(wls_estimates, axis=0).sum():.2f}x")


demonstrate_wls()
```

When Weights Are Correct (wᵢ = 1/σᵢ²):
If the weights are chosen to be the inverse variances of the errors, WLS has remarkable properties:
1. Unbiasedness: $$E[\hat{\beta}_{WLS}] = \beta$$
WLS remains unbiased (same as OLS).
2. BLUE (Best Linear Unbiased Estimator):
Under heteroscedasticity, WLS is BLUE—it achieves the minimum variance among all linear unbiased estimators. OLS is no longer BLUE under heteroscedasticity.
3. Variance of WLS Estimator: $$\text{Var}(\hat{\beta}_{WLS}) = (X^\top W X)^{-1}$$
when the weights equal inverse variances.
This is the Gauss-Markov theorem extended to the heteroscedastic case.
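Where this variance comes from: substituting y = Xβ + ε with Var(ε) = Σ = diag(σ₁², ..., σₙ²) into the WLS estimator gives

$$\hat{\beta}_{WLS} = \beta + (X^\top W X)^{-1} X^\top W \varepsilon \quad\Rightarrow\quad \text{Var}(\hat{\beta}_{WLS}) = (X^\top W X)^{-1} X^\top W \Sigma W X\,(X^\top W X)^{-1},$$

which collapses to (XᵀWX)⁻¹ exactly when W = Σ⁻¹, i.e., wᵢ = 1/σᵢ².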
When Weights Are Misspecified:
In practice, we rarely know the true σᵢ². What happens if our weights are wrong?
1. Still unbiased: WLS remains unbiased for any positive weights (assuming the linear model is correct).
2. No longer BLUE: You lose efficiency compared to using the correct weights.
3. Standard errors need correction: The naive variance formula (XᵀWX)⁻¹ is wrong. Use the sandwich estimator:
$$\text{Var}(\hat{\beta}_{WLS}) = (X^\top W X)^{-1} X^\top W \Sigma W X (X^\top W X)^{-1}$$
where Σ = diag(σ₁², ..., σₙ²) is the true error covariance.
This is often called the "robust" or "heteroscedasticity-consistent" covariance estimator.
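Here is a minimal sketch of that correction, assuming the common plug-in where the unknown σᵢ² are replaced by squared WLS residuals (the function name and this HC0-style choice are illustrative, not from the text above):

```python
import numpy as np

def wls_sandwich_cov(X, y, weights):
    """Sandwich (heteroscedasticity-consistent) covariance for a WLS fit.

    Uses squared residuals as a plug-in for the unknown error variances,
    in the spirit of White/HC0 estimators.
    """
    W = np.diag(weights)
    bread = np.linalg.inv(X.T @ W @ X)       # (XᵀWX)⁻¹
    beta = bread @ X.T @ W @ y               # WLS coefficients

    resid = y - X @ beta
    Sigma_hat = np.diag(resid**2)            # plug-in for diag(σ₁², ..., σₙ²)

    meat = X.T @ W @ Sigma_hat @ W @ X       # XᵀW Σ̂ W X
    cov = bread @ meat @ bread               # sandwich formula
    return beta, cov
```

If the weights happen to be correct (W ≈ Σ⁻¹), the sandwich reduces to the naive (XᵀWX)⁻¹ up to estimation noise, so using it costs little when the weights are right and protects you when they are wrong.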
To compute optimal weights, you need to know the error variances σᵢ². But to estimate σᵢ², you need the residuals. And to get good residuals, you need a good fit. This circular dependency is why WLS often requires iterative approaches or external variance estimates.
| Scenario | OLS | WLS (correct weights) | WLS (wrong weights) |
|---|---|---|---|
| Homoscedastic data | BLUE | Same as OLS (if W∝I) | Unbiased, inefficient |
| Heteroscedastic data | Unbiased but not BLUE | BLUE | Unbiased but not BLUE |
| Correct standard errors? | No (wrong variance) | Yes | No (need sandwich) |
Since true variances are unknown, we need practical strategies to estimate or choose weights:
Strategy 1: Squared Residual Regression
Fit OLS, then regress the (log) squared residuals on the predictors or fitted values to model how the variance changes; the fitted variances σ̂ᵢ² give weights ŵᵢ = 1/σ̂ᵢ² for a WLS re-fit (see `feasible_gls` below). This is called Feasible GLS (FGLS) when iterated.
Strategy 2: Group-Based Weights
If observations naturally fall into groups with different variances, estimate the residual variance within each group and weight every observation in group g by 1/σ̂_g² (see `group_based_weights` below).
Example: Panel data with different time periods having different volatility.
Strategy 3: Known Functional Form
Sometimes theory suggests a variance structure: count data often have variance proportional to the mean, an average of nᵢ raw measurements has variance σ²/nᵢ, and multiplicative errors give variance proportional to the squared level of the response. The sketch below illustrates the aggregated-means case.
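A small sketch of that aggregated-means case (the group sizes and parameter values below are hypothetical): when yᵢ is the mean of nᵢ raw measurements with a common noise level, Var(εᵢ) = σ²/nᵢ, so weighting by wᵢ ∝ nᵢ is optimal and no variance estimation is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
group_sizes = rng.integers(5, 200, size=n)        # n_i raw measurements behind each mean

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, 0.5])
sigma = 2.0                                       # noise level of a single raw measurement
y = X @ beta_true + rng.standard_normal(n) * sigma / np.sqrt(group_sizes)

weights = group_sizes.astype(float)               # w_i ∝ n_i (any positive rescaling works)
sqrt_w = np.sqrt(weights)
beta_wls = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("WLS:", np.round(beta_wls, 3), " OLS:", np.round(beta_ols, 3))
```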
```python
import numpy as np

def feasible_gls(X, y, max_iter=10, tol=1e-6):
    """
    Feasible Generalized Least Squares (FGLS).

    Iteratively estimates variance structure and re-fits WLS.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix
    y : ndarray of shape (n,)
        Target vector
    max_iter : int
        Maximum iterations
    tol : float
        Convergence tolerance

    Returns:
    --------
    beta : ndarray of shape (p,)
        FGLS coefficient estimates
    weights : ndarray of shape (n,)
        Estimated optimal weights
    """
    n, p = X.shape

    # Step 1: Initial OLS fit
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    for iteration in range(max_iter):
        beta_old = beta.copy()

        # Step 2: Compute residuals
        residuals = y - X @ beta

        # Step 3: Model the variance
        # Simple approach: regress log(resid²) on log(|predictions|)
        # This assumes variance proportional to some power of E[y]
        log_resid_sq = np.log(residuals**2 + 1e-10)  # Avoid log(0)
        y_pred = X @ beta

        # Fit a simple model for variance: log(σ²) = α + β₁·log(|ŷ|)
        # This captures multiplicative heteroscedasticity
        design_var = np.column_stack([
            np.ones(n),
            np.log(np.abs(y_pred) + 1)  # Avoid log(0)
        ])
        gamma = np.linalg.lstsq(design_var, log_resid_sq, rcond=None)[0]

        # Predicted variances
        log_var_pred = design_var @ gamma
        var_pred = np.exp(log_var_pred)

        # Weights = 1/variance
        weights = 1.0 / np.maximum(var_pred, 1e-10)

        # Step 4: WLS with estimated weights
        sqrt_w = np.sqrt(weights)
        X_weighted = X * sqrt_w[:, np.newaxis]
        y_weighted = y * sqrt_w
        beta = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)[0]

        # Check convergence
        if np.linalg.norm(beta - beta_old) < tol:
            print(f"FGLS converged in {iteration + 1} iterations")
            break

    return beta, weights


def group_based_weights(X, y, groups):
    """
    Estimate weights from group-specific variances.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix
    y : ndarray of shape (n,)
        Target vector
    groups : ndarray of shape (n,)
        Group labels for each observation

    Returns:
    --------
    beta : ndarray of shape (p,)
        WLS estimates
    weights : ndarray of shape (n,)
        Group-based weights
    """
    unique_groups = np.unique(groups)

    # Initial OLS
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    residuals = y - X @ beta

    # Estimate variance per group
    weights = np.zeros(len(y))
    for g in unique_groups:
        mask = groups == g
        group_resid = residuals[mask]
        group_var = np.var(group_resid, ddof=1)
        weights[mask] = 1.0 / group_var

    # WLS with group weights
    sqrt_w = np.sqrt(weights)
    X_weighted = X * sqrt_w[:, np.newaxis]
    y_weighted = y * sqrt_w
    beta = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)[0]

    return beta, weights


def demonstrate_weight_estimation():
    """
    Compare different weight estimation strategies.
    """
    np.random.seed(42)
    n, p = 300, 3
    X = np.column_stack([np.ones(n), np.random.randn(n, p-1)])
    beta_true = np.array([1.0, 2.0, -1.0])

    # Create heteroscedastic data: variance increases with |X @ beta|
    signal = X @ beta_true
    sigma = 0.5 + 0.5 * np.abs(signal)
    epsilon = sigma * np.random.randn(n)
    y = signal + epsilon

    # True optimal weights
    true_weights = 1.0 / sigma**2

    # Method 1: OLS (ignores heteroscedasticity)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # Method 2: WLS with true weights (oracle)
    sqrt_w = np.sqrt(true_weights)
    beta_oracle = np.linalg.lstsq(X * sqrt_w[:, np.newaxis], y * sqrt_w, rcond=None)[0]

    # Method 3: FGLS (estimated weights)
    beta_fgls, weights_fgls = feasible_gls(X, y)

    print("=== Weight Estimation Comparison ===")
    print(f"True β: {beta_true}")
    print(f"OLS:    {np.round(beta_ols, 4)} (error: {np.linalg.norm(beta_ols - beta_true):.4f})")
    print(f"Oracle: {np.round(beta_oracle, 4)} (error: {np.linalg.norm(beta_oracle - beta_true):.4f})")
    print(f"FGLS:   {np.round(beta_fgls, 4)} (error: {np.linalg.norm(beta_fgls - beta_true):.4f})")

    # Check weight quality
    weight_corr = np.corrcoef(true_weights, weights_fgls)[0, 1]
    print(f"\nCorrelation between true and estimated weights: {weight_corr:.3f}")


demonstrate_weight_estimation()
```

Iteratively Reweighted Least Squares (IRLS) is a powerful technique that recasts certain optimization problems as a sequence of weighted least squares problems. It's used extensively in robust regression (M-estimation) and in fitting generalized linear models.
The General Framework:
Many loss functions can be written as:
$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \rho(y_i - x_i^\top \beta)$$
for some loss function ρ applied to the residuals rᵢ = yᵢ - xᵢᵀβ. Setting the gradient to zero gives the estimating equations Σᵢ ψ(rᵢ)xᵢ = 0 with ψ = ρ′. Defining the weight function
$$w(r) = \frac{\psi(r)}{r} = \frac{\rho'(r)}{r}$$
turns these into the weighted normal equations XᵀW(y - Xβ) = 0, so minimizing is equivalent to iteratively solving a WLS problem:
1. Compute the residuals rᵢ = yᵢ - xᵢᵀβ⁽ᵗ⁾ at the current estimate.
2. Update the weights wᵢ = w(rᵢ).
3. Solve the WLS problem with these weights to obtain β⁽ᵗ⁺¹⁾, and repeat until convergence.
Example: Huber Regression
The Huber loss is quadratic for small residuals and linear for large ones:
$$\rho_{Huber}(r) = \begin{cases} \frac{1}{2}r^2 & |r| \leq \delta \\ \delta|r| - \frac{1}{2}\delta^2 & |r| > \delta \end{cases}$$
The weight function becomes:
$$w(r) = \begin{cases} 1 & |r| \leq \delta \\ \delta / |r| & |r| > \delta \end{cases}$$
Large residuals (outliers) get downweighted, making the estimator robust. For example, with δ = 1.345 a scaled residual of 5 receives weight 1.345/5 ≈ 0.27, while residuals below the threshold keep full weight 1.
```python
import numpy as np

def huber_weights(residuals, delta=1.345):
    """
    Compute Huber weights for given residuals.

    Parameters:
    -----------
    residuals : ndarray
        Current residuals
    delta : float
        Threshold for switching from quadratic to linear loss.
        1.345 gives 95% efficiency at the normal distribution.
    """
    abs_r = np.abs(residuals)
    weights = np.ones_like(residuals)

    # For large residuals, downweight
    mask = abs_r > delta
    weights[mask] = delta / abs_r[mask]

    return weights


def irls_huber(X, y, delta=1.345, max_iter=50, tol=1e-6):
    """
    Huber's M-estimator via IRLS.

    Robust to outliers: downweights observations with large residuals.
    """
    n, p = X.shape

    # Initialize with OLS
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    for iteration in range(max_iter):
        beta_old = beta.copy()

        # Compute residuals
        residuals = y - X @ beta

        # Scale residuals by MAD (robust scale estimate)
        mad = np.median(np.abs(residuals - np.median(residuals)))
        scale = mad / 0.6745  # Consistent at normal distribution
        scaled_residuals = residuals / (scale + 1e-10)

        # Compute Huber weights
        weights = huber_weights(scaled_residuals, delta)

        # WLS step
        sqrt_w = np.sqrt(weights)
        X_w = X * sqrt_w[:, np.newaxis]
        y_w = y * sqrt_w
        beta = np.linalg.lstsq(X_w, y_w, rcond=None)[0]

        # Check convergence
        if np.linalg.norm(beta - beta_old) < tol:
            print(f"IRLS-Huber converged in {iteration + 1} iterations")
            break

    return beta, weights


def bisquare_weights(residuals, c=4.685):
    """
    Tukey's bisquare weights (more aggressive downweighting).

    c=4.685 gives 95% efficiency at the normal distribution.
    """
    abs_r = np.abs(residuals)
    weights = np.zeros_like(residuals)

    mask = abs_r <= c
    u = residuals[mask] / c
    weights[mask] = (1 - u**2)**2

    return weights


def irls_bisquare(X, y, c=4.685, max_iter=50, tol=1e-6):
    """
    Tukey's bisquare M-estimator via IRLS.

    More robust than Huber: completely ignores extreme outliers.
    """
    n, p = X.shape

    # Initialize with OLS
    beta = np.linalg.lstsq(X, y, rcond=None)[0]

    for iteration in range(max_iter):
        beta_old = beta.copy()

        residuals = y - X @ beta
        mad = np.median(np.abs(residuals - np.median(residuals)))
        scale = mad / 0.6745
        scaled_residuals = residuals / (scale + 1e-10)

        weights = bisquare_weights(scaled_residuals, c)

        # Avoid all-zero weights
        if weights.sum() < 0.1:
            print("Warning: too many outliers, reducing c")
            c *= 1.5
            weights = bisquare_weights(scaled_residuals, c)

        sqrt_w = np.sqrt(weights)
        X_w = X * sqrt_w[:, np.newaxis]
        y_w = y * sqrt_w
        beta = np.linalg.lstsq(X_w, y_w, rcond=None)[0]

        if np.linalg.norm(beta - beta_old) < tol:
            print(f"IRLS-Bisquare converged in {iteration + 1} iterations")
            break

    return beta, weights


def demonstrate_robust_regression():
    """
    Compare OLS, Huber, and Bisquare on contaminated data.
    """
    np.random.seed(42)
    n, p = 100, 3
    X = np.column_stack([np.ones(n), np.random.randn(n, p-1)])
    beta_true = np.array([1.0, 2.0, -1.0])

    # Clean data
    y_clean = X @ beta_true + 0.5 * np.random.randn(n)

    # Add outliers (10% contamination)
    n_outliers = 10
    outlier_idx = np.random.choice(n, n_outliers, replace=False)
    y_contaminated = y_clean.copy()
    y_contaminated[outlier_idx] += np.random.choice([-1, 1], n_outliers) * 20

    print("=== Robust Regression Comparison ===")
    print(f"True β: {beta_true}")
    print(f"Data: {n} observations, {n_outliers} outliers (10% contamination)")
    print()

    # OLS (sensitive to outliers)
    beta_ols = np.linalg.lstsq(X, y_contaminated, rcond=None)[0]
    print(f"OLS:      {np.round(beta_ols, 3)} (error: {np.linalg.norm(beta_ols - beta_true):.3f})")

    # Huber (moderate robustness)
    beta_huber, w_huber = irls_huber(X, y_contaminated)
    print(f"Huber:    {np.round(beta_huber, 3)} (error: {np.linalg.norm(beta_huber - beta_true):.3f})")

    # Bisquare (high robustness)
    beta_bisq, w_bisq = irls_bisquare(X, y_contaminated)
    print(f"Bisquare: {np.round(beta_bisq, 3)} (error: {np.linalg.norm(beta_bisq - beta_true):.3f})")

    # Show which points were downweighted
    print(f"\nOutlier indices: {sorted(outlier_idx)}")
    print(f"Points with Huber weight < 0.5:    {np.where(w_huber < 0.5)[0]}")
    print(f"Points with Bisquare weight < 0.5: {np.where(w_bisq < 0.5)[0]}")


demonstrate_robust_regression()
```

Generalized Linear Models (logistic, Poisson regression) are fit using IRLS with weights derived from the variance function. The famous Fisher scoring algorithm is essentially IRLS. We'll cover this in detail in the GLM chapter.
Generalized Least Squares (GLS) extends WLS to handle correlated errors, not just heteroscedasticity.
Model: $$y = X\beta + \varepsilon, \quad \text{Var}(\varepsilon) = \sigma^2 \Omega$$
where Ω is a general (not necessarily diagonal) covariance structure.
Examples of correlated errors: autocorrelated residuals in time series, spatial correlation between nearby locations, and within-cluster correlation in grouped or panel data.
The GLS Estimator:
If we know Ω, the GLS estimator is:
$$\hat{\beta}_{GLS} = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} y$$
Notice this is WLS with W = Ω⁻¹, but now W is not diagonal.
Transformation Approach:
Since Ω is positive definite, we can factor Ω = LLᵀ (Cholesky). Premultiplying the model by L⁻¹ gives transformed data ỹ = L⁻¹y and X̃ = L⁻¹X whose errors ε̃ = L⁻¹ε are uncorrelated with constant variance, so ordinary OLS on (X̃, ỹ) yields exactly β̂_GLS.
This is called decorrelation or prewhitening.
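To check that the transformation really whitens the errors, compute the covariance of ε̃ = L⁻¹ε:

$$\text{Var}(L^{-1}\varepsilon) = L^{-1}\,\text{Var}(\varepsilon)\,L^{-\top} = \sigma^2 L^{-1}\Omega L^{-\top} = \sigma^2 L^{-1} L L^\top L^{-\top} = \sigma^2 I$$

so the transformed model ỹ = X̃β + ε̃ satisfies the standard OLS assumptions.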
```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gls(X, y, Omega):
    """
    Generalized Least Squares estimator.

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix
    y : ndarray of shape (n,)
        Target vector
    Omega : ndarray of shape (n, n)
        Error covariance matrix (positive definite)

    Returns:
    --------
    beta : ndarray of shape (p,)
        GLS coefficient estimates
    """
    # Method 1: Direct formula (requires inverting Omega)
    # beta = solve(X.T @ solve(Omega, X), X.T @ solve(Omega, y))

    # Method 2: Transform via Cholesky (more stable)
    # Omega = L @ L.T, so Omega^{-1} = L^{-T} @ L^{-1}
    L = cholesky(Omega, lower=True)

    # Transform: X_tilde = L^{-1} @ X, y_tilde = L^{-1} @ y
    X_tilde = solve_triangular(L, X, lower=True)
    y_tilde = solve_triangular(L, y, lower=True)

    # Now apply OLS to transformed data
    beta = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]

    return beta


def ar1_covariance(n, rho):
    """
    Construct AR(1) correlation matrix.

    Corr(ε_i, ε_j) = rho^|i-j|
    """
    idx = np.arange(n)
    return rho ** np.abs(idx[:, np.newaxis] - idx[np.newaxis, :])


def demonstrate_gls():
    """
    Demonstrate GLS on time series data with autocorrelated errors.
    """
    np.random.seed(42)
    n, p = 100, 2
    rho = 0.8  # Strong autocorrelation

    # Design matrix (time trend)
    t = np.linspace(0, 1, n)
    X = np.column_stack([np.ones(n), t])
    beta_true = np.array([1.0, 2.0])

    # Generate AR(1) errors
    sigma_epsilon = 0.5
    epsilon = np.zeros(n)
    epsilon[0] = np.random.randn()
    for i in range(1, n):
        epsilon[i] = rho * epsilon[i-1] + np.sqrt(1 - rho**2) * np.random.randn()
    epsilon *= sigma_epsilon

    y = X @ beta_true + epsilon

    # True covariance structure
    Omega = ar1_covariance(n, rho)

    # OLS (ignores autocorrelation)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # GLS (uses true covariance)
    beta_gls = gls(X, y, Omega)

    print("=== GLS vs OLS with Autocorrelated Errors ===")
    print(f"True β: {beta_true}")
    print(f"Autocorrelation (ρ): {rho}")
    print()
    print(f"OLS: {np.round(beta_ols, 4)} (error: {np.linalg.norm(beta_ols - beta_true):.4f})")
    print(f"GLS: {np.round(beta_gls, 4)} (error: {np.linalg.norm(beta_gls - beta_true):.4f})")

    # Simulate many times to compare efficiency
    n_sim = 500
    ols_estimates = []
    gls_estimates = []
    for _ in range(n_sim):
        # Generate new AR(1) errors
        epsilon = np.zeros(n)
        epsilon[0] = np.random.randn()
        for i in range(1, n):
            epsilon[i] = rho * epsilon[i-1] + np.sqrt(1 - rho**2) * np.random.randn()
        epsilon *= sigma_epsilon
        y_sim = X @ beta_true + epsilon

        ols_estimates.append(np.linalg.lstsq(X, y_sim, rcond=None)[0])
        gls_estimates.append(gls(X, y_sim, Omega))

    ols_estimates = np.array(ols_estimates)
    gls_estimates = np.array(gls_estimates)

    print(f"\nEstimator variance (over {n_sim} simulations):")
    print(f"  OLS variance: {np.var(ols_estimates, axis=0)}")
    print(f"  GLS variance: {np.var(gls_estimates, axis=0)}")
    print(f"  Efficiency gain: {np.var(ols_estimates, axis=0).sum() / np.var(gls_estimates, axis=0).sum():.2f}x")


demonstrate_gls()
```

In practice, Ω is unknown and must be estimated. Common approaches: (1) For time series, fit an AR/MA model to OLS residuals to estimate ρ. (2) For spatial data, use variogram fitting. (3) For clustered data, use a working correlation structure (GEE approach).
| Method | Error Structure | Weight Matrix | Use Case |
|---|---|---|---|
| OLS | iid, homoscedastic | W = I | Default when assumptions hold |
| WLS | Independent, heteroscedastic | W = diag(1/σᵢ²) | Known variance structure |
| FGLS | Independent, heteroscedastic | W = diag(1/σ̂ᵢ²) | Unknown variance structure |
| GLS | Correlated | W = Ω⁻¹ | Time series, spatial, clustered |
| IRLS | Varies | Iteratively computed | Robust regression, GLMs |
Module Complete!
You've now completed the Computational Considerations module, covering:
These computational skills transform theoretical knowledge into practical engineering capability—the difference between knowing linear regression and being able to deploy it reliably at scale.
Congratulations! You've mastered the computational foundations of linear regression. You now understand not just what linear regression computes, but how to compute it efficiently, stably, and correctly in real-world systems.