The breakdown point tells us when an estimator catastrophically fails—but what about gradual corruption? If we add a single observation at a particular value, how much does the estimate change? Does the effect depend on where the observation is located?
Influence functions answer these questions. Introduced by Frank Hampel in his landmark 1974 paper, the influence function describes how an infinitesimal perturbation at any point affects the estimator. It's the derivative of the estimator with respect to the data distribution.
Think of it this way: the breakdown point is a global, worst-case measure (how much contamination an estimator can survive), while the influence function is a local, infinitesimal measure (how a single observation at a given location perturbs the estimate). Both perspectives are essential for understanding robustness.
The influence function also forms the foundation of robust statistics theory. It connects to asymptotic variance and efficiency, to summary measures such as gross-error sensitivity, and to the ψ-functions of M-estimators.
This page develops the complete theory of influence functions, from definition to computation to practical applications in regression diagnostics.
By the end of this page, you will understand the formal definition of influence functions, how to compute and visualize influence for common estimators, the connection to M-estimator ψ-functions, and how to use influence diagnostics in regression practice.
Setup:
Let $T(F)$ be a statistical functional—an estimator viewed as a function of the underlying distribution $F$. For example, the mean is $T(F) = \int x \, dF(x)$ and the median is $T(F) = F^{-1}(1/2)$.
The Influence Function:
The influence function of $T$ at distribution $F$ is:
$$\text{IF}(x; T, F) = \lim_{\varepsilon \to 0} \frac{T((1-\varepsilon)F + \varepsilon \delta_x) - T(F)}{\varepsilon}$$
where $\delta_x$ is the point mass distribution at $x$.
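The definition can be checked directly for the mean: the mean of the mixture $(1-\varepsilon)F + \varepsilon\delta_x$ is $(1-\varepsilon)\mu + \varepsilon x$, so the difference quotient recovers $x - \mu$. A minimal numeric sketch (the function name `if_mean_numeric` is just illustrative):

```python
def if_mean_numeric(x, mu=0.0, eps=1e-6):
    """Finite-epsilon version of the influence function of the mean.

    The mean of the contaminated mixture (1-eps)*F + eps*delta_x is
    (1-eps)*mu + eps*x, so the difference quotient equals x - mu exactly.
    """
    t_f = mu                          # T(F): the mean of F
    t_eps = (1 - eps) * mu + eps * x  # T of the contaminated mixture
    return (t_eps - t_f) / eps

print(if_mean_numeric(3.0))   # ≈ 3.0, matching IF(x) = x - mu
print(if_mean_numeric(-5.0))  # ≈ -5.0
```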
Interpretation: $\text{IF}(x; T, F)$ is the rate at which the estimate changes when an infinitesimal fraction of the data is replaced by observations at $x$. A large value means contamination at $x$ can move the estimator strongly.
Properties of a "good" influence function:
A robust estimator should have an influence function that is bounded (finite gross-error sensitivity), smooth (finite local-shift sensitivity), and ideally one that redescends to zero for gross outliers (a finite rejection point).
The influence function is essentially a Gâteaux derivative—the estimator's sensitivity to perturbations in a specific direction (adding mass at x). Just as derivatives describe local properties of functions, influence functions describe local properties of estimators.
1. Sample Mean
The influence function of the mean at distribution $F$ with mean $\mu$:
$$\text{IF}(x; T_{\text{mean}}, F) = x - \mu$$
Interpretation: The influence is linear in $x$. Points far from the mean have proportionally larger influence. This is unbounded—extreme observations have unlimited influence.
2. Sample Median
For a distribution with density $f$ and median $m$:
$$\text{IF}(x; T_{\text{med}}, F) = \frac{\text{sign}(x - m)}{2f(m)}$$
Interpretation: The influence is bounded! It's either $+1/(2f(m))$ or $-1/(2f(m))$ regardless of how extreme $x$ is. The median has bounded influence.
3. Huber M-estimator
For the Huber estimator with threshold $k$:
$$\text{IF}(x; T_{\text{Huber}}, F) = \psi_k(x - \mu) / \mathbb{E}[\psi'_k(X - \mu)]$$
where $\psi_k$ is the Huber psi-function.
Interpretation: Bounded at $\pm k / \mathbb{E}[\psi'_k]$. The influence is linear near the center and capped for extreme values.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def influence_mean(x, mu=0):
    """Influence function of the mean"""
    return x - mu

def influence_median(x, m=0, f_m=None):
    """
    Influence function of the median.
    f_m is the density at the median.
    """
    if f_m is None:
        # For standard normal, f(0) = 1/sqrt(2*pi) ≈ 0.399
        f_m = 1 / np.sqrt(2 * np.pi)
    return np.sign(x - m) / (2 * f_m)

def influence_huber(x, mu=0, k=1.345):
    """Influence function of Huber M-estimator"""
    r = x - mu
    psi = np.where(np.abs(r) <= k, r, k * np.sign(r))
    # E[psi'] for standard normal with k=1.345 is 2*Phi(k) - 1 ≈ 0.82
    E_psi_prime = 2 * stats.norm.cdf(k) - 1
    return psi / E_psi_prime

def influence_tukey(x, mu=0, c=4.685):
    """Influence function of Tukey's bisquare (redescending)"""
    r = x - mu
    psi = np.where(np.abs(r) <= c, r * (1 - (r / c)**2)**2, 0)
    # Approximate E[psi'] for standard normal with c=4.685
    E_psi_prime = 0.78
    return psi / E_psi_prime

# Visualization
x = np.linspace(-8, 8, 1000)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Influence functions
ax1 = axes[0]
ax1.plot(x, influence_mean(x), 'b--', label='Mean (unbounded)', linewidth=2)
ax1.plot(x, influence_median(x), 'g-', label='Median (bounded, constant)', linewidth=2)
ax1.plot(x, influence_huber(x), 'orange', label='Huber (bounded)', linewidth=2)
ax1.plot(x, influence_tukey(x), 'r-', label='Tukey (redescending)', linewidth=2)
ax1.axhline(y=0, color='gray', linestyle='-', alpha=0.3)
ax1.set_xlabel('Observation Value (x)')
ax1.set_ylabel('Influence Function IF(x)')
ax1.set_title('Influence Functions of Location Estimators')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xlim(-8, 8)
ax1.set_ylim(-6, 6)

# Squared influence (related to variance contribution)
ax2 = axes[1]
ax2.plot(x, influence_mean(x)**2, 'b--', label='Mean', linewidth=2)
ax2.plot(x, influence_median(x)**2, 'g-', label='Median', linewidth=2)
ax2.plot(x, influence_huber(x)**2, 'orange', label='Huber', linewidth=2)
ax2.plot(x, influence_tukey(x)**2, 'r-', label='Tukey', linewidth=2)
ax2.set_xlabel('Observation Value (x)')
ax2.set_ylabel('IF(x)²')
ax2.set_title('Squared Influence (Variance Contribution)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xlim(-8, 8)
ax2.set_ylim(0, 20)

plt.tight_layout()
plt.show()

# Key metrics
print("Gross-Error Sensitivity (GES) = sup|IF(x)|:")
print(f"  Mean:            UNBOUNDED")
print(f"  Median:          {influence_median(10):.3f}")
print(f"  Huber (k=1.345): {influence_huber(10):.3f}")
print(f"  Tukey (c=4.685): {np.max(np.abs(influence_tukey(x))):.3f} (then → 0)")
```

The influence function at each point tells us local sensitivity. We can summarize this into global measures:
Gross-Error Sensitivity (GES):
$$\gamma^* = \sup_x |\text{IF}(x; T, F)|$$
The maximum possible influence of any single observation. For robust estimators, we want $\gamma^* < \infty$.
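Under a standard normal $F$, the GES of the median and the Huber estimator can be evaluated directly from their influence functions (taking the supremum over a wide grid):

```python
import numpy as np
from scipy import stats

grid = np.linspace(-50, 50, 100001)

# Median at the standard normal: IF(x) = sign(x) / (2 f(0))
if_median = np.sign(grid) / (2 * stats.norm.pdf(0))

# Huber (k = 1.345) at the standard normal: IF(x) = psi_k(x) / E[psi_k']
k = 1.345
if_huber = np.clip(grid, -k, k) / (2 * stats.norm.cdf(k) - 1)

ges_median = np.max(np.abs(if_median))
ges_huber = np.max(np.abs(if_huber))
print(f"GES(median) ≈ {ges_median:.3f}")  # ≈ 1.253 = sqrt(pi/2)
print(f"GES(Huber)  ≈ {ges_huber:.3f}")   # ≈ 1.64 = k / (2*Phi(k) - 1)
# GES(mean) = sup|x - mu| over the whole real line: unbounded
```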
Local-Shift Sensitivity (LSS):
$$\lambda^* = \sup_x \left|\frac{d \text{IF}(x; T, F)}{dx}\right|$$
Measures sensitivity to small changes in observation locations. Related to sensitivity to rounding and grouping.
Rejection Point:
$$\rho^* = \inf\{r > 0 : \text{IF}(x; T, F) = 0 \text{ for all } |x| > r\}$$
For redescending estimators, this is where outliers are completely rejected.
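For Tukey's bisquare, the rejection point equals the tuning constant $c$: the ψ-function (and hence the influence function) is identically zero beyond $|r| = c$. A quick check:

```python
import numpy as np

def psi_tukey(r, c=4.685):
    """Tukey bisquare psi: r*(1 - (r/c)^2)^2 inside [-c, c], zero outside."""
    return np.where(np.abs(r) <= c, r * (1 - (r / c) ** 2) ** 2, 0.0)

print(psi_tukey(4.6))    # small but nonzero just inside c = 4.685
print(psi_tukey(4.7))    # exactly 0.0 just outside c
print(psi_tukey(100.0))  # 0.0: gross outliers are completely rejected
```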
| Estimator | GES (γ*) | Bounded? | Rejection Point (ρ*) |
|---|---|---|---|
| Mean | ∞ | No | None |
| Median | 1.25 | Yes | None |
| Huber (k=1.345) | ≈1.64 | Yes | None |
| Tukey (c=4.685) | ≈1.72 | Yes | 4.685 |
| Hampel | ≈1.7 | Yes | 8.5 (default) |
Tukey's bisquare has a GES comparable to Huber's, but unlike Huber it also has a finite rejection point—extreme outliers have exactly zero influence. This comes at the cost of non-convexity and potential multiple local minima.
The influence function connects directly to the asymptotic variance of an estimator, providing a unified framework for understanding efficiency.
The Fundamental Result:
Under regularity conditions, for an estimator $T$ with influence function IF:
$$\sqrt{n}(T(\hat{F}_n) - T(F)) \xrightarrow{d} \mathcal{N}(0, V)$$
where the asymptotic variance is:
$$V = \mathbb{E}[\text{IF}(X; T, F)^2] = \int \text{IF}(x; T, F)^2 \, dF(x)$$
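This integral can be evaluated numerically. Under a standard normal $F$, the mean attains variance 1 (the Cramér–Rao bound at the normal), while the median's constant-magnitude influence function integrates to $1/(4f(0)^2) = \pi/2$:

```python
import numpy as np
from scipy import stats, integrate

f0 = stats.norm.pdf(0)  # density of N(0,1) at its median

# V = ∫ IF(x)^2 dF(x), evaluated under the standard normal
v_mean, _ = integrate.quad(lambda x: x**2 * stats.norm.pdf(x),
                           -np.inf, np.inf)
v_median, _ = integrate.quad(lambda x: (np.sign(x) / (2 * f0))**2 * stats.norm.pdf(x),
                             -np.inf, np.inf)

print(f"V(mean)   = {v_mean:.4f}")    # 1.0000
print(f"V(median) = {v_median:.4f}")  # 1.5708 = pi/2
```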
Interpretation:
Efficiency Calculation:
The asymptotic relative efficiency (ARE) of estimator $T_1$ versus $T_2$ is:
$$\text{ARE}(T_1, T_2) = \frac{V_{T_2}}{V_{T_1}} = \frac{\mathbb{E}[\text{IF}_2^2]}{\mathbb{E}[\text{IF}_1^2]}$$
For location estimators under Gaussian $F$: the mean is fully efficient, the median has ARE $\approx 0.64$ relative to the mean, and the Huber estimator with $k = 1.345$ achieves ARE $\approx 0.95$.
Bounded influence (small GES) tends to increase variance. The median has excellent robustness (bounded influence) but pays with ~36% efficiency loss. Huber cleverly balances—bounded influence with only ~5% efficiency loss.
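A quick Monte Carlo sketch confirms the median's efficiency loss at the Gaussian: the ratio of sampling variances approaches $2/\pi \approx 0.64$:

```python
import numpy as np

# Monte Carlo estimate of ARE(median, mean) at the Gaussian
rng = np.random.default_rng(0)
samples = rng.standard_normal((5000, 200))  # 5000 samples of size n = 200

var_mean = np.var(samples.mean(axis=1))
var_median = np.var(np.median(samples, axis=1))

print(f"ARE(median, mean) ≈ {var_mean / var_median:.3f}")  # ≈ 2/pi ≈ 0.64
```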
In regression, influence analysis has been developed into practical diagnostic tools. These help identify which observations most affect the fitted model.
Leverage (Hat Matrix Diagonal):
The leverage of observation $i$ is:
$$h_{ii} = \mathbf{x}_i^\top (\mathbf{X}^\top\mathbf{X})^{-1} \mathbf{x}_i$$
Leverage measures how extreme observation $i$ is in X-space. High leverage points can be influential but aren't necessarily problematic.
Studentized Residuals:
$$t_i = \frac{r_i}{\hat{\sigma}_{-i}\sqrt{1 - h_{ii}}}$$
where $\hat{\sigma}_{-i}$ is the standard error estimated without observation $i$. Large $|t_i|$ indicates potential outliers in Y-space.
Cook's Distance:
$$D_i = \frac{(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})^\top(\mathbf{X}^\top\mathbf{X})(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})}{p \cdot \hat{\sigma}^2}$$
Cook's distance measures the overall change in fitted values when observation $i$ is removed. It combines leverage and residual size.
DFFITS:
$$\text{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i,-i}}{\hat{\sigma}_{-i}\sqrt{h_{ii}}}$$
Standardized change in the predicted value for observation $i$ when that observation is omitted.
```python
import numpy as np
from scipy import linalg

def regression_influence_diagnostics(X, y):
    """
    Compute comprehensive influence diagnostics for OLS regression.

    Returns:
    --------
    diagnostics : dict
        Dictionary containing leverage, residuals, Cook's D, DFFITS
    """
    n, p = X.shape

    # OLS fit
    beta = linalg.lstsq(X, y)[0]
    y_hat = X @ beta
    residuals = y - y_hat

    # Hat matrix and leverage
    XtX_inv = linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    leverage = np.diag(H)

    # MSE and standard error
    mse = np.sum(residuals**2) / (n - p)
    sigma = np.sqrt(mse)

    # Standardized residuals
    std_residuals = residuals / (sigma * np.sqrt(1 - leverage))

    # Leave-one-out error scale (via the standard LOO update identity)
    sigma_loo = np.zeros(n)
    for i in range(n):
        sse_loo = np.sum(residuals**2) - residuals[i]**2 / (1 - leverage[i])
        sigma_loo[i] = np.sqrt(sse_loo / (n - p - 1))

    # Studentized residuals (external)
    studentized = residuals / (sigma_loo * np.sqrt(1 - leverage))

    # Cook's distance
    cooks_d = (std_residuals**2 * leverage) / (p * (1 - leverage))

    # DFFITS
    dffits = studentized * np.sqrt(leverage / (1 - leverage))

    return {
        'leverage': leverage,
        'std_residuals': std_residuals,
        'studentized': studentized,
        'cooks_d': cooks_d,
        'dffits': dffits,
        'coefficients': beta,
        'y_hat': y_hat,
        'residuals': residuals,
    }

# Example with a high-influence point
np.random.seed(42)
n = 50

# Clean data
X_clean = np.random.randn(n - 1, 1)
y_clean = 2 + 3 * X_clean.flatten() + 0.5 * np.random.randn(n - 1)

# Add one high-leverage, high-influence point
X_outlier = np.array([[5.0]])            # Far in X-space
y_outlier = np.array([2 + 3 * 5 + 10])   # Also off the line

X = np.vstack([X_clean, X_outlier])
y = np.concatenate([y_clean, y_outlier])
X_design = np.column_stack([np.ones(n), X])

# Compute diagnostics
diag = regression_influence_diagnostics(X_design, y)

# Find the outlier
outlier_idx = n - 1

print("=== Influence Diagnostics ===")
print(f"Observation {outlier_idx} (the outlier):")
print(f"  Leverage: {diag['leverage'][outlier_idx]:.4f} (avg: {np.mean(diag['leverage']):.4f})")
print(f"  Studentized residual: {diag['studentized'][outlier_idx]:.4f}")
print(f"  Cook's distance: {diag['cooks_d'][outlier_idx]:.4f}")
print(f"  DFFITS: {diag['dffits'][outlier_idx]:.4f}")

# Thresholds
print("=== Diagnostic Thresholds ===")
print(f"High leverage threshold (2p/n): {2*2/n:.4f}")
print(f"Cook's D threshold (4/n): {4/n:.4f}")
print(f"DFFITS threshold (2*sqrt(p/n)): {2*np.sqrt(2/n):.4f}")

# Count problematic observations
high_leverage = np.sum(diag['leverage'] > 2*2/n)
high_cooks = np.sum(diag['cooks_d'] > 4/n)
high_dffits = np.sum(np.abs(diag['dffits']) > 2*np.sqrt(2/n))

print("=== Number of Flagged Observations ===")
print(f"High leverage: {high_leverage}")
print(f"High Cook's D: {high_cooks}")
print(f"High |DFFITS|: {high_dffits}")
```

| Diagnostic | Threshold | Interpretation |
|---|---|---|
| Leverage | $h_{ii} > 2p/n$ or $3p/n$ | High leverage in X-space |
| Studentized residual | $|t_i| > 2$ or $3$ | Potential outlier in Y-space |
| Cook's D | $D_i > 4/n$ or $D_i > 1$ | High overall influence |
| DFFITS | $|\text{DFFITS}_i| > 2\sqrt{p/n}$ | High influence on own prediction |
| DFBETAS | $|\text{DFBETAS}_{ij}| > 2/\sqrt{n}$ | High influence on coefficient $j$ |
High influence doesn't mean 'wrong.' An influential point might be the most informative observation in your dataset—or a critical data error. Always investigate before removing. Consider robust methods that automatically down-weight without requiring manual decisions.
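Such automatic down-weighting can be sketched with a few lines of iteratively reweighted least squares. The hand-rolled `huber_irls` below is illustrative only, not a library API; production code would use a tested implementation such as statsmodels' robust linear models:

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50):
    """Minimal IRLS sketch of Huber regression (illustrative).

    Observations with large scaled residuals get weight k/|u| < 1,
    so outliers are down-weighted automatically rather than deleted.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale
        u = r / max(scale, 1e-12)
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))      # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# One gross outlier drags OLS but barely moves the Huber fit
rng = np.random.default_rng(42)
x = rng.standard_normal(50)
y = 2 + 3 * x + 0.3 * rng.standard_normal(50)
y[0] += 30  # gross outlier
X = np.column_stack([np.ones(50), x])

print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber:", huber_irls(X, y))  # close to the true (2, 3)
```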
Module Complete:
You have now completed the Robust Regression module, from breakdown points and influence functions to M-estimation and regression influence diagnostics.
Together, these tools equip you to build regression models that work reliably on real-world data—data that is messy, contains errors, and violates the idealized assumptions of classical statistics.
Congratulations! You have mastered robust regression—a critical skill for any practitioner working with real-world data. You can now choose appropriate robust methods, understand their theoretical guarantees, and diagnose potential problems in regression models.