The breakdown point tells us when an estimator catastrophically fails—but what about gradual corruption? If we add a single observation at a particular value, how much does the estimate change? Does the effect depend on where the observation is located?
Influence functions answer these questions. Introduced by Frank Hampel in his landmark 1974 paper, the influence function describes how an infinitesimal perturbation at any point affects the estimator. It's the derivative of the estimator with respect to the data distribution.
Think of it this way: the breakdown point is a global, worst-case measure (how much contamination an estimator can survive), while the influence function is a local, infinitesimal measure (how a single observation at a given location perturbs the estimate). Both perspectives are essential for understanding robustness.
The influence function also forms the foundation of robust statistics theory. It connects to asymptotic variance and efficiency, to summary measures such as gross-error sensitivity, and to the ψ-functions of M-estimators.
This page develops the complete theory of influence functions, from definition to computation to practical applications in regression diagnostics.
By the end of this page, you will understand the formal definition of influence functions, how to compute and visualize influence for common estimators, the connection to M-estimator ψ-functions, and how to use influence diagnostics in regression practice.
Setup:
Let $T(F)$ be a statistical functional—an estimator viewed as a function of the underlying distribution $F$. For example, the mean is $T(F) = \int x \, dF(x)$ and the median is $T(F) = F^{-1}(1/2)$.
The Influence Function:
The influence function of $T$ at distribution $F$ is:
$$\text{IF}(x; T, F) = \lim_{\varepsilon \to 0} \frac{T((1-\varepsilon)F + \varepsilon \delta_x) - T(F)}{\varepsilon}$$
where $\delta_x$ is the point mass distribution at $x$.
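The definition can be checked directly for the mean: the mean of the mixture $(1-\varepsilon)F + \varepsilon\delta_x$ is $(1-\varepsilon)\mu + \varepsilon x$, so the difference quotient recovers $x - \mu$. A minimal numeric sketch (the function name `if_mean_numeric` is just illustrative):

```python
def if_mean_numeric(x, mu=0.0, eps=1e-6):
    """Finite-epsilon version of the influence function of the mean.

    The mean of the contaminated mixture (1-eps)*F + eps*delta_x is
    (1-eps)*mu + eps*x, so the difference quotient equals x - mu exactly.
    """
    t_f = mu                          # T(F): the mean of F
    t_eps = (1 - eps) * mu + eps * x  # T of the contaminated mixture
    return (t_eps - t_f) / eps

print(if_mean_numeric(3.0))   # ≈ 3.0, matching IF(x) = x - mu
print(if_mean_numeric(-5.0))  # ≈ -5.0
```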
Interpretation: $\text{IF}(x; T, F)$ is the rate at which the estimate changes when an infinitesimal fraction of the data is replaced by observations at $x$. A large value means contamination at $x$ can move the estimator strongly.
Properties of a "good" influence function:
A robust estimator should have an influence function that is bounded (finite gross-error sensitivity), smooth (finite local-shift sensitivity), and ideally one that redescends to zero for gross outliers (a finite rejection point).
The influence function is essentially a Gâteaux derivative—the estimator's sensitivity to perturbations in a specific direction (adding mass at x). Just as derivatives describe local properties of functions, influence functions describe local properties of estimators.
1. Sample Mean
The influence function of the mean at distribution $F$ with mean $\mu$:
$$\text{IF}(x; T_{\text{mean}}, F) = x - \mu$$
Interpretation: The influence is linear in $x$. Points far from the mean have proportionally larger influence. This is unbounded—extreme observations have unlimited influence.
2. Sample Median
For a distribution with density $f$ and median $m$:
$$\text{IF}(x; T_{\text{med}}, F) = \frac{\text{sign}(x - m)}{2f(m)}$$
Interpretation: The influence is bounded! It's either $+1/(2f(m))$ or $-1/(2f(m))$ regardless of how extreme $x$ is. The median has bounded influence.
3. Huber M-estimator
For the Huber estimator with threshold $k$:
$$\text{IF}(x; T_{\text{Huber}}, F) = \psi_k(x - \mu) / \mathbb{E}[\psi'_k(X - \mu)]$$
where $\psi_k$ is the Huber psi-function.
Interpretation: Bounded at $\pm k / \mathbb{E}[\psi'_k]$. The influence is linear near the center and capped for extreme values.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def influence_mean(x, mu=0):
    """Influence function of the mean"""
    return x - mu

def influence_median(x, m=0, f_m=None):
    """
    Influence function of the median.
    f_m is the density at the median.
    """
    if f_m is None:
        # For standard normal, f(0) = 1/sqrt(2*pi) ≈ 0.399
        f_m = 1 / np.sqrt(2 * np.pi)
    return np.sign(x - m) / (2 * f_m)

def influence_huber(x, mu=0, k=1.345):
    """Influence function of Huber M-estimator"""
    r = x - mu
    psi = np.where(np.abs(r) <= k, r, k * np.sign(r))
    # E[psi'] for standard normal with k=1.345 is 2*Phi(k) - 1 ≈ 0.82
    E_psi_prime = 2 * stats.norm.cdf(k) - 1
    return psi / E_psi_prime

def influence_tukey(x, mu=0, c=4.685):
    """Influence function of Tukey's bisquare (redescending)"""
    r = x - mu
    psi = np.where(np.abs(r) <= c, r * (1 - (r / c)**2)**2, 0)
    # Approximate E[psi'] for standard normal with c=4.685
    E_psi_prime = 0.78
    return psi / E_psi_prime

# Visualization
x = np.linspace(-8, 8, 1000)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Influence functions
ax1 = axes[0]
ax1.plot(x, influence_mean(x), 'b--', label='Mean (unbounded)', linewidth=2)
ax1.plot(x, influence_median(x), 'g-', label='Median (bounded, constant)', linewidth=2)
ax1.plot(x, influence_huber(x), 'orange', label='Huber (bounded)', linewidth=2)
ax1.plot(x, influence_tukey(x), 'r-', label='Tukey (redescending)', linewidth=2)
ax1.axhline(y=0, color='gray', linestyle='-', alpha=0.3)
ax1.set_xlabel('Observation Value (x)')
ax1.set_ylabel('Influence Function IF(x)')
ax1.set_title('Influence Functions of Location Estimators')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xlim(-8, 8)
ax1.set_ylim(-6, 6)

# Squared influence (related to variance contribution)
ax2 = axes[1]
ax2.plot(x, influence_mean(x)**2, 'b--', label='Mean', linewidth=2)
ax2.plot(x, influence_median(x)**2, 'g-', label='Median', linewidth=2)
ax2.plot(x, influence_huber(x)**2, 'orange', label='Huber', linewidth=2)
ax2.plot(x, influence_tukey(x)**2, 'r-', label='Tukey', linewidth=2)
ax2.set_xlabel('Observation Value (x)')
ax2.set_ylabel('IF(x)²')
ax2.set_title('Squared Influence (Variance Contribution)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xlim(-8, 8)
ax2.set_ylim(0, 20)

plt.tight_layout()
plt.show()

# Key metrics
print("Gross-Error Sensitivity (GES) = sup|IF(x)|:")
print(f"  Mean:            UNBOUNDED")
print(f"  Median:          {influence_median(10):.3f}")
print(f"  Huber (k=1.345): {influence_huber(10):.3f}")
print(f"  Tukey (c=4.685): {np.max(np.abs(influence_tukey(x))):.3f} (then → 0)")
```

The influence function at each point tells us local sensitivity. We can summarize this into global measures:
Gross-Error Sensitivity (GES):
$$\gamma^* = \sup_x |\text{IF}(x; T, F)|$$
The maximum possible influence of any single observation. For robust estimators, we want $\gamma^* < \infty$.
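Under a standard normal $F$, the GES of the median and the Huber estimator can be evaluated directly from their influence functions (taking the supremum over a wide grid):

```python
import numpy as np
from scipy import stats

grid = np.linspace(-50, 50, 100001)

# Median at the standard normal: IF(x) = sign(x) / (2 f(0))
if_median = np.sign(grid) / (2 * stats.norm.pdf(0))

# Huber (k = 1.345) at the standard normal: IF(x) = psi_k(x) / E[psi_k']
k = 1.345
if_huber = np.clip(grid, -k, k) / (2 * stats.norm.cdf(k) - 1)

ges_median = np.max(np.abs(if_median))
ges_huber = np.max(np.abs(if_huber))
print(f"GES(median) ≈ {ges_median:.3f}")  # ≈ 1.253 = sqrt(pi/2)
print(f"GES(Huber)  ≈ {ges_huber:.3f}")   # ≈ 1.64 = k / (2*Phi(k) - 1)
# GES(mean) = sup|x - mu| over the whole real line: unbounded
```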
Local-Shift Sensitivity (LSS):
$$\lambda^* = \sup_x \left|\frac{d \text{IF}(x; T, F)}{dx}\right|$$
Measures sensitivity to small changes in observation locations. Related to sensitivity to rounding and grouping.
Rejection Point:
$$\rho^* = \inf\{r > 0 : \text{IF}(x; T, F) = 0 \text{ for all } |x| > r\}$$
For redescending estimators, this is where outliers are completely rejected.
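For Tukey's bisquare, the rejection point equals the tuning constant $c$: the ψ-function (and hence the influence function) is identically zero beyond $|r| = c$. A quick check:

```python
import numpy as np

def psi_tukey(r, c=4.685):
    """Tukey bisquare psi: r*(1 - (r/c)^2)^2 inside [-c, c], zero outside."""
    return np.where(np.abs(r) <= c, r * (1 - (r / c) ** 2) ** 2, 0.0)

print(psi_tukey(4.6))    # small but nonzero just inside c = 4.685
print(psi_tukey(4.7))    # exactly 0.0 just outside c
print(psi_tukey(100.0))  # 0.0: gross outliers are completely rejected
```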
| Estimator | GES (γ*) | Bounded? | Rejection Point (ρ*) |
|---|---|---|---|
| Mean | ∞ | No | None |
| Median | 1.25 | Yes | None |
| Huber (k=1.345) | ≈1.64 | Yes | None |
| Tukey (c=4.685) | ≈1.72 | Yes | 4.685 |
| Hampel | ≈1.7 | Yes | 8.5 (default) |
Tukey's bisquare has a GES comparable to Huber's, but unlike Huber it also has a finite rejection point—extreme outliers have exactly zero influence. This comes at the cost of non-convexity and potential multiple local minima.
The influence function connects directly to the asymptotic variance of an estimator, providing a unified framework for understanding efficiency.
The Fundamental Result:
Under regularity conditions, for an estimator $T$ with influence function IF:
$$\sqrt{n}(T(\hat{F}_n) - T(F)) \xrightarrow{d} \mathcal{N}(0, V)$$
where the asymptotic variance is:
$$V = \mathbb{E}[\text{IF}(X; T, F)^2] = \int \text{IF}(x; T, F)^2 \, dF(x)$$
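This integral can be evaluated numerically. Under a standard normal $F$, the mean attains variance 1 (the Cramér–Rao bound at the normal), while the median's constant-magnitude influence function integrates to $1/(4f(0)^2) = \pi/2$:

```python
import numpy as np
from scipy import stats, integrate

f0 = stats.norm.pdf(0)  # density of N(0,1) at its median

# V = ∫ IF(x)^2 dF(x), evaluated under the standard normal
v_mean, _ = integrate.quad(lambda x: x**2 * stats.norm.pdf(x),
                           -np.inf, np.inf)
v_median, _ = integrate.quad(lambda x: (np.sign(x) / (2 * f0))**2 * stats.norm.pdf(x),
                             -np.inf, np.inf)

print(f"V(mean)   = {v_mean:.4f}")    # 1.0000
print(f"V(median) = {v_median:.4f}")  # 1.5708 = pi/2
```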
Interpretation:
Efficiency Calculation:
The asymptotic relative efficiency (ARE) of estimator $T_1$ versus $T_2$ is:
$$\text{ARE}(T_1, T_2) = \frac{V_{T_2}}{V_{T_1}} = \frac{\mathbb{E}[\text{IF}_2^2]}{\mathbb{E}[\text{IF}_1^2]}$$
For location estimators under Gaussian $F$: the mean is fully efficient, the median has ARE $\approx 0.64$ relative to the mean, and the Huber estimator with $k = 1.345$ achieves ARE $\approx 0.95$.
Bounded influence (small GES) tends to increase variance. The median has excellent robustness (bounded influence) but pays with ~36% efficiency loss. Huber cleverly balances—bounded influence with only ~5% efficiency loss.
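A quick Monte Carlo sketch confirms the median's efficiency loss at the Gaussian: the ratio of sampling variances approaches $2/\pi \approx 0.64$:

```python
import numpy as np

# Monte Carlo estimate of ARE(median, mean) at the Gaussian
rng = np.random.default_rng(0)
samples = rng.standard_normal((5000, 200))  # 5000 samples of size n = 200

var_mean = np.var(samples.mean(axis=1))
var_median = np.var(np.median(samples, axis=1))

print(f"ARE(median, mean) ≈ {var_mean / var_median:.3f}")  # ≈ 2/pi ≈ 0.64
```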
In regression, influence analysis has been developed into practical diagnostic tools. These help identify which observations most affect the fitted model.
Leverage (Hat Matrix Diagonal):
The leverage of observation $i$ is:
$$h_{ii} = \mathbf{x}_i^\top (\mathbf{X}^\top\mathbf{X})^{-1} \mathbf{x}_i$$
Leverage measures how extreme observation $i$ is in X-space. High leverage points can be influential but aren't necessarily problematic.
Studentized Residuals:
$$t_i = \frac{r_i}{\hat{\sigma}_{-i}\sqrt{1 - h_{ii}}}$$
where $\hat{\sigma}_{-i}$ is the standard error estimated without observation $i$. Large $|t_i|$ indicates potential outliers in Y-space.
Cook's Distance:
$$D_i = \frac{(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})^\top(\mathbf{X}^\top\mathbf{X})(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})}{p \cdot \hat{\sigma}^2}$$
Cook's distance measures the overall change in fitted values when observation $i$ is removed. It combines leverage and residual size.
DFFITS:
$$\text{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i,-i}}{\hat{\sigma}_{-i}\sqrt{h_{ii}}}$$
Standardized change in the predicted value for observation $i$ when that observation is omitted.
```python
import numpy as np
from scipy import linalg

def regression_influence_diagnostics(X, y):
    """
    Compute comprehensive influence diagnostics for OLS regression.

    Returns:
    --------
    diagnostics : dict
        Dictionary containing leverage, residuals, Cook's D, DFFITS
    """
    n, p = X.shape

    # OLS fit
    beta = linalg.lstsq(X, y)[0]
    y_hat = X @ beta
    residuals = y - y_hat

    # Hat matrix and leverage
    XtX_inv = linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    leverage = np.diag(H)

    # MSE and standard error
    mse = np.sum(residuals**2) / (n - p)
    sigma = np.sqrt(mse)

    # Standardized residuals
    std_residuals = residuals / (sigma * np.sqrt(1 - leverage))

    # Leave-one-out error scale (via the standard LOO update identity)
    sigma_loo = np.zeros(n)
    for i in range(n):
        sse_loo = np.sum(residuals**2) - residuals[i]**2 / (1 - leverage[i])
        sigma_loo[i] = np.sqrt(sse_loo / (n - p - 1))

    # Studentized residuals (external)
    studentized = residuals / (sigma_loo * np.sqrt(1 - leverage))

    # Cook's distance
    cooks_d = (std_residuals**2 * leverage) / (p * (1 - leverage))

    # DFFITS
    dffits = studentized * np.sqrt(leverage / (1 - leverage))

    return {
        'leverage': leverage,
        'std_residuals': std_residuals,
        'studentized': studentized,
        'cooks_d': cooks_d,
        'dffits': dffits,
        'coefficients': beta,
        'y_hat': y_hat,
        'residuals': residuals,
    }

# Example with a high-influence point
np.random.seed(42)
n = 50

# Clean data
X_clean = np.random.randn(n - 1, 1)
y_clean = 2 + 3 * X_clean.flatten() + 0.5 * np.random.randn(n - 1)

# Add one high-leverage, high-influence point
X_outlier = np.array([[5.0]])            # Far in X-space
y_outlier = np.array([2 + 3 * 5 + 10])   # Also off the line

X = np.vstack([X_clean, X_outlier])
y = np.concatenate([y_clean, y_outlier])
X_design = np.column_stack([np.ones(n), X])

# Compute diagnostics
diag = regression_influence_diagnostics(X_design, y)

# Find the outlier
outlier_idx = n - 1

print("=== Influence Diagnostics ===")
print(f"Observation {outlier_idx} (the outlier):")
print(f"  Leverage: {diag['leverage'][outlier_idx]:.4f} (avg: {np.mean(diag['leverage']):.4f})")
print(f"  Studentized residual: {diag['studentized'][outlier_idx]:.4f}")
print(f"  Cook's distance: {diag['cooks_d'][outlier_idx]:.4f}")
print(f"  DFFITS: {diag['dffits'][outlier_idx]:.4f}")

# Thresholds
print("=== Diagnostic Thresholds ===")
print(f"High leverage threshold (2p/n): {2*2/n:.4f}")
print(f"Cook's D threshold (4/n): {4/n:.4f}")
print(f"DFFITS threshold (2*sqrt(p/n)): {2*np.sqrt(2/n):.4f}")

# Count problematic observations
high_leverage = np.sum(diag['leverage'] > 2*2/n)
high_cooks = np.sum(diag['cooks_d'] > 4/n)
high_dffits = np.sum(np.abs(diag['dffits']) > 2*np.sqrt(2/n))

print("=== Number of Flagged Observations ===")
print(f"High leverage: {high_leverage}")
print(f"High Cook's D: {high_cooks}")
print(f"High |DFFITS|: {high_dffits}")
```

| Diagnostic | Threshold | Interpretation |
|---|---|---|
| Leverage | $h_{ii} > 2p/n$ or $3p/n$ | High leverage in X-space |
| Studentized residual | $|t_i| > 2$ or $3$ | Potential outlier in Y-space |
| Cook's D | $D_i > 4/n$ or $D_i > 1$ | High overall influence |
| DFFITS | $|\text{DFFITS}_i| > 2\sqrt{p/n}$ | High influence on own prediction |
| DFBETAS | $|\text{DFBETAS}_{ij}| > 2/\sqrt{n}$ | High influence on coefficient $j$ |
High influence doesn't mean 'wrong.' An influential point might be the most informative observation in your dataset—or a critical data error. Always investigate before removing. Consider robust methods that automatically down-weight without requiring manual decisions.
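Such automatic down-weighting can be sketched with a few lines of iteratively reweighted least squares. The hand-rolled `huber_irls` below is illustrative only, not a library API; production code would use a tested implementation such as statsmodels' robust linear models:

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50):
    """Minimal IRLS sketch of Huber regression (illustrative).

    Observations with large scaled residuals get weight k/|u| < 1,
    so outliers are down-weighted automatically rather than deleted.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale
        u = r / max(scale, 1e-12)
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))      # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# One gross outlier drags OLS but barely moves the Huber fit
rng = np.random.default_rng(42)
x = rng.standard_normal(50)
y = 2 + 3 * x + 0.3 * rng.standard_normal(50)
y[0] += 30  # gross outlier
X = np.column_stack([np.ones(50), x])

print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber:", huber_irls(X, y))  # close to the true (2, 3)
```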
Module Complete:
You have now completed the Robust Regression module, from breakdown points and influence functions to M-estimation and regression influence diagnostics.
Together, these tools equip you to build regression models that work reliably on real-world data—data that is messy, contains errors, and violates the idealized assumptions of classical statistics.
Congratulations! You have mastered robust regression—a critical skill for any practitioner working with real-world data. You can now choose appropriate robust methods, understand their theoretical guarantees, and diagnose potential problems in regression models.