We've seen that different estimators have different sensitivities to outliers. Huber regression is more robust than OLS. Tukey's bisquare is more robust than Huber. RANSAC can handle extreme contamination. But how do we quantify these claims? How do we rigorously compare the robustness of different methods?
Enter the breakdown point—the single most important concept in robust statistics. Introduced by Frank Hampel in his 1968 dissertation and further developed by Peter Rousseeuw, the breakdown point answers a fundamental question:
What fraction of the data can be arbitrarily corrupted before the estimator becomes completely unreliable?
This seemingly simple question has profound implications. An estimator with breakdown point 50% can withstand nearly half the data being replaced with arbitrary values—even adversarially chosen values designed to maximize damage. An estimator with breakdown point 0% (like the sample mean or OLS) can be completely corrupted by a single observation.
The breakdown point provides an absolute limit on robustness. No matter how extreme the outliers, an estimator cannot break down unless the contamination fraction exceeds its breakdown point. This mathematical guarantee is invaluable when dealing with real-world data where you don't know how bad the contamination might be.
This page develops the complete theory of breakdown points, from the formal definition to calculations for specific estimators to practical implications for choosing robust methods.
By the end of this page, you will understand the formal definition of breakdown point (finite sample and asymptotic), how to compute breakdown points for common estimators, why 50% is the maximum achievable, and how breakdown point relates to other robustness measures.
Finite-Sample Breakdown Point:
Consider an estimator $T$ applied to a sample $\mathbf{Z} = \{z_1, z_2, \ldots, z_n\}$, and let $\mathbf{Z}'_m$ denote any sample obtained from $\mathbf{Z}$ by replacing $m$ of its observations with arbitrary values. The finite-sample breakdown point $\varepsilon^*_n(T, \mathbf{Z})$ is defined as:
$$\varepsilon^*_n(T, \mathbf{Z}) = \min\left\{\frac{m}{n} \,:\, \sup_{\mathbf{Z}'_m} \left\| T(\mathbf{Z}'_m) - T(\mathbf{Z}) \right\| = \infty \right\}$$
In words: the smallest fraction of observations that, when replaced with adversarially chosen values, can carry the estimate beyond every bound.
Unpacking the definition:
The supremum runs over every possible way of corrupting $m$ observations, so the breakdown point is a worst-case guarantee: as long as strictly fewer than $n\,\varepsilon^*_n$ points are corrupted, no corruption, however extreme, can make the estimator explode.
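To make the definition concrete, here is a minimal numerical probe (a sketch under a strong simplification: it tries only one kind of corruption, pushing $m$ points to a single huge value, which is already enough to break non-robust estimators; the function name `breakdown_probe` is illustrative):

```python
import numpy as np

def breakdown_probe(estimator, x, bound=1e6):
    """Smallest fraction m/n at which placing m points at a single huge
    value moves the estimate by more than `bound`. A crude probe of the
    finite-sample breakdown point, not the full supremum over corruptions."""
    n = len(x)
    baseline = estimator(x)
    for m in range(1, n + 1):
        x_bad = x.copy()
        x_bad[:m] = 1e15                       # one simple adversarial corruption
        if abs(estimator(x_bad) - baseline) > bound:
            return m / n
    return 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=100)
print(breakdown_probe(np.mean, x))     # 0.01 -> one corrupted point is enough
print(breakdown_probe(np.median, x))   # 0.5  -> roughly half the sample is needed
```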
Asymptotic Breakdown Point:
As $n \to \infty$, the finite-sample breakdown point converges to the asymptotic breakdown point:
$$\varepsilon^* = \lim_{n \to \infty} \varepsilon^*_n(T, \mathbf{Z})$$
For most estimators, this limit exists and is independent of the original sample $\mathbf{Z}$.
Breakdown point assumes an adversary who knows your estimator and places outliers to cause maximum damage. This is the worst-case scenario. In practice, outliers are usually 'accidental' rather than adversarial, so estimators often perform better than their breakdown point suggests.
Let's compute breakdown points for estimators we've encountered:
1. Sample Mean (Arithmetic Average)
For the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$:
Consider replacing just one observation $x_j$ with $x_j' \to \infty$: $$\bar{x}' = \bar{x} + \frac{x_j' - x_j}{n} \to \infty$$
A single corrupted point makes the estimate unbounded.
$$\varepsilon^*_{\text{mean}} = \frac{1}{n} \xrightarrow{n \to \infty} 0$$
The sample mean has 0% asymptotic breakdown point!
2. Sample Median
The median is the middle value of the sorted sample. To move the median arbitrarily far, corrupted points must take over the middle position, which requires replacing at least half of the observations.
With fewer than half of the observations corrupted, the sorted middle value is still determined by, and bounded by, the uncorrupted data.
$$\varepsilon^*_{\text{median}} = \frac{\lfloor (n+1)/2 \rfloor}{n} \xrightarrow{n \to \infty} 0.5$$
The sample median has 50% asymptotic breakdown point!
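A quick numerical check of that threshold (a small illustration with $n = 101$, so the median breaks exactly when $m = \lfloor (n+1)/2 \rfloor = 51$ observations are replaced):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = rng.normal(loc=5.0, scale=1.0, size=n)

for m in (50, 51):                 # one below, and exactly at, floor((n+1)/2)
    x_bad = x.copy()
    x_bad[:m] = 1e15               # replace m observations with a huge value
    print(m, np.median(x_bad))     # m=50: still ~5-8, bounded by the good data
                                   # m=51: ~1e15, the median has broken down
```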
3. Ordinary Least Squares (OLS)
OLS estimates a conditional mean and inherits the mean's fragility: a single observation with extreme leverage (an outlying $\mathbf{x}$) and a discrepant $y$ can drive the coefficients arbitrarily far.
$$\varepsilon^*_{\text{OLS}} = \frac{1}{n} \xrightarrow{n \to \infty} 0$$
OLS has 0% breakdown point—identical to the mean.
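A minimal sketch of this failure mode: a single observation with an extreme $x$ value (a leverage point) can move the OLS slope essentially anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2 + 3 * x + 0.3 * rng.normal(size=n)

def ols(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(x, y))                   # approximately [2, 3]

# Corrupt a single observation with an extreme leverage point
x_bad, y_bad = x.copy(), y.copy()
x_bad[0], y_bad[0] = 1e6, -1e6
print(ols(x_bad, y_bad))           # slope dragged to about -1: one point controls the fit
```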
| Estimator | Breakdown Point | Comments |
|---|---|---|
| Sample Mean | 1/n → 0% | Completely non-robust |
| Sample Median | ⌊(n+1)/2⌋/n → 50% | Maximum achievable for location |
| OLS Regression | 1/n → 0% | Completely non-robust |
| L1 Regression (LAD) | 1/n → 0% | Robust to vertical outliers only; a single leverage point can break it |
| Huber M-estimator | 1/n → 0% | Bounded influence ≠ high breakdown |
| Least Median of Squares (LMS) | ⌊(n-p)/2⌋/n → 50% | High breakdown, low efficiency |
| S-estimators | ~50% | High breakdown with tunable efficiency |
| MM-estimators | ~50% | High breakdown, high efficiency |
The Huber M-estimator has bounded influence (outliers are down-weighted) but 0% breakdown point. Why? A single point with extreme X (leverage) and moderate Y can still corrupt the fit. Bounded influence is about gradual corruption; breakdown is about catastrophic failure.
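To see the distinction concretely, the small experiment below (a sketch assuming scikit-learn's `HuberRegressor` is available; exact numbers depend on the data and solver, but the qualitative failure does not) fits Huber regression with and without a single leverage point:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2 + 3 * x + 0.3 * rng.normal(size=n)

# One point with extreme X (leverage) but a moderate-looking Y
x_bad, y_bad = x.copy(), y.copy()
x_bad[0], y_bad[0] = 1e3, 0.0

for xs, ys, label in [(x, y, "clean"), (x_bad, y_bad, "one leverage point")]:
    fit = HuberRegressor(max_iter=1000).fit(xs.reshape(-1, 1), ys)
    print(label, fit.intercept_, fit.coef_[0])
# On clean data the Huber fit is close to slope 3; with the leverage point
# the slope is pulled far below 3 (toward 0), because the bounded rho limits
# the effect of large residuals, not of extreme positions in X.
```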
It's natural to ask: can we design an estimator with breakdown point higher than 50%? The answer is no—and for a fundamental reason.
The Impossibility Argument:
Consider any sample $\mathbf{Z} = {z_1, \ldots, z_n}$ and any estimator $T$.
Suppose the estimator could withstand corruption of a majority of the sample, i.e., of some $m > n/2$ observations.
Scenario 1: The genuine data are the remaining $n - m$ original points, and the corrupted majority is placed as a tight cluster at some location $c$ chosen arbitrarily far away. Scenario 2: Exactly the same $n$ points, read the other way around: the cluster at $c$ is the genuine data and the original minority is the contamination.
Both readings describe one and the same dataset, so the estimator must return a single value for it.
But if the estimator really tolerated majority corruption, Scenario 1 would force that value to stay near the original data, while Scenario 2 would force it to stay near $c$. Because $c$ can be pushed arbitrarily far away, no single value can satisfy both requirements. The contradiction shows that some corruption of more than half the points must be able to carry the estimate beyond every bound.
The Formal Statement:
For any translation-equivariant location estimator (with analogous bounds for scale and regression estimators), the maximum achievable breakdown point is:
$$\varepsilon^* \leq \frac{\lfloor (n+1)/2 \rfloor}{n} \xrightarrow{n \to \infty} 0.5$$
This is called the 50% breakdown point barrier.
The 50% barrier has an intuitive interpretation: with >50% outliers, the outliers become the 'majority.' Any estimator must follow the majority, otherwise the adversary can switch which group is labeled 'outliers.' The best we can do is tie-breaking at exactly 50%.
Achieving the 50% breakdown point in regression is harder than for location estimation. The design matrix X introduces complications—outliers can be extreme in X-space (leverage points), Y-space (vertical outliers), or both.
Least Median of Squares (LMS)
Proposed by Rousseeuw (1984), LMS minimizes the median of squared residuals:
$$\hat{\boldsymbol{\beta}}_{\text{LMS}} = \arg\min_{\boldsymbol{\beta}} \operatorname{median}_i\left(r_i^2\right)$$
Properties:
- Breakdown point ≈ 50%, the highest achievable for regression
- Very low statistical efficiency: the estimate converges at the slow $n^{-1/3}$ rate
- No closed-form solution; computed approximately by searching over random subsets of the data
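Exact LMS minimization is combinatorial, so in practice it is approximated by searching over random elemental subsets of $p$ observations. A minimal sketch of that idea (the subset count and function name are illustrative choices, not a reference implementation):

```python
import numpy as np

def lms_fit(X, y, n_subsets=2000, rng=None):
    """Approximate Least Median of Squares via random elemental subsets:
    fit exactly determined p-point subsets and keep the candidate with
    the smallest median squared residual."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue                                 # singular subset: skip it
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta
```

Here `X` is the $n \times p$ design matrix including the intercept column, so `lms_fit(X, y)` returns an approximate LMS coefficient vector.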
Least Trimmed Squares (LTS)
Also by Rousseeuw, LTS minimizes the sum of the $h$ smallest squared residuals:
$$\hat{\boldsymbol{\beta}}_{\text{LTS}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{h} r_{(i)}^2$$
where $r_{(i)}^2$ are the order statistics and $h \approx n/2$ for 50% breakdown.
Properties:
- Breakdown point ≈ 50% when $h \approx n/2$
- $\sqrt{n}$-consistent and asymptotically normal, unlike LMS
- Still inefficient under Gaussian errors, so it is most often used as a high-breakdown starting point for further refinement
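LTS is usually computed with concentration steps ("C-steps"): refit OLS to the $h$ observations with the smallest current squared residuals and repeat, in the spirit of the FAST-LTS algorithm. A minimal sketch (simplified; the full algorithm adds smarter start selection and nested refinements):

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, rng=None):
    """Sketch of Least Trimmed Squares via concentration (C-) steps."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    h = h if h is not None else (n + p + 1) // 2       # near-50%-breakdown choice
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)     # random elemental start
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue
        for _ in range(n_csteps):                      # C-step: refit on the h best points
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()   # trimmed sum of squares
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta
```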
S-Estimators
Minimize a robust scale of residuals:
$$\hat{\boldsymbol{\beta}}_{S} = \arg\min_{\boldsymbol{\beta}} \hat{\sigma}(r_1, \ldots, r_n)$$
where $\hat{\sigma}$ is an M-estimate of scale satisfying:
$$\frac{1}{n}\sum_{i=1}^n \rho\left(\frac{r_i}{\hat{\sigma}}\right) = K$$
for a bounded ρ-function and appropriately chosen constant $K$.
Properties:
- Breakdown point up to 50%, controlled by the tuning constant of the ρ-function
- $\sqrt{n}$-consistent with an asymptotically normal distribution
- Efficiency and breakdown trade off through the tuning constant: tuning for 50% breakdown yields low Gaussian efficiency, which is the gap MM-estimators close
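The scale $\hat{\sigma}$ above is defined only implicitly by the constraint. A minimal sketch of solving that M-scale equation by bisection (here the bisquare ρ is normalized to a maximum of 1, so $K = 0.5$ together with $c \approx 1.5476$ corresponds to the 50%-breakdown tuning; function names are illustrative):

```python
import numpy as np

def rho_bisquare_norm(u, c=1.5476):
    """Bisquare rho normalized so that rho -> 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, c=1.5476, K=0.5, tol=1e-10):
    """Solve (1/n) * sum rho(r_i / sigma) = K for sigma by bisection.
    The left-hand side decreases as sigma grows, so the root is bracketed."""
    r = np.asarray(r, dtype=float)
    lo, hi = 1e-12, np.max(np.abs(r)) + 1.0
    while np.mean(rho_bisquare_norm(r / hi, c)) > K:   # widen until bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(rho_bisquare_norm(r / mid, c)) > K:
            lo = mid          # sigma too small: residuals look too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: the M-scale of clean Gaussian residuals is close to their SD,
# and barely moves when 10% of the residuals are replaced by huge values.
rng = np.random.default_rng(0)
r = rng.normal(scale=2.0, size=200)
print(m_scale(r))        # roughly 2
r[:20] = 1e6
print(m_scale(r))        # still roughly 2, only mildly inflated
```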
High-breakdown estimators like LMS and LTS achieve 50% breakdown but have low efficiency under Gaussian errors (~30-40%). This created demand for estimators with both high breakdown AND high efficiency—leading to MM-estimators.
MM-estimators (introduced by Yohai, 1987) achieve the seemingly impossible: 50% breakdown point AND 95% efficiency under Gaussian errors.
The Two-Stage Approach:
Stage 1: High-Breakdown Initial Estimate. Compute an S-estimate $\hat{\boldsymbol{\beta}}_S$ using a bounded ρ-function (e.g., bisquare with c tuned for 50% breakdown). This gives a coefficient estimate with 50% breakdown point together with a robust residual scale estimate $\hat{\sigma}_S$.
Stage 2: Efficient Local Refinement. Starting from $\hat{\boldsymbol{\beta}}_S$ and using the scale $\hat{\sigma}_S$ (held fixed), solve:
$$\hat{\boldsymbol{\beta}}_{\text{MM}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \rho^*\left(\frac{y_i - \mathbf{x}_i^\top \boldsymbol{\beta}}{\hat{\sigma}_S}\right)$$
where $\rho^*$ is a second ρ-function from the same family, tuned with a larger constant (e.g., $c \approx 4.685$ for the bisquare) to achieve 95% efficiency under Gaussian errors.
Key Insight: The first stage provides a "safe" starting point. The second stage only searches locally, so it can use a more efficient (but less robust) criterion without the risk of converging to an outlier-induced solution.
```python
import numpy as np
from scipy import linalg


def rho_bisquare(u, c=4.685):
    """Tukey's bisquare rho function."""
    return np.where(
        np.abs(u) <= c,
        (c**2 / 6) * (1 - (1 - (u / c)**2)**3),
        c**2 / 6,
    )


def psi_bisquare(u, c=4.685):
    """Derivative of the bisquare rho."""
    return np.where(np.abs(u) <= c, u * (1 - (u / c)**2)**2, 0.0)


def s_estimator(X, y, c_s=1.5476, max_iter=100, tol=1e-6):
    """Simplified S-estimator for high-breakdown regression.

    c_s = 1.5476 gives 50% breakdown with the bisquare. Note: this
    illustration starts from OLS and uses the MAD as the robust scale;
    a production S-estimator would use subsampling starts and solve the
    M-scale equation at each step.
    """
    n, p = X.shape
    beta = linalg.lstsq(X, y)[0]          # simplified starting point (OLS)

    for _ in range(max_iter):
        r = y - X @ beta
        mad = np.median(np.abs(r - np.median(r)))
        sigma = 1.4826 * mad if mad > 1e-10 else 1.0
        u = r / sigma

        # IRLS weights w(u) = psi(u) / u
        w = np.where(np.abs(u) > 1e-10, psi_bisquare(u, c_s) / u, 1.0)
        w = np.maximum(w, 1e-10)

        # Weighted least squares step (small ridge term for stability)
        W = np.diag(w)
        beta_new = linalg.solve(X.T @ W @ X + 1e-10 * np.eye(p), X.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new

    # Final robust scale estimate
    r = y - X @ beta
    mad = np.median(np.abs(r - np.median(r)))
    sigma = 1.4826 * mad if mad > 1e-10 else 1.0
    return beta, sigma


def mm_estimator(X, y, c_s=1.5476, c_mm=4.685, max_iter=100, tol=1e-6):
    """MM-estimator: high breakdown + high efficiency.

    Parameters
    ----------
    c_s : float
        Tuning constant for the S-stage (controls breakdown).
    c_mm : float
        Tuning constant for the MM refinement (controls efficiency).
    """
    n, p = X.shape

    # Stage 1: S-estimator for a high-breakdown fit and robust scale
    beta_s, sigma_s = s_estimator(X, y, c_s, max_iter, tol)

    # Stage 2: M-estimation starting from beta_s, with sigma_s held fixed
    beta = beta_s.copy()
    for _ in range(max_iter):
        r = y - X @ beta
        u = r / sigma_s                    # fixed scale from the S-stage!

        # Weights from psi with the efficiency-tuned constant c_mm
        w = np.where(np.abs(u) > 1e-10, psi_bisquare(u, c_mm) / u, 1.0)
        w = np.maximum(w, 1e-10)

        W = np.diag(w)
        beta_new = linalg.solve(X.T @ W @ X + 1e-10 * np.eye(p), X.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new

    return beta, sigma_s


# Example: heavy contamination
np.random.seed(42)
n = 100
n_inliers = 70    # 70% inliers, 30% outliers: far more than OLS or a Huber
n_outliers = 30   # M-estimator tolerates, but below MM's 50% breakdown point

X_inliers = np.column_stack([np.ones(n_inliers), np.random.randn(n_inliers)])
y_inliers = 2 + 3 * X_inliers[:, 1] + 0.3 * np.random.randn(n_inliers)

X_outliers = np.column_stack([np.ones(n_outliers), 3 * np.random.randn(n_outliers)])
y_outliers = -10 + 5 * np.random.randn(n_outliers)   # generated by a different model!

X = np.vstack([X_inliers, X_outliers])
y = np.concatenate([y_inliers, y_outliers])

# Shuffle
perm = np.random.permutation(n)
X, y = X[perm], y[perm]

# Compare methods
beta_ols = linalg.lstsq(X, y)[0]
beta_mm, sigma_mm = mm_estimator(X, y)

print("True coefficients: [2.0, 3.0]")
print(f"OLS: [{beta_ols[0]:.3f}, {beta_ols[1]:.3f}]")
print(f"MM:  [{beta_mm[0]:.3f}, {beta_mm[1]:.3f}]")
print("\nMM recovers the true model despite 30% contamination!")
```

MM-estimators achieve 50% breakdown point AND 95% efficiency under Gaussian errors. They represent the state-of-the-art for robust regression when you need both reliability under contamination and efficiency with clean data.
What's next:
We've now covered breakdown point—the ultimate measure of how much contamination an estimator can survive. The final page introduces influence functions—a complementary measure that describes how individual observations gradually affect estimates. Together, breakdown point and influence functions provide the complete picture of estimator robustness.
You now understand breakdown point as the fundamental measure of robustness, the 50% barrier that limits all estimators, and how MM-estimators achieve the best of both worlds—high breakdown and high efficiency.