We've seen that different estimators have different sensitivities to outliers. Huber regression is more robust than OLS. Tukey's bisquare is more robust than Huber. RANSAC can handle extreme contamination. But how do we quantify these claims? How do we rigorously compare the robustness of different methods?
Enter the breakdown point—the single most important concept in robust statistics. Introduced by Frank Hampel in his 1968 dissertation and further developed by Peter Rousseeuw, the breakdown point answers a fundamental question:
What fraction of the data can be arbitrarily corrupted before the estimator becomes completely unreliable?
This seemingly simple question has profound implications. An estimator with breakdown point 50% can withstand nearly half the data being replaced with arbitrary values—even adversarially chosen values designed to maximize damage. An estimator with breakdown point 0% (like the sample mean or OLS) can be completely corrupted by a single observation.
The breakdown point provides an absolute limit on robustness. No matter how extreme the outliers, an estimator cannot break down unless the contamination fraction exceeds its breakdown point. This mathematical guarantee is invaluable when dealing with real-world data where you don't know how bad the contamination might be.
This page develops the complete theory of breakdown points, from the formal definition to calculations for specific estimators to practical implications for choosing robust methods.
By the end of this page, you will understand the formal definition of breakdown point (finite sample and asymptotic), how to compute breakdown points for common estimators, why 50% is the maximum achievable, and how breakdown point relates to other robustness measures.
Finite-Sample Breakdown Point:
Consider an estimator $T$ applied to a sample $\mathbf{Z} = \{z_1, z_2, \ldots, z_n\}$, and let $\mathbf{Z}'_m$ denote any sample obtained from $\mathbf{Z}$ by replacing $m$ of its observations with arbitrary values. The finite-sample breakdown point $\varepsilon^*_n(T, \mathbf{Z})$ is defined as:
$$\varepsilon^*_n(T, \mathbf{Z}) = \min\left\{\frac{m}{n} \,:\, \sup_{\mathbf{Z}'_m} \left\| T(\mathbf{Z}'_m) - T(\mathbf{Z}) \right\| = \infty \right\}$$
In words: the smallest fraction of observations that, when replaced with adversarially chosen values, can carry the estimate beyond every bound.
Unpacking the definition:
The supremum runs over every possible way of corrupting $m$ observations, so the breakdown point is a worst-case guarantee: as long as strictly fewer than $n\,\varepsilon^*_n$ points are corrupted, no corruption, however extreme, can make the estimator explode.
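To make the definition concrete, here is a minimal numerical probe (a sketch under a strong simplification: it tries only one kind of corruption, pushing $m$ points to a single huge value, which is already enough to break non-robust estimators; the function name `breakdown_probe` is illustrative):

```python
import numpy as np

def breakdown_probe(estimator, x, bound=1e6):
    """Smallest fraction m/n at which placing m points at a single huge
    value moves the estimate by more than `bound`. A crude probe of the
    finite-sample breakdown point, not the full supremum over corruptions."""
    n = len(x)
    baseline = estimator(x)
    for m in range(1, n + 1):
        x_bad = x.copy()
        x_bad[:m] = 1e15                       # one simple adversarial corruption
        if abs(estimator(x_bad) - baseline) > bound:
            return m / n
    return 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=100)
print(breakdown_probe(np.mean, x))     # 0.01 -> one corrupted point is enough
print(breakdown_probe(np.median, x))   # 0.5  -> roughly half the sample is needed
```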
Asymptotic Breakdown Point:
As $n \to \infty$, the finite-sample breakdown point converges to the asymptotic breakdown point:
$$\varepsilon^* = \lim_{n \to \infty} \varepsilon^*_n(T, \mathbf{Z})$$
For most estimators, this limit exists and is independent of the original sample $\mathbf{Z}$.
Breakdown point assumes an adversary who knows your estimator and places outliers to cause maximum damage. This is the worst-case scenario. In practice, outliers are usually 'accidental' rather than adversarial, so estimators often perform better than their breakdown point suggests.
Let's compute breakdown points for estimators we've encountered:
1. Sample Mean (Arithmetic Average)
For the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$:
Consider replacing just one observation $x_j$ with $x_j' \to \infty$: $$\bar{x}' = \bar{x} + \frac{x_j' - x_j}{n} \to \infty$$
A single corrupted point makes the estimate unbounded.
$$\varepsilon^*_{\text{mean}} = \frac{1}{n} \xrightarrow{n \to \infty} 0$$
The sample mean has 0% asymptotic breakdown point!
2. Sample Median
The median is the middle value of the sorted sample. To move the median arbitrarily far, corrupted points must take over the middle position, which requires replacing at least half of the observations.
With fewer than half of the observations corrupted, the sorted middle value is still determined by, and bounded by, the uncorrupted data.
$$\varepsilon^*_{\text{median}} = \frac{\lfloor (n+1)/2 \rfloor}{n} \xrightarrow{n \to \infty} 0.5$$
The sample median has 50% asymptotic breakdown point!
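A quick numerical check of that threshold (a small illustration with $n = 101$, so the median breaks exactly when $m = \lfloor (n+1)/2 \rfloor = 51$ observations are replaced):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = rng.normal(loc=5.0, scale=1.0, size=n)

for m in (50, 51):                 # one below, and exactly at, floor((n+1)/2)
    x_bad = x.copy()
    x_bad[:m] = 1e15               # replace m observations with a huge value
    print(m, np.median(x_bad))     # m=50: still ~5-8, bounded by the good data
                                   # m=51: ~1e15, the median has broken down
```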
3. Ordinary Least Squares (OLS)
OLS estimates a conditional mean and inherits the mean's fragility: a single observation with extreme leverage (an outlying $\mathbf{x}$) and a discrepant $y$ can drive the coefficients arbitrarily far.
$$\varepsilon^*_{\text{OLS}} = \frac{1}{n} \xrightarrow{n \to \infty} 0$$
OLS has 0% breakdown point—identical to the mean.
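A minimal sketch of this failure mode: a single observation with an extreme $x$ value (a leverage point) can move the OLS slope essentially anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2 + 3 * x + 0.3 * rng.normal(size=n)

def ols(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(x, y))                   # approximately [2, 3]

# Corrupt a single observation with an extreme leverage point
x_bad, y_bad = x.copy(), y.copy()
x_bad[0], y_bad[0] = 1e6, -1e6
print(ols(x_bad, y_bad))           # slope dragged to about -1: one point controls the fit
```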
| Estimator | Breakdown Point | Comments |
|---|---|---|
| Sample Mean | 1/n → 0% | Completely non-robust |
| Sample Median | ⌊(n+1)/2⌋/n → 50% | Maximum achievable for location |
| OLS Regression | 1/n → 0% | Completely non-robust |
| L1 Regression (LAD) | 1/n → 0% | Robust to vertical outliers only; a single leverage point can break it |
| Huber M-estimator | 1/n → 0% | Bounded influence ≠ high breakdown |
| Least Median of Squares (LMS) | ⌊(n-p)/2⌋/n → 50% | High breakdown, low efficiency |
| S-estimators | ~50% | High breakdown with tunable efficiency |
| MM-estimators | ~50% | High breakdown, high efficiency |
The Huber M-estimator has bounded influence (outliers are down-weighted) but 0% breakdown point. Why? A single point with extreme X (leverage) and moderate Y can still corrupt the fit. Bounded influence is about gradual corruption; breakdown is about catastrophic failure.
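To see the distinction concretely, the small experiment below (a sketch assuming scikit-learn's `HuberRegressor` is available; exact numbers depend on the data and solver, but the qualitative failure does not) fits Huber regression with and without a single leverage point:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2 + 3 * x + 0.3 * rng.normal(size=n)

# One point with extreme X (leverage) but a moderate-looking Y
x_bad, y_bad = x.copy(), y.copy()
x_bad[0], y_bad[0] = 1e3, 0.0

for xs, ys, label in [(x, y, "clean"), (x_bad, y_bad, "one leverage point")]:
    fit = HuberRegressor(max_iter=1000).fit(xs.reshape(-1, 1), ys)
    print(label, fit.intercept_, fit.coef_[0])
# On clean data the Huber fit is close to slope 3; with the leverage point
# the slope is pulled far below 3 (toward 0), because the bounded rho limits
# the effect of large residuals, not of extreme positions in X.
```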
It's natural to ask: can we design an estimator with breakdown point higher than 50%? The answer is no—and for a fundamental reason.
The Impossibility Argument:
Consider any sample $\mathbf{Z} = {z_1, \ldots, z_n}$ and any estimator $T$.
Suppose the estimator could withstand corruption of a majority of the sample, i.e., of some $m > n/2$ observations.
Scenario 1: The genuine data are the remaining $n - m$ original points, and the corrupted majority is placed as a tight cluster at some location $c$ chosen arbitrarily far away. Scenario 2: Exactly the same $n$ points, read the other way around: the cluster at $c$ is the genuine data and the original minority is the contamination.
Both readings describe one and the same dataset, so the estimator must return a single value for it.
But if the estimator really tolerated majority corruption, Scenario 1 would force that value to stay near the original data, while Scenario 2 would force it to stay near $c$. Because $c$ can be pushed arbitrarily far away, no single value can satisfy both requirements. The contradiction shows that some corruption of more than half the points must be able to carry the estimate beyond every bound.
The Formal Statement:
For any translation-equivariant location estimator (with analogous bounds for scale and regression estimators), the maximum achievable breakdown point is:
$$\varepsilon^* \leq \frac{\lfloor (n+1)/2 \rfloor}{n} \xrightarrow{n \to \infty} 0.5$$
This is called the 50% breakdown point barrier.
The 50% barrier has an intuitive interpretation: with >50% outliers, the outliers become the 'majority.' Any estimator must follow the majority, otherwise the adversary can switch which group is labeled 'outliers.' The best we can do is tie-breaking at exactly 50%.
Achieving the 50% breakdown point in regression is harder than for location estimation. The design matrix X introduces complications—outliers can be extreme in X-space (leverage points), Y-space (vertical outliers), or both.
Least Median of Squares (LMS)
Proposed by Rousseeuw (1984), LMS minimizes the median of squared residuals:
$$\hat{\boldsymbol{\beta}}_{\text{LMS}} = \arg\min_{\boldsymbol{\beta}} \operatorname{median}_i\left(r_i^2\right)$$
Properties:
- Breakdown point ≈ 50%, the highest achievable for regression
- Very low statistical efficiency: the estimate converges at the slow $n^{-1/3}$ rate
- No closed-form solution; computed approximately by searching over random subsets of the data
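Exact LMS minimization is combinatorial, so in practice it is approximated by searching over random elemental subsets of $p$ observations. A minimal sketch of that idea (the subset count and function name are illustrative choices, not a reference implementation):

```python
import numpy as np

def lms_fit(X, y, n_subsets=2000, rng=None):
    """Approximate Least Median of Squares via random elemental subsets:
    fit exactly determined p-point subsets and keep the candidate with
    the smallest median squared residual."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue                                 # singular subset: skip it
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta
```

Here `X` is the $n \times p$ design matrix including the intercept column, so `lms_fit(X, y)` returns an approximate LMS coefficient vector.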
Least Trimmed Squares (LTS)
Also by Rousseeuw, LTS minimizes the sum of the $h$ smallest squared residuals:
$$\hat{\boldsymbol{\beta}}_{\text{LTS}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{h} r_{(i)}^2$$
where $r_{(i)}^2$ are the order statistics and $h \approx n/2$ for 50% breakdown.
Properties:
- Breakdown point ≈ 50% when $h \approx n/2$
- $\sqrt{n}$-consistent and asymptotically normal, unlike LMS
- Still inefficient under Gaussian errors, so it is most often used as a high-breakdown starting point for further refinement
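LTS is usually computed with concentration steps ("C-steps"): refit OLS to the $h$ observations with the smallest current squared residuals and repeat, in the spirit of the FAST-LTS algorithm. A minimal sketch (simplified; the full algorithm adds smarter start selection and nested refinements):

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, rng=None):
    """Sketch of Least Trimmed Squares via concentration (C-) steps."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    h = h if h is not None else (n + p + 1) // 2       # near-50%-breakdown choice
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)     # random elemental start
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue
        for _ in range(n_csteps):                      # C-step: refit on the h best points
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()   # trimmed sum of squares
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta
```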
S-Estimators
Minimize a robust scale of residuals:
$$\hat{\boldsymbol{\beta}}_{S} = \arg\min_{\boldsymbol{\beta}} \hat{\sigma}(r_1, \ldots, r_n)$$
where $\hat{\sigma}$ is an M-estimate of scale satisfying:
$$\frac{1}{n}\sum_{i=1}^n \rho\left(\frac{r_i}{\hat{\sigma}}\right) = K$$
for a bounded ρ-function and appropriately chosen constant $K$.
Properties:
- Breakdown point up to 50%, controlled by the tuning constant of the ρ-function
- $\sqrt{n}$-consistent with an asymptotically normal distribution
- Efficiency and breakdown trade off through the tuning constant: tuning for 50% breakdown yields low Gaussian efficiency, which is the gap MM-estimators close
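The scale $\hat{\sigma}$ above is defined only implicitly by the constraint. A minimal sketch of solving that M-scale equation by bisection (here the bisquare ρ is normalized to a maximum of 1, so $K = 0.5$ together with $c \approx 1.5476$ corresponds to the 50%-breakdown tuning; function names are illustrative):

```python
import numpy as np

def rho_bisquare_norm(u, c=1.5476):
    """Bisquare rho normalized so that rho -> 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, c=1.5476, K=0.5, tol=1e-10):
    """Solve (1/n) * sum rho(r_i / sigma) = K for sigma by bisection.
    The left-hand side decreases as sigma grows, so the root is bracketed."""
    r = np.asarray(r, dtype=float)
    lo, hi = 1e-12, np.max(np.abs(r)) + 1.0
    while np.mean(rho_bisquare_norm(r / hi, c)) > K:   # widen until bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(rho_bisquare_norm(r / mid, c)) > K:
            lo = mid          # sigma too small: residuals look too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: the M-scale of clean Gaussian residuals is close to their SD,
# and barely moves when 10% of the residuals are replaced by huge values.
rng = np.random.default_rng(0)
r = rng.normal(scale=2.0, size=200)
print(m_scale(r))        # roughly 2
r[:20] = 1e6
print(m_scale(r))        # still roughly 2, only mildly inflated
```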
High-breakdown estimators like LMS and LTS achieve 50% breakdown but have low efficiency under Gaussian errors (~30-40%). This created demand for estimators with both high breakdown AND high efficiency—leading to MM-estimators.
MM-estimators (introduced by Yohai, 1987) achieve the seemingly impossible: 50% breakdown point AND 95% efficiency under Gaussian errors.
The Two-Stage Approach:
Stage 1: High-Breakdown Initial Estimate. Compute an S-estimate $\hat{\boldsymbol{\beta}}_S$ using a bounded ρ-function (e.g., bisquare with c tuned for 50% breakdown). This gives a coefficient estimate with 50% breakdown point together with a robust residual scale estimate $\hat{\sigma}_S$.
Stage 2: Efficient Local Refinement. Starting from $\hat{\boldsymbol{\beta}}_S$ and using the scale $\hat{\sigma}_S$ (held fixed), solve:
$$\hat{\boldsymbol{\beta}}_{\text{MM}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \rho^*\left(\frac{y_i - \mathbf{x}_i^\top \boldsymbol{\beta}}{\hat{\sigma}_S}\right)$$
where $\rho^*$ is a second ρ-function from the same family, tuned with a larger constant (e.g., $c \approx 4.685$ for the bisquare) to achieve 95% efficiency under Gaussian errors.
Key Insight: The first stage provides a "safe" starting point. The second stage only searches locally, so it can use a more efficient (but less robust) criterion without the risk of converging to an outlier-induced solution.
```python
import numpy as np
from scipy import linalg


def rho_bisquare(u, c=4.685):
    """Tukey's bisquare rho function."""
    return np.where(
        np.abs(u) <= c,
        (c**2 / 6) * (1 - (1 - (u / c)**2)**3),
        c**2 / 6,
    )


def psi_bisquare(u, c=4.685):
    """Derivative of the bisquare rho."""
    return np.where(np.abs(u) <= c, u * (1 - (u / c)**2)**2, 0.0)


def s_estimator(X, y, c_s=1.5476, max_iter=100, tol=1e-6):
    """Simplified S-estimator for high-breakdown regression.

    c_s = 1.5476 gives 50% breakdown with the bisquare. Note: this
    illustration starts from OLS and uses the MAD as the robust scale;
    a production S-estimator would use subsampling starts and solve the
    M-scale equation at each step.
    """
    n, p = X.shape
    beta = linalg.lstsq(X, y)[0]          # simplified starting point (OLS)

    for _ in range(max_iter):
        r = y - X @ beta
        mad = np.median(np.abs(r - np.median(r)))
        sigma = 1.4826 * mad if mad > 1e-10 else 1.0
        u = r / sigma

        # IRLS weights w(u) = psi(u) / u
        w = np.where(np.abs(u) > 1e-10, psi_bisquare(u, c_s) / u, 1.0)
        w = np.maximum(w, 1e-10)

        # Weighted least squares step (small ridge term for stability)
        W = np.diag(w)
        beta_new = linalg.solve(X.T @ W @ X + 1e-10 * np.eye(p), X.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new

    # Final robust scale estimate
    r = y - X @ beta
    mad = np.median(np.abs(r - np.median(r)))
    sigma = 1.4826 * mad if mad > 1e-10 else 1.0
    return beta, sigma


def mm_estimator(X, y, c_s=1.5476, c_mm=4.685, max_iter=100, tol=1e-6):
    """MM-estimator: high breakdown + high efficiency.

    Parameters
    ----------
    c_s : float
        Tuning constant for the S-stage (controls breakdown).
    c_mm : float
        Tuning constant for the MM refinement (controls efficiency).
    """
    n, p = X.shape

    # Stage 1: S-estimator for a high-breakdown fit and robust scale
    beta_s, sigma_s = s_estimator(X, y, c_s, max_iter, tol)

    # Stage 2: M-estimation starting from beta_s, with sigma_s held fixed
    beta = beta_s.copy()
    for _ in range(max_iter):
        r = y - X @ beta
        u = r / sigma_s                    # fixed scale from the S-stage!

        # Weights from psi with the efficiency-tuned constant c_mm
        w = np.where(np.abs(u) > 1e-10, psi_bisquare(u, c_mm) / u, 1.0)
        w = np.maximum(w, 1e-10)

        W = np.diag(w)
        beta_new = linalg.solve(X.T @ W @ X + 1e-10 * np.eye(p), X.T @ W @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new

    return beta, sigma_s


# Example: heavy contamination
np.random.seed(42)
n = 100
n_inliers = 70    # 70% inliers, 30% outliers: far more than OLS or a Huber
n_outliers = 30   # M-estimator tolerates, but below MM's 50% breakdown point

X_inliers = np.column_stack([np.ones(n_inliers), np.random.randn(n_inliers)])
y_inliers = 2 + 3 * X_inliers[:, 1] + 0.3 * np.random.randn(n_inliers)

X_outliers = np.column_stack([np.ones(n_outliers), 3 * np.random.randn(n_outliers)])
y_outliers = -10 + 5 * np.random.randn(n_outliers)   # generated by a different model!

X = np.vstack([X_inliers, X_outliers])
y = np.concatenate([y_inliers, y_outliers])

# Shuffle
perm = np.random.permutation(n)
X, y = X[perm], y[perm]

# Compare methods
beta_ols = linalg.lstsq(X, y)[0]
beta_mm, sigma_mm = mm_estimator(X, y)

print("True coefficients: [2.0, 3.0]")
print(f"OLS: [{beta_ols[0]:.3f}, {beta_ols[1]:.3f}]")
print(f"MM:  [{beta_mm[0]:.3f}, {beta_mm[1]:.3f}]")
print("\nMM recovers the true model despite 30% contamination!")
```

MM-estimators achieve 50% breakdown point AND 95% efficiency under Gaussian errors. They represent the state-of-the-art for robust regression when you need both reliability under contamination and efficiency with clean data.
What's next:
We've now covered breakdown point—the ultimate measure of how much contamination an estimator can survive. The final page introduces influence functions—a complementary measure that describes how individual observations gradually affect estimates. Together, breakdown point and influence functions provide the complete picture of estimator robustness.
You now understand breakdown point as the fundamental measure of robustness, the 50% barrier that limits all estimators, and how MM-estimators achieve the best of both worlds—high breakdown and high efficiency.