Real-world data is messy. Outliers lurk in datasets due to measurement errors, data entry mistakes, exceptional circumstances, or heavy-tailed distributions that generate extreme observations naturally. Traditional least squares regression—optimized for the mean—is notoriously sensitive to such violations.
Quantile regression, particularly median regression, offers a fundamentally robust alternative.
The robustness of quantile regression isn't an afterthought or an add-on—it emerges naturally from the loss function's geometry. Where squared loss penalizes large residuals quadratically (making extreme observations disproportionately influential), quantile loss penalizes only linearly, bounding the influence of any single point.
This page explores why quantile regression is robust, how to quantify robustness using influence functions and breakdown points, how quantile regression compares to other robust estimators, and when robustness should drive your methodological choices.
Before appreciating quantile regression's robustness, we must understand what makes OLS fragile.
The OLS Objective:
$$\hat{\beta}_{\text{OLS}} = \arg\min_{\beta} \sum_{i=1}^n (y_i - x_i^\top \beta)^2$$
Why Squaring Creates Sensitivity:
Consider two residuals: $r_1 = 1$ and $r_2 = 10$. Under squared loss, they contribute $1$ and $100$ to the objective: the larger residual carries 100 times the weight. Under absolute loss, the contributions are $1$ and $10$: only 10 times the weight.
This means a single extreme observation can dominate the minimization, pulling $\hat{\beta}$ away from the pattern in the majority of the data.
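The residual comparison can be checked in two lines (a minimal sketch using the residual pair from the example):

```python
# Relative contribution of a large vs. small residual under each loss.
r1, r2 = 1.0, 10.0

print(r2**2 / r1**2)      # squared loss: r2 carries 100x the weight of r1
print(abs(r2) / abs(r1))  # absolute loss: r2 carries only 10x the weight
```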
A Striking Example:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, QuantileRegressor

np.random.seed(42)

# Generate clean data
n = 50
X = np.random.uniform(0, 10, n).reshape(-1, 1)
y = 2 * X.ravel() + 5 + np.random.normal(0, 1, n)

# Add one outlier
X_outlier = np.vstack([X, [[5]]])
y_outlier = np.append(y, [50])  # Massive outlier

# Fit OLS and median regression
ols_clean = LinearRegression().fit(X, y)
ols_outlier = LinearRegression().fit(X_outlier, y_outlier)
qr_outlier = QuantileRegressor(quantile=0.5, alpha=0, solver='highs').fit(X_outlier, y_outlier)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
X_line = np.linspace(0, 10, 100).reshape(-1, 1)

ax1 = axes[0]
ax1.scatter(X, y, alpha=0.7, s=50, label='Clean Data')
ax1.plot(X_line, ols_clean.predict(X_line), 'b-', linewidth=2, label='OLS (Clean)')
ax1.set_xlabel('X')
ax1.set_ylabel('y')
ax1.set_title('Clean Data: OLS Works Well')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2 = axes[1]
ax2.scatter(X, y, alpha=0.7, s=50, label='Clean Data')
ax2.scatter([5], [50], color='red', s=200, marker='*', label='Single Outlier', zorder=5)
ax2.plot(X_line, ols_outlier.predict(X_line), 'r--', linewidth=2, label='OLS (with outlier)')
ax2.plot(X_line, qr_outlier.predict(X_line), 'g-', linewidth=2, label='Median Regression (τ=0.5)')
ax2.plot(X_line, ols_clean.predict(X_line), 'b:', linewidth=2, label='OLS (clean, reference)')
ax2.set_xlabel('X')
ax2.set_ylabel('y')
ax2.set_title('One Outlier Destroys OLS; Median Regression Resists')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-5, 55)

plt.tight_layout()
plt.show()

# Coefficient comparison
print("Slope Estimates (True = 2.0):")
print(f"  OLS (clean):           {ols_clean.coef_[0]:.3f}")
print(f"  OLS (with outlier):    {ols_outlier.coef_[0]:.3f}")
print(f"  Median (with outlier): {qr_outlier.coef_[0]:.3f}")
```

A single outlier among 50 observations can completely distort the OLS fit. The median regression line remains virtually unchanged.
This is not a minor difference—it's the difference between a useful model and a useless one.
The influence function is a fundamental tool from robust statistics that measures how sensitive an estimator is to infinitesimal contamination at a particular point.
Definition (Influence Function):
For an estimator $T$ at distribution $F$, the influence function at point $z$ is:
$$\text{IF}(z; T, F) = \lim_{\varepsilon \to 0} \frac{T((1-\varepsilon)F + \varepsilon \delta_z) - T(F)}{\varepsilon}$$
where $\delta_z$ is a point mass at $z$.
Interpretation: IF$(z)$ measures how the estimator changes when an infinitesimal fraction of data comes from a point mass at $z$. If IF$(z)$ grows unboundedly as $z \to \infty$, the estimator is sensitive to extreme observations.
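The limit in the definition has a finite-sample analogue, the sensitivity curve $SC(z) = n\,\big(T(x_1,\dots,x_{n-1},z) - T(x_1,\dots,x_{n-1})\big)$, which we can probe numerically (a sketch; the sample and the contamination points $z$ are arbitrary, and `sensitivity` is an illustrative helper):

```python
import numpy as np

# Sensitivity curve: a finite-sample analogue of the influence function.
rng = np.random.default_rng(3)
x = rng.normal(0, 1, 999)

def sensitivity(T, z):
    """SC(z) = n * (T(x_1,...,x_{n-1}, z) - T(x_1,...,x_{n-1}))."""
    n = len(x) + 1
    return n * (T(np.append(x, z)) - T(x))

for z in [2.0, 20.0, 200.0]:
    print(f"z={z:6.1f}  SC(mean)={sensitivity(np.mean, z):8.2f}  "
          f"SC(median)={sensitivity(np.median, z):6.3f}")
```

The mean's curve grows linearly with $z$, while the median's curve is identical at $z = 20$ and $z = 200$: once the contamination point clears the sample, pushing it further changes nothing.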
Influence Functions for Key Estimators:
| Estimator | Influence Function IF(y) | Boundedness |
|---|---|---|
| Sample Mean | y − μ | Unbounded: IF → ±∞ as y → ±∞ |
| Sample Median | sign(y − μ) / (2f(μ)) | Bounded: IF ∈ {−c, +c} |
| OLS slope | Proportional to x(y − xβ) | Unbounded in both x and y |
| Quantile Regression | Bounded transformation | Bounded in y, depends on x distribution |
The Mean's Unbounded Influence:
For the sample mean: $$\text{IF}(y) = y - \mu$$
As $y \to \infty$, IF$(y) \to \infty$. A single extreme observation has arbitrarily large influence.
The Median's Bounded Influence:
For the sample median with symmetric density $f$: $$\text{IF}(y) = \frac{\text{sign}(y - \mu)}{2f(\mu)}$$
This is bounded! No matter how extreme $y$ is, it contributes the same amount to moving the median. Once an observation falls above (or below) the median, moving it further has no additional effect.
Quantile Regression's Bounded Influence:
For quantile regression at level $\tau$, the influence function in the $y$-direction is bounded. The loss function $\rho_\tau(y - x^\top \beta)$ grows only linearly in $|y|$, leading to:
$$\frac{\partial}{\partial y} \rho_\tau(y - x^\top \beta) = \tau - \mathbb{1}\{y < x^\top \beta\}$$
This derivative is bounded in $[\tau - 1, \tau] \subset [-1, 1]$, ensuring stable estimates.
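The boundedness is easy to verify numerically (a sketch; `pinball_grad` is an illustrative helper implementing the derivative above, taking the value $\tau - 1$ at the kink):

```python
import numpy as np

def pinball_grad(u, tau):
    """Subgradient of the check loss rho_tau at residual u = y - x'beta:
    tau where u > 0, tau - 1 where u <= 0."""
    return np.where(u > 0, tau, tau - 1.0)

u = np.array([-1e9, -1.0, 1.0, 1e9])  # even astronomically large residuals
for tau in [0.1, 0.5, 0.9]:
    g = pinball_grad(u, tau)
    print(tau, g, bool(np.all(np.abs(g) <= 1)))
```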
A bounded influence function is the hallmark of a robust estimator. It guarantees that no single observation—no matter how extreme—can arbitrarily distort the estimate. Quantile regression achieves this naturally through its linear (rather than quadratic) loss.
While the influence function measures sensitivity to infinitesimal contamination, the breakdown point measures resistance to gross contamination.
Definition (Finite-Sample Breakdown Point):
The breakdown point is the maximum fraction of data that can be arbitrarily corrupted while the estimator remains bounded:
$$\varepsilon^* = \min\left\{\frac{m}{n} : \sup_{|y'_1|, \ldots, |y'_m| \to \infty} \left|T(y_1, \ldots, y_{n-m}, y'_1, \ldots, y'_m)\right| = \infty\right\}$$
Breakdown Points for Common Estimators:
| Estimator | Breakdown Point | Interpretation |
|---|---|---|
| Sample Mean | ε* = 0 (or 1/n) | A single outlier can corrupt the mean arbitrarily |
| Sample Median | ε* = 0.5 | Up to ~50% of data can be outliers before breakdown |
| OLS Regression | ε* = 1/n | One high-leverage outlier can destroy the fit |
| Median Regression (τ=0.5) | ε* ≈ 0.5/(p+1) | High breakdown, decreasing with the number of predictors |
| General Quantile (τ) | ε* ≈ min(τ, 1-τ) | Depends on τ; most robust at τ=0.5 |
| Trimmed Mean (10%) | ε* = 0.1 | 10% of extremes in each tail can be corrupted |
| LMS/LTS Regression | ε* ≈ 0.5 | High breakdown, but computationally expensive |
Why 50% is the Maximum:
No reasonable estimator can have breakdown point above 50%. If more than half the data is corrupted, the "outliers" become the majority, and distinguishing signal from contamination becomes impossible.
Median Regression's High Breakdown:
Median regression (quantile regression with $\tau = 0.5$) achieves near-optimal breakdown:
$$\varepsilon^* \approx \frac{1}{2} \cdot \frac{1}{p+1}$$
where $p$ is the number of predictors. For univariate regression ($p=1$), this gives $\varepsilon^* \approx 0.25$, meaning up to ~25% of data can be arbitrarily corrupted.
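The breakdown behavior is easiest to watch in the univariate case (a sketch with arbitrary corruption values): corrupt a growing fraction of a sample and see when each estimator breaks.

```python
import numpy as np

# Corrupt m of 100 observations and watch when each estimator breaks.
rng = np.random.default_rng(1)
clean = rng.normal(0, 1, 100)

for m in [1, 10, 49, 51]:  # number of corrupted observations out of 100
    corrupted = clean.copy()
    corrupted[:m] = 1e9  # arbitrarily bad values
    print(f"{m:>2}% corrupted: mean={corrupted.mean():10.3g}  "
          f"median={np.median(corrupted):10.3g}")
```

A single bad value destroys the mean; the median survives 49% corruption and only breaks once the corrupted points become the majority.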
Extreme Quantiles Are Less Robust:
For $\tau = 0.1$ (10th percentile), approximately $\min(0.1, 0.9) = 0.1$ fraction can be corrupted. Extreme quantiles estimate tail behavior, which requires extreme observations to be genuine—making robustness to tail contamination limited.
Extreme quantile estimation inherently requires observations from the tails. If tail observations are outliers, they contain genuine information about the tail distribution. If they are errors, they corrupt the estimate. Distinguishing these cases is fundamentally difficult—hence the lower breakdown point for extreme τ.
Quantile regression is one member of a broader family of robust regression methods. Understanding the alternatives clarifies when each is appropriate.
M-Estimators (Huber, Tukey):
M-estimators generalize maximum likelihood by solving: $$\hat{\beta} = \arg\min_\beta \sum_{i=1}^n \rho(y_i - x_i^\top \beta)$$
for various $\rho$ functions:
Huber loss: Quadratic near zero, linear in tails. Robust to y-outliers but sensitive to x-outliers (leverage points).
Tukey's biweight: Completely downweights extreme residuals (redescending). Very robust but can be multimodal.
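Both $\rho$ functions can be sketched in a few lines (a minimal sketch; the `c` values are the commonly used 95%-efficiency tuning constants):

```python
import numpy as np

def huber_rho(u, c=1.345):
    """Huber loss: quadratic for |u| <= c, linear beyond."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)

def tukey_rho(u, c=4.685):
    """Tukey biweight: increasing up to |u| = c, then completely flat."""
    a = np.minimum(np.abs(u) / c, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - a**2)**3)

u = np.array([0.5, 2.0, 10.0, 100.0])
print(huber_rho(u))  # keeps growing with |u|, but only linearly
print(tukey_rho(u))  # saturates at c**2 / 6 once |u| exceeds c
```

The flat tail of the biweight is what "completely downweights" extreme residuals; it is also why its objective can have multiple local minima.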
Least Median of Squares (LMS) / Least Trimmed Squares (LTS):
Both achieve 50% breakdown but are computationally expensive (combinatorial search).
| Method | Breakdown Point | Y-Outliers | X-Outliers | Computation | Efficiency |
|---|---|---|---|---|---|
| OLS | 0 | Sensitive | Sensitive | O(np²) | 100% (Gaussian) |
| Median Reg. (τ=0.5) | ~50%/(p+1) | Robust | Moderate | O(n log n) | 64% (Gaussian) |
| Huber M-estimator | ~0 | Robust | Sensitive | O(np²) | 95% (Gaussian) |
| Tukey M-estimator | ~0 | Very robust | Sensitive | O(np²) | Variable |
| LMS/LTS | ~50% | Very robust | Robust | Exponential | Low |
| MM-estimator | ~50% | Very robust | Robust | Iterative | High |
Use quantile regression when: (1) you want interpretable quantile estimates, (2) heteroscedasticity is present, or (3) you need distribution-free inference. Use Huber M-estimation for near-Gaussian data with occasional outliers. Use MM-estimators when high breakdown and high efficiency are both critical.
Robustness isn't free. There's a fundamental trade-off between robust estimators and efficient ones.
Definition (Relative Efficiency):
The asymptotic relative efficiency (ARE) of estimator $T_1$ relative to $T_2$ is:
$$\text{ARE}(T_1, T_2) = \frac{\text{Var}(T_2)}{\text{Var}(T_1)}$$
Higher ARE means $T_1$ is more efficient (lower variance).
Efficiency of Median Regression:
Under Gaussian errors, median regression has: $$\text{ARE}(\text{Median}, \text{Mean}) = \frac{2}{\pi} \approx 0.637$$
Median regression is 64% as efficient as OLS when errors are truly Gaussian. You need approximately 1/0.637 ≈ 1.57× more data to achieve the same precision.
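The $2/\pi$ figure is easy to verify by simulation (a sketch; the sample size and number of replications are arbitrary):

```python
import numpy as np

# Monte Carlo check of ARE(median, mean) under Gaussian errors.
rng = np.random.default_rng(42)
n, reps = 200, 20_000
samples = rng.normal(0, 1, (reps, n))

var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
print(f"Estimated ARE: {var_mean / var_median:.3f}  (theory 2/pi = {2 / np.pi:.3f})")
```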
But Under Heavy Tails:
For heavier-tailed distributions (Laplace, t-distribution), the efficiency comparison reverses:
| Distribution | ARE(Median, Mean) | Interpretation |
|---|---|---|
| Normal (Gaussian) | 0.637 (64%) | Mean is 1.57× more efficient |
| Laplace (double exponential) | 2.0 (200%) | Median is 2× more efficient |
| t-distribution (df=5) | 0.96 (96%) | Approximately equal |
| t-distribution (df=3) | 1.62 (162%) | Median clearly more efficient |
| t-distribution (df=2) | ∞ | Mean variance is infinite! |
| Cauchy | undefined/∞ | Mean doesn't exist; median still works |
| Contaminated Normal (10%) | > 1 (depends on k) | Median more efficient once outliers are present |
The Key Insight:
If errors are Gaussian: OLS is more efficient, and the efficiency loss from quantile regression is the price of unused robustness.
If errors are heavy-tailed: Quantile regression can be MORE efficient than OLS, plus it's robust.
If you're unsure: The robustness insurance of quantile regression may be worth the mild efficiency loss under Gaussianity.
Gross Error Sensitivity:
In practice, data rarely follows perfect Gaussian assumptions. A more realistic model is the contaminated normal:
$$Y \sim (1-\varepsilon) \cdot N(\mu, \sigma^2) + \varepsilon \cdot N(\mu, k^2\sigma^2)$$
where $\varepsilon$ fraction of observations come from a high-variance component. With even 5% contamination, robust estimators often outperform OLS.
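Simulating this model shows the reversal directly (a sketch; $\varepsilon = 0.1$ and $k = 5$ are chosen for illustration):

```python
import numpy as np

# Sampling variability of mean vs. median under 10% contamination.
rng = np.random.default_rng(7)
n, reps, eps, k = 200, 10_000, 0.10, 5.0

samples = rng.normal(0, 1, (reps, n))
mask = rng.random((reps, n)) < eps          # flag the contaminated entries
samples[mask] = rng.normal(0, k, int(mask.sum()))

var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
print(f"Var(mean)   = {var_mean:.5f}")
print(f"Var(median) = {var_median:.5f}")
print(f"Empirical ARE(median, mean) = {var_mean / var_median:.2f}")
```

Under this contamination the median's sampling variance is clearly smaller than the mean's, so the empirical ARE exceeds one.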
Think of robustness as insurance. Under ideal conditions (Gaussian errors), you pay a premium (64% efficiency). Under adverse conditions (outliers, heavy tails), the insurance pays off—your estimates remain valid while OLS collapses.
An important nuance: quantile regression is robust to y-direction outliers but has only moderate resistance to x-direction outliers (leverage points).
Definitions:
A vertical outlier is an observation whose $y$ value is extreme but whose $x$ value lies within the bulk of the data. A leverage point is an observation whose $x$ value is far from the rest of the predictor values; a "bad" leverage point additionally has a $y$ value that breaks the overall pattern.
Why Leverage Points Are Problematic:
In regression, observations with extreme $x$ values have more influence on the slope because they have longer "lever arms." This is true for both OLS and quantile regression.
Geometric Intuition:
Imagine fitting a line to data. A point far from the center of the x-distribution acts like a pivot point—moving its y-value rotates the line substantially. Points near the center of x affect the line much less.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, QuantileRegressor

np.random.seed(42)

# Generate data with X in [0, 5]
n = 50
X = np.random.uniform(0, 5, n).reshape(-1, 1)
y = 2 * X.ravel() + 5 + np.random.normal(0, 1, n)

# Scenario 1: Vertical outlier near center of X
X_v = np.vstack([X, [[2.5]]])  # x near center
y_v = np.append(y, [50])       # extreme y

# Scenario 2: High-leverage outlier (extreme x AND y)
X_l = np.vstack([X, [[10]]])  # x far from data
y_l = np.append(y, [5])       # y not following pattern

# Fit models
ols_clean = LinearRegression().fit(X, y)
ols_v = LinearRegression().fit(X_v, y_v)
ols_l = LinearRegression().fit(X_l, y_l)
qr_v = QuantileRegressor(quantile=0.5, alpha=0, solver='highs').fit(X_v, y_v)
qr_l = QuantileRegressor(quantile=0.5, alpha=0, solver='highs').fit(X_l, y_l)

# Plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
X_line = np.linspace(-1, 12, 100).reshape(-1, 1)

ax1 = axes[0]
ax1.scatter(X, y, alpha=0.7, s=50)
ax1.scatter([2.5], [50], color='red', s=200, marker='*', label='Vertical Outlier', zorder=5)
ax1.plot(X_line, ols_clean.predict(X_line), 'b:', linewidth=2, label='OLS (clean)')
ax1.plot(X_line, ols_v.predict(X_line), 'r--', linewidth=2, label='OLS (with outlier)')
ax1.plot(X_line, qr_v.predict(X_line), 'g-', linewidth=2, label='Median Reg.')
ax1.set_xlabel('X')
ax1.set_ylabel('y')
ax1.set_title('Vertical Outlier: Median Regression Resists')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xlim(-1, 12)
ax1.set_ylim(-5, 55)

ax2 = axes[1]
ax2.scatter(X, y, alpha=0.7, s=50)
ax2.scatter([10], [5], color='orange', s=200, marker='*', label='Leverage Point', zorder=5)
ax2.plot(X_line, ols_clean.predict(X_line), 'b:', linewidth=2, label='OLS (clean)')
ax2.plot(X_line, ols_l.predict(X_line), 'r--', linewidth=2, label='OLS (with outlier)')
ax2.plot(X_line, qr_l.predict(X_line), 'g-', linewidth=2, label='Median Reg.')
ax2.set_xlabel('X')
ax2.set_ylabel('y')
ax2.set_title('Leverage Point: Both Methods Affected!')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xlim(-1, 12)
ax2.set_ylim(-5, 55)

plt.tight_layout()
plt.show()

print("Slope Estimates (True = 2.0):")
print(f"  Clean OLS:            {ols_clean.coef_[0]:.3f}")
print(f"  Vertical outlier OLS: {ols_v.coef_[0]:.3f}")
print(f"  Vertical outlier QR:  {qr_v.coef_[0]:.3f}")
print(f"  Leverage point OLS:   {ols_l.coef_[0]:.3f}")
print(f"  Leverage point QR:    {qr_l.coef_[0]:.3f}")
```

Quantile regression provides excellent protection against vertical outliers but only moderate protection against high-leverage points. For complete robustness, consider: (1) detecting and examining leverage points, (2) using bounded-influence regression, or (3) applying robust distance measures in the x-space.
Not every analysis needs robust methods. Here's guidance on when robustness should be a priority.
A Practical Decision Framework:
1. Always examine residuals before deciding. Large residuals suggest potential outliers.
2. Compare OLS and median regression. If the estimates differ substantially, outliers are influencing OLS.
3. Assess downstream impact. If the analysis drives important decisions, robustness is worth the efficiency cost.
4. Consider the worst case. What happens if there are undetected outliers? Robust methods provide insurance.
5. Report both when uncertain. Presenting OLS and quantile regression side by side illuminates sensitivity.
In practice, the efficiency loss from using robust methods on clean data is modest (64% for median regression). The protection against undetected outliers is substantial. When in doubt, robust methods provide better risk-adjusted performance.
We have explored the robustness properties of quantile regression—an essential advantage for real-world applications.
Module Conclusion: Quantile Regression Mastery
Over these five pages, you have built a comprehensive understanding of quantile regression. It complements mean regression by modeling the entire conditional distribution rather than only its center, by providing estimates that are robust to outliers and heavy tails, and by enabling uncertainty quantification without strong distributional assumptions.
Congratulations! You have mastered quantile regression—a powerful technique that models the entire conditional distribution, provides robust estimates, and enables principled uncertainty quantification. This knowledge extends your regression toolkit far beyond mean-focused methods, preparing you for the complexities of real-world data analysis.