Standard regression provides a single number for each input: the conditional mean $\mathbb{E}[Y \mid X = x]$. But this summary collapses an entire distribution into one point. What about the spread? The tails? The asymmetry?
Conditional quantile regression addresses these questions by estimating:
$$Q_\tau(Y \mid X = x) = \inf\{y : P(Y \leq y \mid X = x) \geq \tau\}$$
for any desired $\tau \in (0, 1)$. By fitting models for multiple quantile levels, we reconstruct the shape of the conditional distribution—not just its center.
Why This Matters:
By the end of this page, you will understand how to interpret quantile regression coefficients, detect heteroscedasticity through quantile spreads, visualize conditional distributions, and identify when covariates have quantile-specific effects.
Let's formalize the conditional quantile function and its relationship to the conditional CDF.
Definition (Conditional CDF):
For random variables $Y$ and $X$, the conditional cumulative distribution function is:
$$F_{Y|X}(y \mid x) = P(Y \leq y \mid X = x)$$
Definition (Conditional Quantile Function):
The conditional quantile function is the inverse:
$$Q_\tau(Y \mid X = x) = F_{Y|X}^{-1}(\tau \mid x) = \inf\{y : F_{Y|X}(y \mid x) \geq \tau\}$$
Key Properties:
- Monotonicity: $Q_\tau(Y \mid X = x)$ is non-decreasing in $\tau$ for each fixed $x$.
- Inversion: if $F_{Y|X}(\cdot \mid x)$ is continuous and strictly increasing, then $F_{Y|X}(Q_\tau(Y \mid X = x) \mid x) = \tau$.
- Equivariance: for any non-decreasing function $g$, $Q_\tau(g(Y) \mid X = x) = g(Q_\tau(Y \mid X = x))$, a property the conditional mean does not share.
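Before estimating anything, it helps to see the definition in a case where the quantile has a closed form. The following minimal sketch assumes a toy Gaussian model (the specific mean and scale functions are illustrative assumptions, not from any dataset); because $Y \mid X = x$ is normal, inverting the conditional CDF reduces to a call to `scipy.stats.norm.ppf`.

```python
import numpy as np
from scipy import stats

# Toy model (an illustrative assumption): Y | X = x ~ N(2x + 1, (0.5 + 0.3x)^2).
# Its conditional quantile function has the closed form
#   Q_tau(Y | X = x) = (2x + 1) + (0.5 + 0.3x) * z_tau,
# where z_tau is the standard normal tau-quantile.
def conditional_quantile(x, tau):
    mean = 2 * x + 1
    scale = 0.5 + 0.3 * x
    return mean + scale * stats.norm.ppf(tau)

for x in [0.0, 5.0]:
    for tau in [0.1, 0.5, 0.9]:
        print(f"x = {x}, tau = {tau}: Q = {conditional_quantile(x, tau):.2f}")
```

Note how the gap between the 0.1 and 0.9 quantiles widens with $x$: the scale function makes this toy model heteroscedastic, a theme we return to below.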
Linear Conditional Quantile Model:
The most common specification assumes a linear model for each quantile:
$$Q_\tau(Y \mid X = x) = x^\top \beta(\tau)$$
where $\beta(\tau) \in \mathbb{R}^p$ is the coefficient vector specific to quantile $\tau$.
Critically, each quantile has its own coefficients. The effect of a covariate on the 90th percentile may differ substantially from its effect on the median or the 10th percentile.
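As a minimal sketch of this specification, the snippet below fits the linear model at three quantile levels with scikit-learn's `QuantileRegressor` (the simulated data-generating process is an illustrative assumption). Each level gets its own fit, and hence its own $\hat\beta(\tau)$.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(0, 10, size=(n, 1))
# Assumed toy DGP whose spread grows with X, so slopes should differ by tau
y = 1 + 2 * X[:, 0] + rng.normal(0, 1 + 0.3 * X[:, 0])

# One model per quantile level: each tau has its own coefficient vector
for tau in [0.1, 0.5, 0.9]:
    model = QuantileRegressor(quantile=tau, alpha=0, solver='highs').fit(X, y)
    print(f"tau = {tau}: intercept = {model.intercept_:.2f}, slope = {model.coef_[0]:.2f}")
```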
Alternative: Location-Scale Model:
A more restrictive model assumes:
$$Y = X^\top \beta + (X^\top \gamma) \cdot \epsilon$$
where $\epsilon$ has a fixed distribution. Under this model:
$$Q_\tau(Y \mid X) = X^\top \beta + (X^\top \gamma) \cdot q_\tau^\epsilon$$
This implies $\beta_j(\tau) = \beta_j + \gamma_j \cdot q_\tau^\epsilon$—coefficients change linearly with $q_\tau^\epsilon$.
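This implication is easy to check by simulation. Below is a minimal sketch assuming a one-covariate location-scale model with standard normal $\epsilon$ (all parameter values are illustrative): the slope estimated at each $\tau$ should track $\beta + \gamma \, q_\tau^\epsilon$.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1, 5, n)              # keep the scale term positive
beta, gamma = 2.0, 0.5                # assumed location and scale coefficients
y = beta * x + gamma * x * rng.normal(size=n)

# Q_tau(Y | x) = (beta + gamma * q_tau) * x, so the fitted slope
# at each tau should be close to beta + gamma * q_tau.
X = x.reshape(-1, 1)
for tau in [0.1, 0.5, 0.9]:
    fit = QuantileRegressor(quantile=tau, alpha=0, solver='highs').fit(X, y)
    predicted_slope = beta + gamma * stats.norm.ppf(tau)
    print(f"tau = {tau}: estimated slope = {fit.coef_[0]:.2f}, "
          f"location-scale prediction = {predicted_slope:.2f}")
```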
The unrestricted linear quantile model $Q_\tau(Y \mid X) = X^\top \beta(\tau)$ is more flexible than location-scale models. It allows covariates to affect different quantiles in completely different ways—even with opposite signs. The data determine the pattern.
Interpreting quantile regression coefficients requires care: they mean something different from OLS coefficients.
OLS Coefficient Interpretation:
$$\mathbb{E}[Y \mid X = x] = x^\top \beta$$
$\beta_j$: The expected change in $Y$ for a one-unit increase in $x_j$, holding other covariates constant.
Quantile Regression Coefficient Interpretation:
$$Q_\tau(Y \mid X = x) = x^\top \beta(\tau)$$
$\beta_j(\tau)$: The change in the $\tau$-th conditional quantile of $Y$ for a one-unit increase in $x_j$, holding other covariates constant.
Example: Education and Wages
Suppose we model log wages with years of education as a predictor:
| Quantile τ | β(τ) for Education |
|---|---|
| 0.10 | 0.08 |
| 0.50 | 0.12 |
| 0.90 | 0.18 |
Interpretation:
- At τ = 0.10: an additional year of education raises the 10th percentile of conditional log wages by 0.08 (roughly an 8% wage increase).
- At τ = 0.50: an additional year raises the conditional median by 0.12 (roughly 13%, since $e^{0.12} - 1 \approx 0.127$).
- At τ = 0.90: an additional year raises the 90th percentile by 0.18 (roughly 20%).
Conclusion: Education has larger returns for high earners. The same variable has heterogeneous effects across the wage distribution.
$\beta(0.9)$ does NOT describe the effect for "high-$y$ individuals." It describes the effect on the 90th percentile of the conditional distribution at each $x$. The same individual could be at different conditional quantiles depending on their covariate values.
One of the most powerful applications of quantile regression is detecting and characterizing heteroscedasticity—changing variance across covariate values.
The OLS Problem:
OLS assumes constant variance (homoscedasticity): $$\text{Var}(Y \mid X = x) = \sigma^2$$
When violated, OLS remains consistent for the conditional mean, but:
- The usual standard errors are wrong, invalidating confidence intervals and tests (unless robust errors are used).
- OLS is no longer the most efficient linear estimator.
- A single fitted line hides the fact that predictive uncertainty varies with $x$.
How Quantile Regression Reveals Heteroscedasticity:
If the conditional distribution's spread changes with $x$, the spacing between quantiles will change:
$$\text{IQR}(x) = Q_{0.75}(Y \mid X=x) - Q_{0.25}(Y \mid X=x)$$
If IQR varies with $x$, heteroscedasticity is present.
Geometric Interpretation:
In a quantile regression plot:
- Homoscedasticity: the fitted quantile lines are roughly parallel, with constant vertical spacing across $x$.
- Heteroscedasticity: the lines fan out (or converge), so the spacing between upper and lower quantiles changes with $x$.
The simulation below generates both cases side by side.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor

np.random.seed(42)
n = 500

# Scenario 1: Homoscedastic data
X_homo = np.random.uniform(0, 10, n)
y_homo = 2 * X_homo + 5 + np.random.normal(0, 2, n)

# Scenario 2: Heteroscedastic data (variance increases with X)
X_hetero = np.random.uniform(0, 10, n)
y_hetero = 2 * X_hetero + 5 + np.random.normal(0, 0.5 + 0.5 * X_hetero, n)

def fit_and_plot_quantiles(X, y, title, ax):
    """Fit quantile regressions and plot."""
    X = X.reshape(-1, 1)
    quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
    ax.scatter(X, y, alpha=0.3, s=20, c='gray')
    X_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    colors = plt.cm.RdYlBu(np.linspace(0.1, 0.9, len(quantiles)))
    for tau, color in zip(quantiles, colors):
        model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
        model.fit(X, y)
        y_pred = model.predict(X_plot)
        ax.plot(X_plot, y_pred, color=color, linewidth=2, label=f'τ = {tau}')
    ax.set_xlabel('X')
    ax.set_ylabel('y')
    ax.set_title(title)
    ax.legend(loc='upper left')
    ax.grid(True, alpha=0.3)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

fit_and_plot_quantiles(X_homo, y_homo, 'Homoscedastic: Parallel Lines', axes[0])
fit_and_plot_quantiles(X_hetero, y_hetero, 'Heteroscedastic: Fan Pattern', axes[1])

plt.tight_layout()
plt.show()

# Quantitative test: Compare IQR at different X values
print("Interquartile Range (IQR) Analysis:")
print("=" * 50)

for scenario, X, y in [("Homoscedastic", X_homo, y_homo),
                       ("Heteroscedastic", X_hetero, y_hetero)]:
    X = X.reshape(-1, 1)
    # Fit Q25 and Q75
    model_25 = QuantileRegressor(quantile=0.25, alpha=0, solver='highs').fit(X, y)
    model_75 = QuantileRegressor(quantile=0.75, alpha=0, solver='highs').fit(X, y)
    # IQR at X=2 vs X=8
    iqr_low = model_75.predict([[2]]) - model_25.predict([[2]])
    iqr_high = model_75.predict([[8]]) - model_25.predict([[8]])
    print(f"\n{scenario}:")
    print(f"  IQR at X=2: {iqr_low[0]:.2f}")
    print(f"  IQR at X=8: {iqr_high[0]:.2f}")
    print(f"  Ratio: {iqr_high[0]/iqr_low[0]:.2f}x")
```

Plotting quantile regression lines is often more informative than formal heteroscedasticity tests (Breusch-Pagan, White). The visual pattern immediately reveals not just whether variance changes, but how it changes and whether the effect is symmetric.
Quantile regression excels at capturing complex distributional effects—situations where covariates don't simply shift or scale the distribution but reshape it entirely.
Types of Distributional Effects:
1. Location Shift Only: the covariate moves every quantile by the same amount; $\beta(\tau)$ is constant in $\tau$.
2. Location-Scale Change: the covariate shifts the distribution and stretches or compresses it; $\beta(\tau)$ changes monotonically in $\tau$.
3. Skewness Change: the covariate affects one tail more than the other, altering asymmetry; $\beta(\tau)$ differs between low and high $\tau$ even if the median effect is unchanged.
4. Complex Reshaping: the covariate changes the distribution's shape in ways no simple shift or scaling captures; $\beta(\tau)$ is non-monotonic in $\tau$.
| Pattern in β(τ) | Interpretation | Example |
|---|---|---|
| Constant across τ | Pure location shift | Fixed dollar raise for all employees |
| Increases with τ | Spread increases (high quantiles affected more) | Percent-based raise; training with diminishing returns for low performers |
| Decreases with τ | Spread decreases (low quantiles affected more) | Minimum wage increase; safety net policies |
| Positive for low τ, negative for high τ | Compression toward center | Regulations reducing inequality |
| Negative for low τ, positive for high τ | Expansion from center | Deregulation increasing variance |
| Non-monotonic (U-shape, etc.) | Complex distributional changes | Technology affecting middle-skilled workers negatively |
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, QuantileRegressor

np.random.seed(42)
n = 1000

# Create treatment indicator and baseline
treatment = np.random.binomial(1, 0.5, n)
baseline_ability = np.random.normal(0, 1, n)

# Treatment has HETEROGENEOUS effects:
# - Helps low-baseline individuals more
# - Has minimal effect on high-baseline individuals
# This is a "compression" effect
treatment_effect = np.where(treatment == 1,
                            3 - 1.5 * baseline_ability,  # Larger effect for low baseline
                            0)

y = 5 + 2 * baseline_ability + treatment_effect + np.random.normal(0, 1, n)

# Fit quantile regressions of y on the treatment indicator
D = treatment.reshape(-1, 1)
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]

treatment_effects = []
for tau in quantiles:
    model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
    model.fit(D, y)
    treatment_effects.append(model.coef_[0])  # Treatment coefficient

# OLS estimate of the average effect, for reference
ols_effect = LinearRegression().fit(D, y).coef_[0]

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Treatment effect by quantile
ax1 = axes[0]
ax1.plot(quantiles, treatment_effects, 'o-', linewidth=2, markersize=10)
ax1.axhline(y=ols_effect, color='red', linestyle='--',
            label=f'OLS estimate: {ols_effect:.2f}')
ax1.set_xlabel('Quantile τ', fontsize=12)
ax1.set_ylabel('Treatment Effect β(τ)', fontsize=12)
ax1.set_title('Heterogeneous Treatment Effect Across Quantiles', fontsize=14)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Add interpretation annotation
ax1.annotate('Larger effect\nfor lower quantiles\n(struggling students)',
             xy=(0.1, treatment_effects[0]),
             xytext=(0.25, treatment_effects[0] + 0.5),
             fontsize=10, arrowprops=dict(arrowstyle='->', color='gray'))

# Plot 2: Distribution comparison
ax2 = axes[1]
control = y[treatment == 0]
treated = y[treatment == 1]

ax2.hist(control, bins=40, alpha=0.5, density=True, label='Control', color='blue')
ax2.hist(treated, bins=40, alpha=0.5, density=True, label='Treatment', color='green')
ax2.axvline(np.median(control), color='blue', linestyle='--', linewidth=2)
ax2.axvline(np.median(treated), color='green', linestyle='--', linewidth=2)
ax2.set_xlabel('Outcome y', fontsize=12)
ax2.set_ylabel('Density', fontsize=12)
ax2.set_title('Outcome Distributions by Treatment Status', fontsize=14)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Treatment Effect by Quantile:")
print("-" * 40)
for tau, effect in zip(quantiles, treatment_effects):
    print(f"τ = {tau}: β = {effect:.3f}")
```

Understanding distributional effects is crucial for policy analysis. A job training program with a positive average effect might actually hurt high-ability workers while dramatically helping low-ability workers. Quantile regression reveals this heterogeneity, which is invisible to OLS.
The quantile coefficient plot (also called the "quantile process plot") is the definitive visualization for quantile regression results.
Construction:
1. Choose a grid of quantile levels, e.g., $\tau \in \{0.05, 0.10, \ldots, 0.95\}$.
2. Fit a quantile regression at each $\tau$ and record the coefficient of interest, $\hat\beta_j(\tau)$.
3. Plot $\hat\beta_j(\tau)$ against $\tau$, adding bootstrap confidence bands and a horizontal reference line for the OLS estimate.
Interpretation Guide:
- A flat curve near the OLS line indicates a homogeneous effect: the covariate shifts the whole distribution uniformly.
- An upward or downward slope indicates heterogeneity: the covariate matters more in one tail than the other.
- Where the confidence band excludes zero, the effect at that quantile is statistically significant; where it excludes the OLS line, the quantile effect differs from the mean effect.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor, LinearRegression

def quantile_coefficient_plot(X, y, covariate_idx, covariate_name,
                              quantiles=None, n_bootstrap=100, ax=None):
    """
    Create a quantile coefficient plot with confidence bands.

    Parameters
    ----------
    X : np.ndarray
        Feature matrix (n_samples, n_features)
    y : np.ndarray
        Target variable
    covariate_idx : int
        Index of the covariate to plot
    covariate_name : str
        Name for plot label
    quantiles : list, optional
        Quantile levels to estimate
    n_bootstrap : int
        Number of bootstrap samples for confidence bands
    ax : matplotlib.axes, optional
        Axes to plot on
    """
    if quantiles is None:
        quantiles = np.arange(0.05, 0.96, 0.05)

    n = len(y)

    # Fit OLS for reference
    ols = LinearRegression().fit(X, y)
    ols_coef = ols.coef_[covariate_idx]

    # Fit quantile regressions
    qr_coefs = []
    for tau in quantiles:
        model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
        model.fit(X, y)
        qr_coefs.append(model.coef_[covariate_idx])
    qr_coefs = np.array(qr_coefs)

    # Bootstrap for confidence bands
    boot_coefs = np.zeros((n_bootstrap, len(quantiles)))
    for b in range(n_bootstrap):
        # Resample observations with replacement
        idx = np.random.choice(n, size=n, replace=True)
        X_boot, y_boot = X[idx], y[idx]
        for i, tau in enumerate(quantiles):
            try:
                model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
                model.fit(X_boot, y_boot)
                boot_coefs[b, i] = model.coef_[covariate_idx]
            except Exception:
                boot_coefs[b, i] = np.nan

    # Compute confidence bands (2.5th and 97.5th percentiles)
    lower = np.nanpercentile(boot_coefs, 2.5, axis=0)
    upper = np.nanpercentile(boot_coefs, 97.5, axis=0)

    # Plot
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 6))

    ax.fill_between(quantiles, lower, upper, alpha=0.3, color='blue',
                    label='95% Confidence Band')
    ax.plot(quantiles, qr_coefs, 'b-', linewidth=2, label='Quantile Estimates')
    ax.axhline(y=ols_coef, color='red', linestyle='--', linewidth=2,
               label=f'OLS Estimate: {ols_coef:.3f}')
    ax.axhline(y=0, color='gray', linestyle=':', linewidth=1)
    ax.set_xlabel('Quantile τ', fontsize=12)
    ax.set_ylabel(f'β(τ) for {covariate_name}', fontsize=12)
    ax.set_title(f'Quantile Coefficients: {covariate_name}', fontsize=14)
    ax.legend(loc='best')
    ax.grid(True, alpha=0.3)

    return ax, qr_coefs, (lower, upper)

# Example: Simulated heterogeneous effect
np.random.seed(42)
n = 800

# X1: Constant effect across quantiles
# X2: Effect increases with quantile (heterogeneous)
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)

# Generate y with heterogeneous effect for X2
epsilon = np.random.standard_t(df=5, size=n)  # Heavy-tailed errors
y = 3 + 2 * X1 + (1 + 0.5 * epsilon) * X2 + epsilon

X = np.column_stack([X1, X2])

# Create plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

quantile_coefficient_plot(X, y, 0, 'X₁ (Homogeneous Effect)', ax=axes[0])
quantile_coefficient_plot(X, y, 1, 'X₂ (Heterogeneous Effect)', ax=axes[1])

plt.tight_layout()
plt.show()

print("X₁ has roughly constant effect across quantiles (homogeneous)")
print("X₂ shows increasing effect at higher quantiles (heterogeneous)")
```

Quantile coefficient plots with confidence bands are the standard way to present quantile regression results in academic papers. They efficiently communicate effect heterogeneity, statistical significance, and comparison to OLS in a single display.
In causal inference, Quantile Treatment Effects (QTE) provide richer information than the Average Treatment Effect (ATE).
Definition (Quantile Treatment Effect):
For treatment $D \in \{0, 1\}$ and outcome $Y$, the QTE at quantile $\tau$ is:
$$\text{QTE}(\tau) = Q_\tau(Y^1) - Q_\tau(Y^0)$$
where $Y^1$ and $Y^0$ are potential outcomes under treatment and control.
Interpretation:
QTE($\tau$) measures how the treatment shifts the $\tau$-th quantile of the outcome distribution.
Important Caveat:
The QTE compares the $\tau$-th quantiles of two distributions—not the same individuals. The person at the 90th percentile of $Y^1$ may not be the same as the person at the 90th percentile of $Y^0$.
When QTE Equals Individual Treatment Effect:
For QTE to equal the individual treatment effect for someone at quantile $\tau$, we need rank invariance—the assumption that individuals maintain their rank in the distribution regardless of treatment. This is strong but sometimes plausible.
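To see why this caveat bites, here is a minimal simulation (the potential-outcome model is an illustrative assumption) in which treatment reshuffles ranks; the QTE at each $\tau$ then differs from the corresponding quantile of the individual effects $Y^1 - Y^0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed toy potential outcomes where treatment reshuffles ranks:
# Y0 is the baseline; Y1 adds a large idiosyncratic shock, so an
# individual's rank under treatment need not match their rank under control.
y0 = rng.normal(0, 1, n)
y1 = y0 + 1 + rng.normal(0, 2, n)   # rank invariance fails here

for tau in [0.1, 0.5, 0.9]:
    qte = np.quantile(y1, tau) - np.quantile(y0, tau)   # distributional effect
    q_ite = np.quantile(y1 - y0, tau)                   # quantile of individual effects
    print(f"tau = {tau}: QTE = {qte:.2f}, "
          f"quantile of individual effects = {q_ite:.2f}")
```

The two columns disagree at every $\tau$ except by coincidence: QTE describes how the marginal distributions differ, not how any individual's outcome changes.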
| Concept | Definition | Interpretation |
|---|---|---|
| ATE | $\mathbb{E}[Y^1 - Y^0]$ | Average effect across entire population |
| QTE(τ) | $Q_\tau(Y^1) - Q_\tau(Y^0)$ | Effect on τ-th quantile of the distributions |
| ATT | $\mathbb{E}[Y^1 - Y^0 \mid D=1]$ | Average effect on the treated |
| CATE | $\mathbb{E}[Y^1 - Y^0 \mid X=x]$ | Conditional average effect given covariates |
Why QTE Matters:
- Policies rarely affect everyone equally: the ATE can mask large gains in one tail and losses in the other.
- Distributional goals such as poverty or inequality reduction are statements about quantiles, not means.
- Two interventions with identical ATEs can have very different QTE profiles, as the minimum wage example below shows.
Example: Minimum Wage Increase
| Effect | Interpretation |
|---|---|
| ATE = $0.50/hr | Average wage increases modestly |
| QTE(0.10) = $1.50/hr | Large gains at the 10th percentile (low earners) |
| QTE(0.50) = $0.30/hr | Small median effect |
| QTE(0.90) = $0.00/hr | No effect on high earners |
The ATE misses the distributional story: minimum wage disproportionately helps those at the bottom.
Estimation:
With random assignment, the two-group specification $$Q_\tau(Y \mid D) = \alpha(\tau) + \text{QTE}(\tau) \cdot D$$ holds automatically, since a binary treatment makes the model saturated.
Fitting quantile regression of $Y$ on treatment indicator $D$ gives QTE($\tau$) directly.
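As a minimal sketch of this estimator (simulated experiment; the effect pattern is an illustrative assumption): with a binary regressor, the quantile regression coefficient at each $\tau$ matches the difference in empirical quantiles between treated and control groups.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n = 4000

# Assumed toy experiment: random assignment, treatment helps the lower
# tail more than the upper tail (a "compression" effect).
d = rng.binomial(1, 0.5, n)
noise = rng.normal(0, 1, n)
y = 2 + noise + d * (1.0 - 0.8 * noise)  # effect shrinks as the draw improves

for tau in [0.1, 0.5, 0.9]:
    fit = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
    fit.fit(d.reshape(-1, 1), y)
    # With a binary regressor, the QR coefficient equals the difference
    # in empirical quantiles between the two groups (up to interpolation).
    direct = np.quantile(y[d == 1], tau) - np.quantile(y[d == 0], tau)
    print(f"tau = {tau}: QTE via QR = {fit.coef_[0]:.2f}, "
          f"direct quantile difference = {direct:.2f}")
```

Under this design the estimated effect shrinks from roughly 2 in the lower tail to roughly 0 in the upper tail, the compression pattern discussed above.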
Without rank invariance, QTE(τ) tells you about distributional shifts but not necessarily about individual treatment effects. The person at the 90th percentile when treated may have been at the 70th when untreated if treatment reshuffles ranks.
Real applications involve multiple covariates, each potentially having quantile-specific effects.
The Full Model:
$$Q_\tau(Y \mid X) = \beta_0(\tau) + \beta_1(\tau) X_1 + \cdots + \beta_p(\tau) X_p$$
Interpretation Challenges:
Dimensionality: With $K$ quantiles and $p$ covariates, you have $K \times p$ coefficients to interpret
Ceteris paribus: $\beta_j(\tau)$ is the effect of $X_j$ on $Q_\tau(Y|X)$ holding other $X$s constant
Multicollinearity: Affects quantile regression just like OLS—correlated features have unstable coefficients
Strategies for Summarization:
- Report a small set of representative quantiles (e.g., τ = 0.1, 0.5, 0.9) in a table rather than the full grid (see the sketch below).
- Plot a quantile coefficient curve for each covariate of primary interest, as in the previous section.
- Group covariates by pattern: roughly constant in τ (homogeneous) versus varying in τ (heterogeneous), and discuss only the heterogeneous ones in detail.
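As a concrete version of the first strategy, the sketch below (simulated data; the data-generating process is an illustrative assumption) fits the full model on a small grid of quantiles and prints the resulting $K \times p$ coefficient matrix, one row per $\tau$.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(2)
n, p = 1500, 3
X = rng.normal(0, 1, size=(n, p))
# Assumed DGP: X1 homogeneous, X2 interacts with the error (heterogeneous),
# X3 has no effect at any quantile.
eps = rng.normal(0, 1, n)
y = 1 + 2 * X[:, 0] + (1 + 0.7 * eps) * X[:, 1] + eps

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
coef_grid = np.zeros((len(taus), p))   # K x p matrix of beta_j(tau)
for i, tau in enumerate(taus):
    fit = QuantileRegressor(quantile=tau, alpha=0, solver='highs').fit(X, y)
    coef_grid[i] = fit.coef_

# Print one row per quantile level: a compact summary table
print("tau   " + "".join(f"  beta_{j+1}(tau)" for j in range(p)))
for tau, row in zip(taus, coef_grid):
    print(f"{tau:<6}" + "".join(f"{c:>14.3f}" for c in row))
```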
Testing for Quantile Coefficient Equality:
To test whether an effect is truly heterogeneous across quantiles, we can use:
Wald Test: $$H_0: \beta_j(\tau_1) = \beta_j(\tau_2) \quad \text{vs.} \quad H_1: \beta_j(\tau_1) \neq \beta_j(\tau_2)$$
Procedure:
1. Estimate $\hat\beta_j(\tau_1)$ and $\hat\beta_j(\tau_2)$ on the same sample.
2. Estimate the variance of the difference $\hat\beta_j(\tau_1) - \hat\beta_j(\tau_2)$, typically by bootstrapping both quantiles jointly (the two estimates are correlated).
3. Form the Wald statistic $W = \big(\hat\beta_j(\tau_1) - \hat\beta_j(\tau_2)\big)^2 / \widehat{\text{Var}}\big(\hat\beta_j(\tau_1) - \hat\beta_j(\tau_2)\big)$ and compare it to a $\chi^2_1$ distribution.
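A minimal bootstrap implementation of this procedure follows (function and variable names are my own; the simulated data-generating process is an illustrative assumption). The paired resampling refits both quantiles on each bootstrap sample so the correlation between the two estimates is captured.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import QuantileRegressor

def wald_test_quantile_equality(X, y, j, tau1, tau2, n_bootstrap=200, seed=0):
    """Bootstrap Wald test of H0: beta_j(tau1) == beta_j(tau2)."""
    rng = np.random.default_rng(seed)
    n = len(y)

    def coef_diff(Xs, ys):
        b1 = QuantileRegressor(quantile=tau1, alpha=0, solver='highs').fit(Xs, ys).coef_[j]
        b2 = QuantileRegressor(quantile=tau2, alpha=0, solver='highs').fit(Xs, ys).coef_[j]
        return b1 - b2

    diff = coef_diff(X, y)

    # Paired bootstrap: refit BOTH quantiles on each resample so the
    # correlation between the two estimates enters the variance.
    boot = []
    for _ in range(n_bootstrap):
        idx = rng.choice(n, size=n, replace=True)
        boot.append(coef_diff(X[idx], y[idx]))
    boot = np.array(boot)

    var = boot.var(ddof=1)
    W = diff**2 / var
    p_value = stats.chi2.sf(W, df=1)
    return diff, W, p_value

# Example on simulated data (assumed DGP with a heterogeneous effect)
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(0, 1, size=(n, 1))
eps = rng.normal(0, 1, n)
y = 1 + (1 + 0.6 * eps) * X[:, 0] + eps   # effect of X grows with tau

diff, W, p = wald_test_quantile_equality(X, y, j=0, tau1=0.1, tau2=0.9)
print(f"beta(0.1) - beta(0.9) = {diff:.3f}, W = {W:.2f}, p = {p:.4f}")
```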
Joint Test (Khmaladze Transform):
More sophisticated tests consider the entire quantile process $\{\beta_j(\tau) : \tau \in (0,1)\}$ simultaneously, controlling for the multiple comparison problem.
When examining many covariates across many quantiles, some differences will appear significant by chance. Use appropriate corrections (Bonferroni, FDR) or joint testing procedures to avoid false discoveries.
We have explored how conditional quantiles reveal the full distributional relationship between covariates and outcomes.
What's Next:
In the next page, we'll turn to prediction intervals—using quantile regression to construct intervals that capture uncertainty in predictions. We'll see how pairs of quantile predictions (e.g., τ = 0.1 and τ = 0.9) form prediction intervals with controlled coverage.
You now understand how conditional quantiles reveal the full distributional relationship between features and outcomes. This goes far beyond mean regression, exposing heterogeneity, heteroscedasticity, and complex distributional effects. Next, we'll apply this understanding to construct reliable prediction intervals.