A point prediction—no matter how accurate—is fundamentally incomplete. When a weather app says "Tomorrow's high: 72°F," you naturally wonder: How sure are they? Could it be 65°F or 80°F? A prediction without uncertainty quantification is like a measurement reported without error bars.
Prediction intervals address this need by providing a range that is expected to contain the true outcome with a specified probability. Traditional approaches assume parametric error distributions (typically Gaussian), but quantile regression offers a distribution-free alternative that adapts to heteroscedastic, asymmetric, and heavy-tailed data.
The Appeal of Quantile-Based Prediction Intervals:
Quantile-based intervals adapt to asymmetry and heteroscedasticity, require no distributional assumptions, and—when combined with conformal calibration—come with finite-sample coverage guarantees.
By the end of this page, you will understand how to construct prediction intervals using quantile regression, evaluate their calibration, compare to Gaussian-based intervals, and leverage conformal prediction for finite-sample guarantees.
Let's clarify the distinction between point predictions, confidence intervals, and prediction intervals.
Point Prediction:
A single best guess, typically $\hat{y}(x) = \hat{\mathbb{E}}[Y \mid X = x]$ or $\hat{y}(x) = \hat{Q}_{0.5}(Y \mid X = x)$.
Confidence Interval for the Mean:
An interval $[L(x), U(x)]$ such that: $$P(\mathbb{E}[Y \mid X = x] \in [L(x), U(x)]) \geq 1 - \alpha$$
This covers the true conditional mean, which is a fixed (but unknown) quantity.
Prediction Interval:
An interval $[L(x), U(x)]$ such that: $$P(Y_{new} \in [L(x), U(x)] \mid X_{new} = x) \geq 1 - \alpha$$
This covers a future observation $Y_{new}$—a random variable.
| Aspect | Confidence Interval | Prediction Interval |
|---|---|---|
| Target | True conditional mean E[Y|X=x] | Future observation Y_new |
| Width | Decreases with n → ∞ | Converges to nonzero limit (irreducible noise) |
| Uncertainty source | Estimation uncertainty only | Estimation + inherent noise |
| Typical width | Narrow (with enough data) | Wider (includes noise variance) |
| OLS formula | ŷ ± t_{α/2} · SE(ŷ) | ŷ ± t_{α/2} · √(σ̂² + SE²(ŷ)) |
Key Insight:
Even with infinite data, prediction intervals remain wide because they must account for irreducible noise. Confidence intervals shrink toward zero width as sample size increases, but prediction intervals converge to a band reflecting the fundamental variability in $Y \mid X$.
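This asymmetry is easy to see numerically. The sketch below (illustrative, not from the text) fits OLS on simulated Gaussian data at two sample sizes and compares the half-widths of the confidence interval for the mean and the prediction interval at a fixed point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known model: Y = 2x + 5 + N(0, 1); evaluate intervals at x = 5
sigma = 1.0
for n in [100, 10_000]:
    x = rng.uniform(0, 10, n)
    y = 2 * x + 5 + rng.normal(0, sigma, n)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)          # residual variance estimate
    x0 = np.array([1.0, 5.0])
    # Standard error of the fitted mean at x0
    se_mean = np.sqrt(s2 * x0 @ np.linalg.inv(X.T @ X) @ x0)
    ci_half = 1.96 * se_mean                    # CI for the mean: shrinks with n
    pi_half = 1.96 * np.sqrt(s2 + se_mean**2)   # PI for a new obs: floored by sigma
    print(f"n={n:>6}: CI half-width {ci_half:.3f}, PI half-width {pi_half:.3f}")
```

As $n$ grows, the CI half-width collapses toward zero while the PI half-width settles near $1.96\,\sigma$—the irreducible-noise floor.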
Why Standard Prediction Intervals Fail:
Classic OLS-based prediction intervals assume: $$Y \mid X = x \sim N(x^\top \beta, \sigma^2)$$
This fails when:
- Errors are heteroscedastic (variance changes with $x$)
- The error distribution is skewed or asymmetric
- Errors are heavy-tailed, producing more extreme outcomes than a Gaussian predicts
Quantile regression intervals make none of these assumptions.
Many practitioners confuse confidence intervals with prediction intervals and report intervals that are far too narrow. A 95% prediction interval for individual outcomes will always be wider than a 95% confidence interval for the mean.
The fundamental principle is elegant: a $(1-\alpha)$ prediction interval is bounded by the $\alpha/2$ and $(1-\alpha/2)$ conditional quantiles.
Construction:
For a $(1-\alpha) \times 100\%$ prediction interval:
- Fit a quantile regression at level $\tau = \alpha/2$ for the lower bound
- Fit a quantile regression at level $\tau = 1 - \alpha/2$ for the upper bound
- Report $\text{PI}(x) = [\hat{Q}_{\alpha/2}(Y \mid X = x),\; \hat{Q}_{1-\alpha/2}(Y \mid X = x)]$
Example: 90% Prediction Interval
With $\alpha = 0.10$, fit models at $\tau = 0.05$ and $\tau = 0.95$; the interval is $[\hat{Q}_{0.05}(x), \hat{Q}_{0.95}(x)]$.
Interpretation: If the quantile models are correctly specified, 90% of future observations at covariate value $x$ should fall within this interval.
Theoretical Justification:
Under correct model specification:
$$P(Y \in [Q_{\alpha/2}(Y|X), Q_{1-\alpha/2}(Y|X)] \mid X = x) = 1 - \alpha$$
This follows directly from the definition of quantiles:
$$P(Y \leq Q_\tau(Y|X)) = \tau$$
So: $$P(Q_{\alpha/2}(Y|X) < Y \leq Q_{1-\alpha/2}(Y|X)) = (1-\alpha/2) - \alpha/2 = 1 - \alpha$$
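This identity can be checked numerically with empirical quantiles of any continuous distribution—a minimal sketch using a skewed exponential sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# For a skewed distribution, check P(q_{0.05} < Y <= q_{0.95}) = 0.90
y = rng.exponential(scale=2.0, size=200_000)
lo, hi = np.quantile(y, [0.05, 0.95])
coverage = np.mean((y > lo) & (y <= hi))
print(f"[{lo:.2f}, {hi:.2f}]  empirical coverage: {coverage:.3f}")
```

Note the interval is asymmetric around the mean—the upper tail of the exponential pushes the 95th percentile far out—yet coverage is still 90%.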
Key Advantages:
Asymmetric intervals: The lower and upper bounds are fit independently, naturally adapting to skewed distributions
Heteroscedastic coverage: If variance increases with $x$, the quantile predictions automatically diverge, widening the interval
No normality requirement: Works for any continuous distribution
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.model_selection import train_test_split

np.random.seed(42)

# Generate heteroscedastic data with skewed errors
n = 1000
X = np.random.uniform(0, 10, n).reshape(-1, 1)

# Skewed, heteroscedastic errors (exponential-like)
# Variance and skewness increase with X
errors = np.random.exponential(scale=1 + 0.3 * X.ravel()) - (1 + 0.3 * X.ravel())
y = 2 * X.ravel() + 5 + errors

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit quantile regression models for 90% prediction interval
alpha = 0.10  # For 90% interval

model_lower = QuantileRegressor(quantile=alpha/2, alpha=0, solver='highs')
model_median = QuantileRegressor(quantile=0.5, alpha=0, solver='highs')
model_upper = QuantileRegressor(quantile=1-alpha/2, alpha=0, solver='highs')

model_lower.fit(X_train, y_train)
model_median.fit(X_train, y_train)
model_upper.fit(X_train, y_train)

# Predictions on test set
lower = model_lower.predict(X_test)
median = model_median.predict(X_test)
upper = model_upper.predict(X_test)

# Compute coverage
coverage = np.mean((y_test >= lower) & (y_test <= upper))
avg_width = np.mean(upper - lower)

print(f"90% Prediction Interval Evaluation:")
print(f"  Empirical Coverage: {coverage:.1%}")
print(f"  Average Width: {avg_width:.2f}")
print(f"  Expected Coverage: 90%")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Data and prediction intervals
ax1 = axes[0]
sort_idx = np.argsort(X_test.ravel())
X_sorted = X_test.ravel()[sort_idx]

ax1.scatter(X_test, y_test, alpha=0.5, s=20, c='gray', label='Test Data')
ax1.plot(X_sorted, lower[sort_idx], 'b--', linewidth=2, label='5th percentile')
ax1.plot(X_sorted, median[sort_idx], 'r-', linewidth=2, label='Median')
ax1.plot(X_sorted, upper[sort_idx], 'b--', linewidth=2, label='95th percentile')
ax1.fill_between(X_sorted, lower[sort_idx], upper[sort_idx],
                 alpha=0.2, color='blue', label='90% PI')
ax1.set_xlabel('X', fontsize=12)
ax1.set_ylabel('y', fontsize=12)
ax1.set_title(f'Quantile Regression Prediction Intervals\n(Coverage: {coverage:.1%})', fontsize=14)
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)

# Plot 2: Interval width as function of X
ax2 = axes[1]
widths = upper - lower
ax2.scatter(X_test, widths, alpha=0.5, s=30)
ax2.set_xlabel('X', fontsize=12)
ax2.set_ylabel('Interval Width', fontsize=12)
ax2.set_title('Prediction Interval Width vs. Covariate\n(Captures Heteroscedasticity)', fontsize=14)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

Notice how the interval width increases with X—exactly matching the increasing variance in the data. This happens automatically without explicitly modeling variance. Traditional normal-theory intervals would have constant width, providing incorrect coverage.
Let's formally compare quantile-based intervals to traditional Gaussian assumptions.
Standard (Gaussian) Prediction Interval:
For OLS with homoscedastic Gaussian errors: $$\text{PI}_{1-\alpha}(x) = \hat{y}(x) \pm t_{n-p,\,1-\alpha/2} \cdot \hat{\sigma} \sqrt{1 + x^\top(X^\top X)^{-1}x}$$
where:
- $\hat{\sigma}$ is the residual standard deviation
- $t_{n-p,\,1-\alpha/2}$ is a Student-$t$ critical value with $n - p$ degrees of freedom
- $x^\top(X^\top X)^{-1}x$ accounts for estimation uncertainty in $\hat{y}(x)$
Assumptions Embedded in This Formula:
- Gaussian errors
- Constant variance (homoscedasticity) across all $x$
- A correctly specified linear mean
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, QuantileRegressor

np.random.seed(42)
n = 500

# Generate heavily skewed, heteroscedastic data
X = np.random.uniform(0, 10, n).reshape(-1, 1)

# Skewed Chi-squared errors with increasing variance
df_chi = 3
errors = (np.random.chisquare(df_chi, n) - df_chi) * (0.5 + 0.2 * X.ravel())
y = 3 * X.ravel() + 10 + errors

# Method 1: Gaussian-based prediction interval
ols = LinearRegression().fit(X, y)
y_pred_ols = ols.predict(X)
residuals = y - y_pred_ols
sigma_hat = np.std(residuals, ddof=2)

# For new predictions (ignoring estimation uncertainty for simplicity)
X_grid = np.linspace(0, 10, 100).reshape(-1, 1)
y_pred_grid = ols.predict(X_grid)
gaussian_lower = y_pred_grid - 1.645 * sigma_hat  # 90% interval
gaussian_upper = y_pred_grid + 1.645 * sigma_hat

# Method 2: Quantile regression prediction interval
qr_lower = QuantileRegressor(quantile=0.05, alpha=0, solver='highs').fit(X, y)
qr_upper = QuantileRegressor(quantile=0.95, alpha=0, solver='highs').fit(X, y)
qr_lower_pred = qr_lower.predict(X_grid)
qr_upper_pred = qr_upper.predict(X_grid)

# Evaluate coverage
qr_coverage = np.mean((y >= qr_lower.predict(X)) & (y <= qr_upper.predict(X)))
gaussian_coverage = np.mean((y >= y_pred_ols - 1.645 * sigma_hat) &
                            (y <= y_pred_ols + 1.645 * sigma_hat))

print("90% Prediction Interval Coverage Comparison:")
print(f"  Quantile Regression: {qr_coverage:.1%}")
print(f"  Gaussian-based: {gaussian_coverage:.1%}")
print(f"  Target: 90%")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

ax1 = axes[0]
ax1.scatter(X, y, alpha=0.3, s=20, c='gray')
ax1.plot(X_grid, y_pred_grid, 'k-', linewidth=2, label='OLS Mean')
ax1.fill_between(X_grid.ravel(), gaussian_lower, gaussian_upper, alpha=0.3, color='red',
                 label=f'Gaussian 90% PI (Cov: {gaussian_coverage:.0%})')
ax1.set_xlabel('X')
ax1.set_ylabel('y')
ax1.set_title('Gaussian Prediction Interval')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2 = axes[1]
ax2.scatter(X, y, alpha=0.3, s=20, c='gray')
qr_median = QuantileRegressor(quantile=0.5, alpha=0, solver='highs').fit(X, y)
ax2.plot(X_grid, qr_median.predict(X_grid), 'k-', linewidth=2, label='QR Median')
ax2.fill_between(X_grid.ravel(), qr_lower_pred, qr_upper_pred, alpha=0.3, color='blue',
                 label=f'Quantile 90% PI (Cov: {qr_coverage:.0%})')
ax2.set_xlabel('X')
ax2.set_ylabel('y')
ax2.set_title('Quantile Regression Prediction Interval')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

Gaussian intervals perform well when data are truly normal and homoscedastic. For large samples from approximately normal populations, the difference may be negligible. The quantile approach shines in challenging cases: skewed data, heavy tails, or heteroscedasticity.
A prediction interval is only as good as its calibration—the match between stated and actual coverage.
Definition (Coverage):
The empirical coverage of a $(1-\alpha)$ prediction interval on a test set is:
$$\hat{C} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \mathbb{1}\{Y_i \in [L(X_i), U(X_i)]\}$$
A well-calibrated interval has $\hat{C} \approx 1 - \alpha$.
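The coverage statistic is one line of code. A tiny sketch (function name is illustrative) with a hand-checkable toy case:

```python
import numpy as np

def empirical_coverage(y_test, lower, upper):
    """Fraction of test outcomes falling inside their intervals."""
    y_test, lower, upper = map(np.asarray, (y_test, lower, upper))
    return np.mean((y_test >= lower) & (y_test <= upper))

# Toy check: intervals cover 3 of 4 points -> 0.75
y = np.array([1.0, 2.0, 3.0, 10.0])
lo = np.array([0.0, 1.5, 2.5, 3.0])
hi = np.array([1.5, 2.5, 3.5, 4.0])
print(empirical_coverage(y, lo, hi))  # 0.75
```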
Coverage Plots:
To assess calibration across multiple coverage levels:
- Construct intervals at several nominal levels (e.g., 10%, 20%, ..., 90%)
- Compute the empirical coverage of each on a held-out test set
- Plot empirical against nominal coverage; a well-calibrated model tracks the 45° line
Miscoverage Patterns:
| Pattern | Diagnosis | Cause |
|---|---|---|
| Empirical < Nominal | Under-coverage | Intervals too narrow; model overconfident |
| Empirical > Nominal | Over-coverage | Intervals too wide; model underconfident |
| Miscoverage at tails only | Tail misspecification | Model fails at extreme quantiles |
| Miscoverage depends on X | Conditional miscalibration | Model wrong for some covariate values |
Conditional Coverage:
Strong calibration requires covering correctly conditional on $X$: $$P(Y \in [L(X), U(X)] \mid X = x) = 1 - \alpha \quad \forall x$$
Marginal coverage (average over $X$) is weaker but easier to achieve.
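A quick way to probe conditional coverage is to bin the test set on $X$ and compute coverage within each bin. This sketch (the helper name is illustrative) shows how a constant-width interval can look fine marginally in some bins while badly under-covering where the noise is large:

```python
import numpy as np

def binned_coverage(x, y, lower, upper, n_bins=4):
    """Empirical coverage within equal-count bins of x
    (a rough diagnostic for conditional miscalibration)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    covered = (y >= lower) & (y <= upper)
    out = []
    for i in range(n_bins):
        if i == n_bins - 1:
            mask = (x >= edges[i]) & (x <= edges[i + 1])
        else:
            mask = (x >= edges[i]) & (x < edges[i + 1])
        out.append(covered[mask].mean())
    return np.array(out)

# Constant-width intervals on heteroscedastic data: coverage drops for large x
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50_000)
y = 2 * x + rng.normal(0, 0.2 + 0.3 * x)
lower, upper = 2 * x - 2.0, 2 * x + 2.0   # fixed ±2 band
cov = binned_coverage(x, y, lower, upper)
print(cov)
```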
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import QuantileRegressor
from sklearn.model_selection import train_test_split

def calibration_plot(X_train, y_train, X_test, y_test):
    """
    Generate calibration plot for quantile regression prediction intervals.
    """
    # Nominal coverage levels to evaluate
    nominal_levels = np.arange(0.1, 1.0, 0.1)  # 10%, 20%, ..., 90%
    empirical_coverage = []

    for level in nominal_levels:
        alpha = 1 - level
        tau_lower = alpha / 2
        tau_upper = 1 - alpha / 2

        # Fit quantile models
        model_lower = QuantileRegressor(quantile=tau_lower, alpha=0, solver='highs')
        model_upper = QuantileRegressor(quantile=tau_upper, alpha=0, solver='highs')
        model_lower.fit(X_train, y_train)
        model_upper.fit(X_train, y_train)

        # Predict on test set
        lower = model_lower.predict(X_test)
        upper = model_upper.predict(X_test)

        # Compute coverage
        covered = (y_test >= lower) & (y_test <= upper)
        empirical_coverage.append(np.mean(covered))

    # Plot
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.plot([0, 1], [0, 1], 'k--', linewidth=2, label='Perfect Calibration')
    ax.plot(nominal_levels, empirical_coverage, 'bo-', markersize=10,
            linewidth=2, label='Quantile Regression')

    # Shaded region for acceptable deviation
    ax.fill_between([0, 1], [0 - 0.05, 1 - 0.05], [0 + 0.05, 1 + 0.05],
                    alpha=0.2, color='gray', label='±5% Tolerance')

    ax.set_xlabel('Nominal Coverage Level', fontsize=12)
    ax.set_ylabel('Empirical Coverage', fontsize=12)
    ax.set_title('Prediction Interval Calibration Plot', fontsize=14)
    ax.legend(loc='lower right')
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.grid(True, alpha=0.3)
    ax.set_aspect('equal')

    return empirical_coverage

# Generate data
np.random.seed(42)
n = 2000
X = np.random.uniform(0, 10, n).reshape(-1, 1)
y = 2 * X.ravel() + np.random.normal(0, 1 + 0.3 * X.ravel(), n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Generate calibration plot
coverage = calibration_plot(X_train, y_train, X_test, y_test)

print("Calibration Summary:")
print("-" * 40)
for level, emp in zip(np.arange(0.1, 1.0, 0.1), coverage):
    print(f"  Nominal {level:.0%}: Empirical {emp:.1%}")
```

If the linear quantile model is misspecified (the true conditional quantile function is nonlinear), calibration will suffer. Use flexible models (splines, gradient boosting, neural networks) when relationships are complex.
Standard quantile regression provides asymptotically valid intervals—coverage is correct as $n \to \infty$. But in finite samples, coverage may deviate from the nominal level due to estimation error.
Conformal prediction provides a framework for constructing prediction intervals with exact finite-sample coverage guarantees under minimal assumptions.
The Conformal Idea:
Instead of directly predicting quantiles, conformal prediction:
- Fits any point or quantile predictor on a training set
- Measures its errors (conformity scores) on a held-out calibration set
- Uses an empirical quantile of those scores to size the prediction interval
Split Conformal Prediction:
1. Split the data into a training set and a calibration set of size $n_{cal}$
2. Fit a predictor $\hat{\mu}$ on the training set
3. Compute scores $s_i = |y_i - \hat{\mu}(x_i)|$ on the calibration set
4. Set $\hat{q}$ to the $\lceil (1-\alpha)(n_{cal}+1) \rceil / n_{cal}$ empirical quantile of the scores
5. Report $\text{PI}(x) = [\hat{\mu}(x) - \hat{q},\; \hat{\mu}(x) + \hat{q}]$
Theorem (Finite-Sample Validity):
If $(X_i, Y_i)$ are exchangeable, the conformal interval has: $$P(Y_{new} \in \text{PI}(X_{new})) \geq 1 - \alpha$$
This holds exactly, not just asymptotically!
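A concrete instance of the theorem is split conformal with absolute-residual scores around an OLS fit—a symmetric, non-adaptive baseline (variable names and the simulated data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 3000
X = rng.uniform(0, 10, (n, 1))
y = 2 * X.ravel() + 5 + rng.normal(0, 1, n)

# Split: fit on one part, calibrate on another, evaluate on the rest
X_fit, y_fit = X[:n//2], y[:n//2]
X_cal, y_cal = X[n//2:2*n//3], y[n//2:2*n//3]
X_new, y_new = X[2*n//3:], y[2*n//3:]

model = LinearRegression().fit(X_fit, y_fit)

# Conformity scores: absolute residuals on the calibration set
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected quantile of the scores
alpha = 0.1
n_cal = len(scores)
q_hat = np.quantile(scores, min(1.0, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal))

# Symmetric interval: prediction ± q_hat
pred = model.predict(X_new)
coverage = np.mean((y_new >= pred - q_hat) & (y_new <= pred + q_hat))
print(f"Split conformal coverage (target 90%): {coverage:.1%}")
```

The $(n_{cal}+1)/n_{cal}$ correction is what turns asymptotic coverage into an exact finite-sample guarantee. The limitation—constant interval width—is exactly what CQR (next) removes.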
Conformalized Quantile Regression (CQR):
CQR combines the adaptivity of quantile regression with conformal guarantees:
1. Fit lower and upper quantile regressors on a training set
2. On a separate calibration set, compute scores $s_i = \max(\hat{Q}_{\alpha/2}(x_i) - y_i,\; y_i - \hat{Q}_{1-\alpha/2}(x_i))$
3. Widen (or tighten) both bounds by the conformal quantile $\hat{q}$ of the scores
Advantages of CQR:
- Interval widths adapt to $x$ via the quantile regressors
- Marginal coverage holds exactly in finite samples under exchangeability
- Any base quantile estimator (linear, boosted, neural) can be conformalized
```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def conformalized_quantile_regression(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """
    Conformalized Quantile Regression (CQR) for prediction intervals.

    Parameters
    ----------
    X_train : array (n_train, p)
        Training features
    y_train : array (n_train,)
        Training targets
    X_cal : array (n_cal, p)
        Calibration features
    y_cal : array (n_cal,)
        Calibration targets
    X_test : array (n_test, p)
        Test features
    alpha : float
        Miscoverage level (1 - alpha = coverage)

    Returns
    -------
    lower, upper : arrays
        Prediction interval bounds
    q_hat : float
        Conformal correction applied to both bounds
    """
    # Step 1: Fit quantile regressors on training data
    tau_lo = alpha / 2
    tau_hi = 1 - alpha / 2
    qr_lo = QuantileRegressor(quantile=tau_lo, alpha=0, solver='highs')
    qr_hi = QuantileRegressor(quantile=tau_hi, alpha=0, solver='highs')
    qr_lo.fit(X_train, y_train)
    qr_hi.fit(X_train, y_train)

    # Step 2: Compute conformity scores on calibration data
    q_lo_cal = qr_lo.predict(X_cal)
    q_hi_cal = qr_hi.predict(X_cal)
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)

    # Step 3: Find conformal quantile
    n_cal = len(y_cal)
    quantile_level = (1 - alpha) * (1 + 1/n_cal)
    q_hat = np.quantile(scores, min(quantile_level, 1.0))

    # Step 4: Construct conformalized intervals on test data
    q_lo_test = qr_lo.predict(X_test)
    q_hi_test = qr_hi.predict(X_test)
    lower = q_lo_test - q_hat
    upper = q_hi_test + q_hat

    return lower, upper, q_hat

# Example usage
np.random.seed(42)
n = 2000

# Generate heteroscedastic data
X = np.random.uniform(0, 10, n).reshape(-1, 1)
y = 2 * X.ravel() + np.random.normal(0, 0.5 + 0.3 * X.ravel(), n)

# Three-way split
n_train = int(0.5 * n)
n_cal = int(0.25 * n)

X_train, y_train = X[:n_train], y[:n_train]
X_cal, y_cal = X[n_train:n_train+n_cal], y[n_train:n_train+n_cal]
X_test, y_test = X[n_train+n_cal:], y[n_train+n_cal:]

# Apply CQR
alpha = 0.1
lower, upper, q_hat = conformalized_quantile_regression(
    X_train, y_train, X_cal, y_cal, X_test, alpha)

# Evaluate
coverage = np.mean((y_test >= lower) & (y_test <= upper))
avg_width = np.mean(upper - lower)

print(f"Conformalized Quantile Regression Results:")
print(f"  Target Coverage: {1-alpha:.0%}")
print(f"  Empirical Coverage: {coverage:.1%}")
print(f"  Average Interval Width: {avg_width:.2f}")
print(f"  Conformal Correction q̂: {q_hat:.3f}")
```

CQR provides: (1) adaptive interval widths from quantile regression, and (2) finite-sample coverage guarantees from conformal prediction. The conformal correction q̂ adjusts for any finite-sample miscalibration of the base quantile regressor.
Prediction intervals become more challenging in complex settings.
Multi-Output Prediction Regions:
When $Y \in \mathbb{R}^d$ is multivariate, we need prediction regions rather than intervals: a set $R(x) \subseteq \mathbb{R}^d$ with $P(Y_{new} \in R(x) \mid X_{new} = x) \geq 1 - \alpha$. Naively combining per-coordinate intervals miscalibrates the joint coverage unless the levels are adjusted (e.g., via a Bonferroni correction).
High-Dimensional Covariates:
With many features:
- Linear quantile models become more prone to misspecification
- Extreme-quantile estimates grow unstable as data thin out in the tails
Cautions:
- Separately fitted quantile levels can cross, producing invalid (inverted) intervals
- Flexible models can overfit, making training-set calibration look better than it is
Strategies for High Dimensions:
Regularized Quantile Regression: Add L1 or L2 penalty to quantile objective $$\min_\beta \sum_{i=1}^n \rho_\tau(y_i - x_i^\top \beta) + \lambda |\beta|_1$$
Gradient Boosting Quantile Regression: LightGBM and XGBoost support quantile loss
Deep Quantile Networks: Neural networks with pinball loss; can predict multiple quantiles simultaneously
Dimension Reduction: Apply PCA or other techniques before quantile regression
For high-dimensional problems, gradient boosting with quantile loss often outperforms linear quantile regression without requiring explicit regularization tuning—the ensemble structure provides implicit regularization.
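As one of the strategies above, gradient boosting with the pinball loss is available directly in scikit-learn via `GradientBoostingRegressor(loss='quantile')` (here `alpha` is the target quantile, not a regularization strength). A minimal sketch on simulated nonlinear, heteroscedastic data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 3000
X = rng.uniform(0, 10, (n, 1))
# Nonlinear signal with heteroscedastic noise
y = 3 * np.sin(X.ravel()) + 0.5 * X.ravel() + rng.normal(0, 0.3 + 0.2 * X.ravel())

X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

# One boosted model per interval bound, using the quantile (pinball) loss
gb_lo = GradientBoostingRegressor(loss='quantile', alpha=0.05, max_depth=3).fit(X_train, y_train)
gb_hi = GradientBoostingRegressor(loss='quantile', alpha=0.95, max_depth=3).fit(X_train, y_train)

lower, upper = gb_lo.predict(X_test), gb_hi.predict(X_test)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Boosted 90% PI empirical coverage: {coverage:.1%}")
```

A linear quantile regression could not track the sinusoidal signal here; the boosted bounds follow both the nonlinearity and the growing spread. The same boosted models can serve as the base estimators inside CQR.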
Deploying prediction intervals in production requires attention to several practical issues.
Interval Width as a Feature:
The width of prediction intervals can itself be informative: narrow intervals signal confident predictions, while wide intervals flag cases where the model is uncertain.
This enables adaptive decision-making systems that escalate uncertain cases.
Example: Loan Underwriting
| Predicted Range | Interval Width | Action |
|---|---|---|
| [$50K, $55K] | $5K (narrow) | Auto-approve if in policy |
| [$40K, $70K] | $30K (wide) | Flag for manual review |
| [$30K, $100K] | $70K (very wide) | Request additional documentation |
Use interval width to build intelligent systems that know when they don't know. This is crucial for safe deployment of ML models—confident predictions can be automated while uncertain predictions get human oversight.
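The routing logic in the table above reduces to a simple thresholding rule. A sketch (function name and dollar thresholds are illustrative, mirroring the table):

```python
def route_application(lower, upper, narrow=10_000, wide=50_000):
    """Map prediction-interval width to an action tier.

    Thresholds are illustrative, not from the text."""
    width = upper - lower
    if width <= narrow:
        return "auto-approve"
    elif width <= wide:
        return "manual review"
    return "request documentation"

print(route_application(50_000, 55_000))   # $5K width  -> auto-approve
print(route_application(40_000, 70_000))   # $30K width -> manual review
print(route_application(30_000, 100_000))  # $70K width -> request documentation
```

In production the thresholds would be tuned to the cost of manual review versus the cost of an erroneous automated decision.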
We have explored how quantile regression enables construction of prediction intervals with principled uncertainty quantification.
What's Next:
In the final page of this module, we'll explore robust estimation with quantile regression—how it naturally resists outliers, the connection to median regression, and when robustness becomes critical for reliable inference.
You now understand how to construct, evaluate, and deploy prediction intervals using quantile regression. These intervals adapt to heteroscedasticity, make no distributional assumptions, and—with conformal calibration—provide finite-sample coverage guarantees. Next, we'll complete the module by examining the robustness properties of quantile regression.