We've developed a complete toolkit for nonparametric regression: local regression (LOESS), kernel smoothing, bandwidth selection, and understanding of the curse of dimensionality. But having tools isn't the same as knowing when to use them.
Every regression problem presents a choice: commit to a parametric form (and gain efficiency if correct), or stay nonparametric (and gain robustness to misspecification). This isn't a philosophical debate—it's a practical decision with real consequences for prediction accuracy, interpretability, computational cost, and scientific insight.
This final page synthesizes everything we've learned into actionable guidance. We'll develop decision frameworks, compare methods head-to-head on important criteria, and equip you to choose wisely for any regression problem you encounter.
By the end of this page, you will understand: (1) The fundamental tradeoffs between parametric and nonparametric approaches; (2) Diagnostic tools for choosing between approaches; (3) Hybrid strategies that combine both; (4) Application-specific recommendations; (5) A decision framework for practical use.
Parametric Models:
Assume a specific functional form: $y = f(x; \boldsymbol{\theta}) + \epsilon$
Nonparametric Models:
Make minimal assumptions about form: $y = f(x) + \epsilon$ where $f \in \mathcal{F}$ (some rich function class).
The Bias-Variance Perspective:
| Aspect | Parametric (underfit) | Parametric (correct) | Nonparametric |
|---|---|---|---|
| Bias | High (if misspecified) | Low | Low (with good bandwidth) |
| Variance | Low | Low | Higher |
| MSE | Poor | Excellent | Moderate |
The key insight: Parametric wins if the assumption is right; nonparametric wins if it's wrong.
This motivates the core decision framework: How confident are you in your parametric assumptions?
From a Bayesian viewpoint, choosing between parametric and nonparametric is about prior beliefs. A parametric model encodes strong prior belief about functional form. A nonparametric model encodes weak/diffuse prior. The 'right' choice depends on how much you genuinely know beforehand.
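The bias-variance table can be made concrete with a short simulation. The sketch below is illustrative (the quadratic truth, noise level, and choice of k are arbitrary): it compares a misspecified linear fit, a correctly specified quadratic fit, and a crude k-nearest-neighbor smoother standing in for a nonparametric method, measuring MSE against the true mean function.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_regress(x_train, y_train, x_eval, k=15):
    """Crude nonparametric fit: average of the k nearest neighbors."""
    preds = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        idx = np.argsort(np.abs(x_train - x0))[:k]
        preds[j] = y_train[idx].mean()
    return preds

def truth(x):
    """True mean function: quadratic, so a linear fit is misspecified."""
    return 1.0 + 0.5 * x**2

n, n_reps = 200, 100
x_grid = np.linspace(-2, 2, 50)
mse = {'linear (misspecified)': [], 'quadratic (correct)': [], 'knn (nonparametric)': []}

for _ in range(n_reps):
    x = rng.uniform(-2, 2, n)
    y = truth(x) + rng.normal(0, 0.5, n)
    f = truth(x_grid)
    mse['linear (misspecified)'].append(np.mean((np.polyval(np.polyfit(x, y, 1), x_grid) - f)**2))
    mse['quadratic (correct)'].append(np.mean((np.polyval(np.polyfit(x, y, 2), x_grid) - f)**2))
    mse['knn (nonparametric)'].append(np.mean((knn_regress(x, y, x_grid) - f)**2))

for name, vals in mse.items():
    print(f"{name:25s} mean MSE = {np.mean(vals):.4f}")
```

As the table predicts: the correct parametric model does best, the nonparametric smoother is a solid second, and the misspecified model pays a large bias penalty that no amount of data removes.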
1. Sample Size: Nonparametric estimators converge slowly, so small samples cannot support them; large samples can afford flexibility.
2. Dimensionality: The curse of dimensionality makes fully nonparametric fits impractical beyond a few predictors.
3. Domain Knowledge: When theory pins down the functional form, encode it parametrically; in exploratory settings, let the data speak.
| Factor | Favors Parametric | Favors Nonparametric |
|---|---|---|
| Sample size | Small ($n < 100$) | Large ($n > 1000$) |
| Dimensionality | High ($d > 5$) | Low ($d \leq 3$) |
| Domain knowledge | Strong theoretical basis | Exploratory / unknown |
| Interpretability need | High (explain coefficients) | Low (prediction focus) |
| Extrapolation need | Must predict outside training range | Interpolation only |
| Computational budget | Limited | Generous |
| Confidence in form | High confidence | Low confidence or checking |
| Goal | Inference on parameters | Visualization or prediction |
4. Task Objective: Inference on specific coefficients favors parametric models; visualization and pure prediction favor flexible fits.
5. Consequences of Error: When a misspecified form would be costly, hedge by validating any parametric model against a nonparametric benchmark.
Using Nonparametric Fits to Check Parametric Models:
One of the most valuable uses of nonparametric regression is diagnostic: checking whether a parametric model is reasonable.
The Overlay Test:
```python
import numpy as np

def compare_parametric_nonparametric(
    x: np.ndarray,
    y: np.ndarray,
    parametric_degree: int = 1,
    loess_span: float = 0.75
):
    """
    Compare parametric (polynomial) and nonparametric (LOESS) fits.
    Returns diagnostics to help choose.
    """
    n = len(x)
    x_sorted = np.sort(x)

    # Parametric fit (polynomial)
    poly = np.poly1d(np.polyfit(x, y, parametric_degree))
    y_para = poly(x_sorted)
    y_para_at_x = poly(x)

    # LOESS fit (simplified implementation)
    def loess_smooth(x_train, y_train, x_eval, span):
        k = int(np.ceil(span * len(x_train)))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x_train - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x_train[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)  # tricube weights
            # Local linear fit
            X_loc = np.column_stack([np.ones(k), x_train[neighbors] - x0])
            W = np.diag(w)
            try:
                beta = np.linalg.solve(X_loc.T @ W @ X_loc,
                                       X_loc.T @ W @ y_train[neighbors])
                y_smooth[j] = beta[0]
            except np.linalg.LinAlgError:
                y_smooth[j] = np.average(y_train[neighbors], weights=w + 1e-10)
        return y_smooth

    y_nonpara = loess_smooth(x, y, x_sorted, loess_span)
    y_nonpara_at_x = loess_smooth(x, y, x, loess_span)

    # Compute diagnostics
    residuals_para = y - y_para_at_x
    residuals_nonpara = y - y_nonpara_at_x
    mse_para = np.mean(residuals_para**2)
    mse_nonpara = np.mean(residuals_nonpara**2)

    # Discrepancy between the two fitted curves
    discrepancy = np.sqrt(np.mean((y_para - y_nonpara)**2))

    # Runs test on parametric residuals (assumes data ordered by x);
    # too few runs indicates systematic lack of fit
    signs = residuals_para > 0
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    n_pos = np.sum(signs)
    n_neg = n - n_pos
    expected_runs = 1 + 2 * n_pos * n_neg / n

    return {
        'mse_parametric': mse_para,
        'mse_nonparametric': mse_nonpara,
        'discrepancy': discrepancy,
        'runs_test': {
            'observed': runs,
            'expected': expected_runs,
            'indicates_misspec': runs < expected_runs * 0.7
        },
        'recommendation': ('nonparametric' if mse_nonpara < mse_para * 0.8
                           else 'parametric adequate')
    }

# =============================================================================
# Case 1: Linear relationship (parametric should win)
# =============================================================================
np.random.seed(42)
n = 100
x1 = np.linspace(0, 10, n)
y1 = 2 + 3 * x1 + np.random.normal(0, 2, n)

result1 = compare_parametric_nonparametric(x1, y1, parametric_degree=1, loess_span=0.5)

print("Case 1: Linear Relationship")
print("=" * 60)
print(f"Parametric MSE:    {result1['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result1['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result1['discrepancy']:.4f}")
print(f"Recommendation:    {result1['recommendation']}")

# =============================================================================
# Case 2: Nonlinear relationship (nonparametric should reveal this)
# =============================================================================
x2 = np.linspace(0, 10, n)
y2 = 2 + 0.5 * x2 + 3 * np.sin(x2) + np.random.normal(0, 1, n)

result2 = compare_parametric_nonparametric(x2, y2, parametric_degree=1, loess_span=0.3)

print("Case 2: Nonlinear Relationship (sine + linear)")
print("=" * 60)
print(f"Parametric MSE:    {result2['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result2['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result2['discrepancy']:.4f}")
print(f"Recommendation:    {result2['recommendation']}")
```

Residual Analysis:
After fitting a parametric model, plot residuals against $x$ with a LOESS smooth:
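A minimal version of this check, using a Nadaraya-Watson smoother with a Gaussian kernel in place of full LOESS for brevity (the helper name, bandwidth `h`, and simulated data are illustrative):

```python
import numpy as np

def residual_smooth_check(x, y, h=0.5):
    """
    Fit a line, then kernel-smooth the residuals (Nadaraya-Watson,
    Gaussian kernel, bandwidth h). If the smooth wanders far from
    zero relative to the residual sd, the linear form is suspect.
    """
    resid = y - np.polyval(np.polyfit(x, y, 1), x)
    order = np.argsort(x)
    xs, rs = x[order], resid[order]
    w = np.exp(-0.5 * ((xs[:, None] - xs[None, :]) / h)**2)
    smooth = (w @ rs) / w.sum(axis=1)
    return xs, smooth, np.std(resid)

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 1 + x + 1.5 * np.sin(x) + rng.normal(0, 0.4, 200)  # hidden sine term

xs, smooth, sd = residual_smooth_check(x, y)
print(f"max |smoothed residual| = {np.max(np.abs(smooth)):.2f} vs residual sd = {sd:.2f}")
```

If the linear model were adequate, the smoothed residual curve would hover near zero; here it swings on the order of the residual standard deviation, flagging the missed sine component.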
Partial Residual Plots (for multiple regression): Plot $e_i + \hat{\beta}_j x_{ij}$ against $x_j$ with LOESS overlay. Reveals whether each predictor's effect is linear.
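A sketch of the partial-residual computation (the helper function and simulated example are illustrative; OLS via numpy):

```python
import numpy as np

def partial_residuals(X, y, j):
    """
    Partial residuals for predictor j: e_i + beta_j * x_ij,
    where e are residuals from the full OLS fit. Plotted against
    X[:, j] with a LOESS overlay, they reveal nonlinearity in
    predictor j's effect.
    """
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid + beta[j + 1] * X[:, j]         # +1 skips the intercept

# Example: x0 enters linearly, x1 quadratically
rng = np.random.default_rng(1)
X = rng.normal(0, 1, (300, 2))
y = 1.5 * X[:, 0] + X[:, 1]**2 + rng.normal(0, 0.3, 300)

pr1 = partial_residuals(X, y, 1)
# A LOESS smooth of pr1 against X[:, 1] would trace the quadratic
# shape that the purely linear fit misses.
```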
Combining Parametric and Nonparametric:
You don't always have to choose exclusively. Several powerful strategies combine both approaches.
1. Semiparametric Models:
$$y_i = \mathbf{x}_i^T \boldsymbol{\beta} + g(z_i) + \epsilon_i$$
Some predictors enter parametrically ($\mathbf{x}$), others nonparametrically ($z$). Best when you have theory for some relationships but not others.
2. Generalized Additive Models (GAMs):
$$y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \ldots + \epsilon_i$$
Each $f_j$ is estimated nonparametrically, but the additive structure is parametric. Escapes the curse while allowing flexibility.
3. Two-Stage: Nonparametric Discovery → Parametric Fitting:
Workflow: (1) fit a flexible smoother (e.g., LOESS) and visualize it; (2) read a candidate parametric form off the discovered shape; (3) fit that parametric model; (4) confirm with residual diagnostics and formal tests.
This uses nonparametric regression as an exploratory tool to inform a confirmatory parametric model.
4. Nonparametric Bootstrap for Parametric Inference:
Fit a parametric model, but assess uncertainty via nonparametric (residual) bootstrap. Robust to distributional assumptions while keeping parametric point estimates.
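A minimal sketch of the residual bootstrap for an OLS fit (the function name, heavy-tailed noise, and simulation settings are illustrative assumptions):

```python
import numpy as np

def residual_bootstrap_ols(X, y, n_boot=500, seed=0):
    """
    Fit OLS, then resample centered residuals to get standard
    errors that do not lean on a normality assumption.
    """
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta_hat
    resid = y - fitted
    resid = resid - resid.mean()                  # center residuals

    boot_betas = np.empty((n_boot, X1.shape[1]))
    for b in range(n_boot):
        # Rebuild a response from fitted values + resampled residuals
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        boot_betas[b], *_ = np.linalg.lstsq(X1, y_star, rcond=None)
    return beta_hat, boot_betas.std(axis=0)

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (150, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.standard_t(df=4, size=150)  # heavy-tailed noise

beta_hat, se_boot = residual_bootstrap_ols(X, y)
print(f"slope = {beta_hat[1]:.3f} (bootstrap SE {se_boot[1]:.3f})")
```

The point estimate is the ordinary parametric one; only the uncertainty assessment is nonparametric, which makes it robust to the heavy-tailed errors simulated here.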
5. Ensemble: Blend Predictions
Final prediction = weighted average of parametric and nonparametric predictions. Weights chosen by cross-validation. Hedges against misspecification.
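A sketch of choosing the blending weight by cross-validation (all names, the candidate fits, and the simulated data are illustrative):

```python
import numpy as np

def blend_weight_by_cv(x, y, para_fit, nonpara_fit, n_folds=5, seed=0):
    """
    Choose w for: prediction = w * parametric + (1 - w) * nonparametric
    by K-fold cross-validation over a grid of candidate weights.
    para_fit / nonpara_fit: callables (x_train, y_train, x_test) -> preds
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, n_folds)
    weights = np.linspace(0, 1, 21)
    cv_err = np.zeros(len(weights))
    for te in folds:
        tr = np.setdiff1d(idx, te)
        p = para_fit(x[tr], y[tr], x[te])      # parametric predictions
        q = nonpara_fit(x[tr], y[tr], x[te])   # nonparametric predictions
        for i, w in enumerate(weights):
            cv_err[i] += np.mean((y[te] - (w * p + (1 - w) * q))**2)
    return weights[np.argmin(cv_err)]

def line_fit(x_tr, y_tr, x_te):
    return np.polyval(np.polyfit(x_tr, y_tr, 1), x_te)

def knn_fit(x_tr, y_tr, x_te, k=11):
    return np.array([y_tr[np.argsort(np.abs(x_tr - x0))[:k]].mean() for x0 in x_te])

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300)
y = 1 + 0.5 * x + 2 * np.sin(x) + rng.normal(0, 0.5, 300)  # partly nonlinear

w = blend_weight_by_cv(x, y, line_fit, knn_fit)
print(f"CV-selected weight on the parametric fit: {w:.2f}")
```

Because the data here are strongly nonlinear, cross-validation should put most of the weight on the nonparametric fit; on truly linear data the weight would shift toward the parametric model.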
```python
import numpy as np
from scipy import stats
from typing import Tuple

def semiparametric_regression(
    X_para: np.ndarray,
    x_nonpara: np.ndarray,
    y: np.ndarray,
    loess_span: float = 0.5,
    max_iterations: int = 20,
    tolerance: float = 1e-6
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Semiparametric model: y = X_para @ beta + g(x_nonpara) + epsilon

    Fit via backfitting:
      1. Fix g, estimate beta by OLS on y - g
      2. Fix beta, estimate g by LOESS on y - X_para @ beta
      3. Iterate until convergence
    """
    n = len(y)
    g = np.zeros(n)
    beta = np.zeros(X_para.shape[1])

    def loess_1d(x, y_target, span):
        """Simple 1D LOESS (locally weighted average, tricube weights)."""
        n = len(x)
        k = int(np.ceil(span * n))
        y_smooth = np.zeros(n)
        for j in range(n):
            x0 = x[j]
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            if np.sum(w) > 1e-10:
                y_smooth[j] = np.average(y_target[neighbors], weights=w)
            else:
                y_smooth[j] = np.mean(y_target)
        return y_smooth - np.mean(y_smooth)  # center g for identifiability

    for _ in range(max_iterations):
        # Step 1: fix g, estimate beta by OLS
        beta_new, _, _, _ = np.linalg.lstsq(X_para, y - g, rcond=None)
        # Step 2: fix beta, estimate g by LOESS
        g_new = loess_1d(x_nonpara, y - X_para @ beta_new, loess_span)
        # Check convergence
        if (np.max(np.abs(g_new - g)) < tolerance
                and np.max(np.abs(beta_new - beta)) < tolerance):
            beta, g = beta_new, g_new
            break
        beta, g = beta_new, g_new

    return beta, g

def discovery_to_parametric_workflow(x: np.ndarray, y: np.ndarray):
    """
    Demonstrate the two-stage workflow:
      1. Nonparametric discovery
      2. Parametric fitting based on the discovered form
    """
    n = len(x)

    # Stage 1: LOESS discovery (in practice, plot this smooth)
    def simple_loess(x, y, x_eval, span=0.5):
        k = int(np.ceil(span * len(x)))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            w = (1 - (np.abs(x[neighbors] - x0) / h)**3)**3
            y_smooth[j] = np.average(y[neighbors], weights=w + 1e-10)
        return y_smooth

    x_sorted = np.sort(x)
    y_loess = simple_loess(x, y, x_sorted, span=0.3)

    # Stage 2: Identify curvature (heuristic: compare linear vs quadratic)
    y_lin = np.polyval(np.polyfit(x, y, 1), x)
    y_quad = np.polyval(np.polyfit(x, y, 2), x)
    mse_lin = np.mean((y - y_lin)**2)
    mse_quad = np.mean((y - y_quad)**2)

    # F-test for the quadratic term
    f_stat = (mse_lin - mse_quad) / (mse_quad / (n - 3))
    p_value = 1 - stats.f.cdf(f_stat, 1, n - 3)

    return {
        'loess_reveals': 'curvature' if mse_quad < mse_lin * 0.85 else 'linear',
        'mse_linear': mse_lin,
        'mse_quadratic': mse_quad,
        'f_test_p_value': p_value,
        'recommended_model': 'quadratic' if mse_quad < mse_lin * 0.85 else 'linear'
    }

# =============================================================================
# Demonstration
# =============================================================================
np.random.seed(42)

# Generate semiparametric data: y = 2*x1 - x2 + sin(z) + noise
n = 200
x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
z = np.random.uniform(0, 2 * np.pi, n)
y = 2 * x1 - x2 + np.sin(z) + np.random.normal(0, 0.5, n)

X_para = np.column_stack([x1, x2])
beta_est, g_est = semiparametric_regression(X_para, z, y, loess_span=0.4)

print("Semiparametric Model Results")
print("=" * 50)
print("True beta:      [2.0, -1.0]")
print(f"Estimated beta: [{beta_est[0]:.3f}, {beta_est[1]:.3f}]")
print("The nonparametric g(z) captures the sine pattern,")
print("correctly separating the linear effects from the nonlinear one.")
```

In practice, experienced analysts often: (1) Start with exploratory nonparametric visualization; (2) Develop intuition about functional form; (3) Fit parametric models suggested by exploration; (4) Use nonparametric diagnostics to validate; (5) Report both where appropriate. This iterative refinement beats dogmatic adherence to either approach.
By Application Domain:
Different fields have different conventions and requirements. Here's guidance tailored to common application areas:
| Domain | Typical Choice | Reasoning |
|---|---|---|
| Econometrics | Parametric + diagnostics | Coefficient interpretation crucial; theory-driven; large samples |
| Epidemiology | Semiparametric / Splines | Dose-response often complex; exposure adjustment common |
| Environmental Science | GAMs / LOESS | Relationships often nonlinear; seasonal patterns; uncertainty |
| Machine Learning / Prediction | Whatever cross-validates best | Prediction accuracy is the goal; interpretability secondary |
| Clinical Trials | Parametric (regulated) | Regulatory requirements favor pre-specified models |
| Finance / Quant | Both + ensembles | Regime changes; complex dependencies; prediction-critical |
| Social Science | Parametric + visualization | Theory testing; but nonparametric for exploration |
| Engineering / Physics | Parametric (theory-based) | Physical laws constrain form; data confirm parameters |
By Data Characteristics:
Time Series: Autocorrelation breaks standard cross-validation; trend smoothing is natural, but choose bandwidths with blocked or rolling validation.
Spatial Data: Kernel and spline smoothers (thin-plate splines, kriging) are the workhorses; spatial correlation again complicates bandwidth selection.
Survival/Duration Data: The Cox proportional hazards model is deliberately semiparametric—parametric covariate effects, nonparametric baseline hazard.
Panel/Longitudinal: Mixed-effects models are typically parametric, but smooth terms for time trends combine both approaches.
In regulated industries (pharmaceuticals, medical devices, some finance), there's often a preference for pre-specified parametric models that were defined before seeing the data. Nonparametric methods, while statistically sound, can face pushback as 'data dredging' if not pre-specified. Know your context!
The Decision Tree:
Follow this hierarchical decision process:
1. Is d > 5 and no additive structure expected?
→ YES: Use parametric or tree-based methods
→ NO: Continue
2. Is n < 50?
→ YES: Use parametric (not enough data for nonparametric)
→ NO: Continue
3. Do you have strong theoretical basis for functional form?
→ YES: Use parametric, validate with nonparametric diagnostics
→ NO: Continue
4. Is primary goal interpretation of specific coefficients?
→ YES: Use parametric with carefully chosen form
→ NO: Continue
5. Is d ≤ 3 and you want to visualize the relationship?
→ YES: Use nonparametric (LOESS/kernel smoothing)
→ NO: Continue
6. Is prediction accuracy the main goal?
→ YES: Compare both via cross-validation, use winner
→ NO: Continue
7. Default: Use semiparametric or GAM
Combines flexibility with structure
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MethodRecommendation(Enum):
    PARAMETRIC = "parametric"
    NONPARAMETRIC = "nonparametric"
    SEMIPARAMETRIC = "semiparametric"
    GAM = "gam"
    COMPARE_CV = "compare_via_cv"
    TREE_BASED = "tree_based"

@dataclass
class RegressionContext:
    """Captures the context of a regression problem."""
    n_samples: int
    n_features: int
    has_theory: bool
    need_interpretation: bool
    primary_goal: str  # 'prediction', 'inference', 'visualization'
    additive_structure_expected: bool
    domain: Optional[str] = None

def recommend_method(context: RegressionContext) -> dict:
    """
    Decision framework for choosing a regression method.
    Returns a recommendation and the reasoning behind it.
    """
    n = context.n_samples
    d = context.n_features

    # Decision tree
    if d > 5 and not context.additive_structure_expected:
        return {
            'recommendation': MethodRecommendation.TREE_BASED,
            'reasoning': f"High dimension (d={d}) without additive structure. "
                         "Use random forest, gradient boosting, or regularized linear models.",
            'alternatives': ['Ridge/Lasso regression', 'Random forest', 'XGBoost']
        }

    if n < 50:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': f"Small sample (n={n}). Nonparametric would have high variance. "
                         "Use a simple parametric model.",
            'alternatives': ['Linear model', 'Polynomial (low degree)']
        }

    if context.has_theory:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': "Strong theoretical basis available. Use a theory-derived model, "
                         "validate with nonparametric diagnostics.",
            'alternatives': ['Theory-specified form', 'LOESS for residual diagnostics']
        }

    if context.need_interpretation:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Need coefficient interpretation but unknown functional form. "
                         "Use a GAM or partial linear model.",
            'alternatives': ['GAM (interpretable smooth effects)', 'Spline regression']
        }

    if d <= 3 and context.primary_goal == 'visualization':
        return {
            'recommendation': MethodRecommendation.NONPARAMETRIC,
            'reasoning': f"Low dimension (d={d}) and visualization goal. "
                         "LOESS or kernel smoothing will reveal patterns without assumptions.",
            'alternatives': ['LOESS', 'Kernel regression', 'Thin plate spline']
        }

    if context.primary_goal == 'prediction':
        return {
            'recommendation': MethodRecommendation.COMPARE_CV,
            'reasoning': "Prediction is the goal. Compare parametric and nonparametric "
                         "via cross-validation; use whichever performs better.",
            'alternatives': ['All reasonable models', 'Ensemble of both']
        }

    # Default
    if d <= 4:
        return {
            'recommendation': MethodRecommendation.GAM,
            'reasoning': "Moderate dimension, general purpose. GAM provides flexibility "
                         "with additive structure to manage the curse of dimensionality.",
            'alternatives': ['GAM', 'Semiparametric', 'LOESS (if d=1)']
        }
    else:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Higher dimension (d>4). Use additive/semiparametric structure "
                         "to avoid the curse of dimensionality.",
            'alternatives': ['GAM', 'Additive model', 'Single-index model']
        }

# =============================================================================
# Example usage
# =============================================================================
print("Decision Framework Examples")
print("=" * 70)

examples = [
    RegressionContext(n_samples=50, n_features=2, has_theory=False,
                      need_interpretation=False, primary_goal='visualization',
                      additive_structure_expected=True),
    RegressionContext(n_samples=5000, n_features=1, has_theory=True,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='economics'),
    RegressionContext(n_samples=10000, n_features=8, has_theory=False,
                      need_interpretation=False, primary_goal='prediction',
                      additive_structure_expected=False, domain='ml_competition'),
    RegressionContext(n_samples=200, n_features=3, has_theory=False,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='epidemiology'),
]

for i, context in enumerate(examples, 1):
    result = recommend_method(context)
    print(f"Example {i}: n={context.n_samples}, d={context.n_features}, "
          f"goal={context.primary_goal}")
    print(f"  Recommendation: {result['recommendation'].value}")
    print(f"  Reasoning: {result['reasoning']}")
    print(f"  Alternatives: {result['alternatives']}")
```

We've completed our journey through nonparametric regression. Let's consolidate the key decision principles:
Module Complete:
You now have a comprehensive toolkit for nonparametric regression: the methods (LOESS, kernel smoothing), the theory (bias-variance, convergence rates, curse of dimensionality), the practical tools (bandwidth selection), and the decision frameworks for choosing when to use them.
These methods form the foundation for more advanced techniques in machine learning: Gaussian processes, support vector regression, and neural network function approximation all build on the principles developed here. The intuition about local fitting, bias-variance tradeoffs, and the curse of dimensionality will serve you throughout your machine learning journey.
Congratulations! You've mastered nonparametric regression—from the mechanics of LOESS and kernel smoothing to the deep challenge of the curse of dimensionality. You can now apply these methods appropriately, select bandwidths wisely, and make informed decisions about when nonparametric approaches are worth their computational and sample-size costs. These skills form the foundation for advanced statistical learning.