We've developed a complete toolkit for nonparametric regression: local regression (LOESS), kernel smoothing, bandwidth selection, and an understanding of the curse of dimensionality. But having tools isn't the same as knowing when to use them.

Every regression problem presents a choice: commit to a parametric form (and gain efficiency if correct), or stay nonparametric (and gain robustness to misspecification). This isn't a philosophical debate—it's a practical decision with real consequences for prediction accuracy, interpretability, computational cost, and scientific insight.

This final page synthesizes everything we've learned into actionable guidance. We'll develop decision frameworks, compare methods head-to-head on important criteria, and equip you to choose wisely for any regression problem you encounter.
By the end of this page, you will understand: (1) The fundamental tradeoffs between parametric and nonparametric approaches; (2) Diagnostic tools for choosing between approaches; (3) Hybrid strategies that combine both; (4) Application-specific recommendations; (5) A decision framework for practical use.
Parametric Models:

Assume a specific functional form: $y = f(x; \boldsymbol{\theta}) + \epsilon$

- Examples: linear regression, polynomial regression, logistic regression
- Pro: If the form is correct, statistically optimal (efficient). Fast, interpretable.
- Con: If the form is wrong (misspecification), can be badly biased. All assumptions are 'baked in'.

Nonparametric Models:

Make minimal assumptions about form: $y = f(x) + \epsilon$ where $f \in \mathcal{F}$ (some rich function class).

- Examples: LOESS, kernel smoothing, splines, Gaussian processes
- Pro: Robust to misspecification. Adapts to whatever the data show.
- Con: Less efficient when simple forms suffice. Curse of dimensionality. Harder to interpret.
The Bias-Variance Perspective:

| Aspect | Parametric (underfit) | Parametric (correct) | Nonparametric |
|--------|----------------------|---------------------|---------------|
| Bias | High (if misspecified) | Low | Low (with good bandwidth) |
| Variance | Low | Low | Higher |
| MSE | Poor | Excellent | Moderate |

The key insight: parametric wins if the assumption is right; nonparametric wins if it's wrong.

This motivates the core question of the decision framework: how confident are you in your parametric assumptions?
From a Bayesian viewpoint, choosing between parametric and nonparametric is about prior beliefs. A parametric model encodes strong prior belief about functional form. A nonparametric model encodes weak/diffuse prior. The 'right' choice depends on how much you genuinely know beforehand.
1. Sample Size:

- Small ($n < 100$): Parametric usually better. Nonparametric suffers high variance.
- Moderate ($100 < n < 1000$): Both viable. Use diagnostics to guide choice.
- Large ($n > 1000$): Nonparametric shines. Enough data to estimate complex patterns.

2. Dimensionality:

- $d = 1$: Nonparametric nearly always practical and often illuminating.
- $d = 2$–$4$: Nonparametric viable; interpret via surface plots.
- $d > 5$: Use structured nonparametric (additive models) or parametric + validation.

3. Domain Knowledge:

- Strong theory (physics, economics): Parametric is often justified. Use nonparametric to check assumptions.
- Weak theory (exploratory): Nonparametric for discovery, then fit parametric if patterns emerge.
- Moderate knowledge: Use both; compare predictions.
| Factor | Favors Parametric | Favors Nonparametric |
|---|---|---|
| Sample size | Small ($n < 100$) | Large ($n > 1000$) |
| Dimensionality | High ($d > 5$) | Low ($d \leq 3$) |
| Domain knowledge | Strong theoretical basis | Exploratory / unknown |
| Interpretability need | High (explain coefficients) | Low (prediction focus) |
| Extrapolation need | Must predict outside training range | Interpolation only |
| Computational budget | Limited | Generous |
| Confidence in form | High confidence | Low confidence or checking |
| Goal | Inference on parameters | Visualization or prediction |
4. Task Objective:

- Prediction accuracy: Use cross-validation to directly compare methods and let the winner decide (see the sketch below).
- Scientific interpretation: Parametric models are more interpretable (coefficients have meaning).
- Visualization: Nonparametric excels at revealing patterns without imposing structure.
- Inference: Parametric (usually). Nonparametric confidence intervals exist but are complex.

5. Consequences of Error:

- High cost of misspecification: Nonparametric is 'safer'—wrong, but not disastrously wrong.
- High cost of variance: Parametric is more stable if assumptions are roughly correct.
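To make the "compare via cross-validation" advice concrete, here is a minimal sketch (not from the discussion above) that pits a linear fit against a Nadaraya-Watson kernel smoother under 5-fold cross-validation. The `kernel_regression` and `cv_mse` helpers, the bandwidth, and the simulated data are illustrative assumptions.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)

def cv_mse(x, y, fit_predict, n_folds=5, seed=0):
    """K-fold cross-validated MSE for any fit/predict callable."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds
    errors = []
    for f in range(n_folds):
        train, test = folds != f, folds == f
        y_hat = fit_predict(x[train], y[train], x[test])
        errors.append(np.mean((y[test] - y_hat)**2))
    return np.mean(errors)

# Simulated data with mild nonlinearity
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 10, 200))
y = 1.0 + 0.5 * x + 2.0 * np.sin(x) + rng.normal(0, 1, 200)

linear = lambda xtr, ytr, xte: np.polyval(np.polyfit(xtr, ytr, 1), xte)
kernel = lambda xtr, ytr, xte: kernel_regression(xtr, ytr, xte, bandwidth=0.5)

print(f"Linear CV MSE: {cv_mse(x, y, linear):.3f}")
print(f"Kernel CV MSE: {cv_mse(x, y, kernel):.3f}")
# Whichever CV MSE is lower is the method to prefer for prediction.
```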
Using Nonparametric Fits to Check Parametric Models:

One of the most valuable uses of nonparametric regression is diagnostic: checking whether a parametric model is reasonable.

The Overlay Test:
1. Fit your parametric model (e.g., linear regression)
2. Fit a nonparametric smoother (LOESS) to the same data
3. Overlay both fits on a scatterplot
4. If they track closely, the parametric form is adequate
5. If they diverge systematically, consider refining the parametric model
The following code implements this overlay comparison, along with a runs test on the residuals of the parametric fit:

```python
import numpy as np

def compare_parametric_nonparametric(
    x: np.ndarray,
    y: np.ndarray,
    parametric_degree: int = 1,
    loess_span: float = 0.75
):
    """
    Compare parametric (polynomial) and nonparametric (LOESS) fits.
    Returns diagnostics to help choose.
    """
    n = len(x)
    x_sorted = np.sort(x)

    # Parametric fit (polynomial)
    poly_coeffs = np.polyfit(x, y, parametric_degree)
    poly = np.poly1d(poly_coeffs)
    y_para = poly(x_sorted)
    y_para_at_x = poly(x)

    # LOESS fit (simplified implementation)
    def loess_smooth(x_train, y_train, x_eval, span):
        n_train = len(x_train)
        k = int(np.ceil(span * n_train))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x_train - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x_train[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            # Local linear fit
            X_loc = np.column_stack([np.ones(k), x_train[neighbors] - x0])
            W = np.diag(w)
            try:
                beta = np.linalg.solve(X_loc.T @ W @ X_loc,
                                       X_loc.T @ W @ y_train[neighbors])
                y_smooth[j] = beta[0]
            except np.linalg.LinAlgError:
                y_smooth[j] = np.average(y_train[neighbors], weights=w + 1e-10)
        return y_smooth

    y_nonpara = loess_smooth(x, y, x_sorted, loess_span)
    y_nonpara_at_x = loess_smooth(x, y, x, loess_span)

    # Compute diagnostics
    residuals_para = y - y_para_at_x
    residuals_nonpara = y - y_nonpara_at_x
    mse_para = np.mean(residuals_para**2)
    mse_nonpara = np.mean(residuals_nonpara**2)

    # Discrepancy between models
    discrepancy = np.sqrt(np.mean((y_para - y_nonpara)**2))

    # Runs test on parametric residuals (tests for systematic patterns)
    signs = residuals_para > 0
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    n_pos = np.sum(signs)
    n_neg = n - n_pos
    expected_runs = 1 + 2 * n_pos * n_neg / n

    return {
        'mse_parametric': mse_para,
        'mse_nonparametric': mse_nonpara,
        'discrepancy': discrepancy,
        'runs_test': {
            'observed': runs,
            'expected': expected_runs,
            'indicates_misspec': runs < expected_runs * 0.7
        },
        'recommendation': ('nonparametric' if mse_nonpara < mse_para * 0.8
                           else 'parametric adequate')
    }

# =============================================================================
# Case 1: Linear relationship (parametric should win)
# =============================================================================
np.random.seed(42)
n = 100
x1 = np.linspace(0, 10, n)
y1 = 2 + 3*x1 + np.random.normal(0, 2, n)

result1 = compare_parametric_nonparametric(x1, y1, parametric_degree=1, loess_span=0.5)

print("Case 1: Linear Relationship")
print("=" * 60)
print(f"Parametric MSE: {result1['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result1['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result1['discrepancy']:.4f}")
print(f"Recommendation: {result1['recommendation']}")

# =============================================================================
# Case 2: Nonlinear relationship (nonparametric should reveal this)
# =============================================================================
x2 = np.linspace(0, 10, n)
y2 = 2 + 0.5*x2 + 3*np.sin(x2) + np.random.normal(0, 1, n)

result2 = compare_parametric_nonparametric(x2, y2, parametric_degree=1, loess_span=0.3)

print("\nCase 2: Nonlinear Relationship (sine + linear)")
print("=" * 60)
print(f"Parametric MSE: {result2['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result2['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result2['discrepancy']:.4f}")
print(f"Recommendation: {result2['recommendation']}")
```

Residual Analysis:

After fitting a parametric model, plot residuals against $x$ with a LOESS smooth:
- Flat at zero: the parametric form is adequate
- Systematic pattern: the form is missing something—consider nonparametric

Partial Residual Plots (for multiple regression):

Plot $e_i + \hat{\beta}_j x_{ij}$ against $x_j$ with a LOESS overlay. This reveals whether each predictor's effect is linear.
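The partial residual check can be scripted with nothing beyond NumPy and Matplotlib. The sketch below is a rough illustration on simulated data; the `moving_loess` helper is a crude stand-in for a proper LOESS smoother, and the data-generating model is an assumption made for the demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_loess(x, y, span=0.5):
    """Crude LOESS substitute: tricube-weighted local average at each x."""
    k = max(2, int(np.ceil(span * len(x))))
    y_smooth = np.empty_like(y, dtype=float)
    for j, x0 in enumerate(x):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]
        h = max(d[idx[-1]], 1e-10)
        w = (1 - (d[idx] / h)**3)**3
        y_smooth[j] = np.average(y[idx], weights=w + 1e-10)
    return y_smooth

# Multiple regression with one nonlinear effect: y = 1 + 2*x1 + x2^2 + noise
rng = np.random.default_rng(0)
n = 300
X = rng.normal(0, 1, (n, 2))
y = 1 + 2*X[:, 0] + X[:, 1]**2 + rng.normal(0, 0.5, n)

# OLS fit with intercept
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

# Partial residuals for the second predictor: e_i + beta_2 * x_i2
partial = residuals + beta[2] * X[:, 1]

order = np.argsort(X[:, 1])
plt.scatter(X[:, 1], partial, s=10, alpha=0.5)
plt.plot(X[order, 1], moving_loess(X[order, 1], partial[order]), 'r-')
plt.xlabel('x2'); plt.ylabel('partial residual')
plt.title('Curvature in the smooth flags a nonlinear effect of x2')
plt.show()
```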
Combining Parametric and Nonparametric:

You don't always have to choose exclusively. Several powerful strategies combine both approaches.

1. Semiparametric Models:

$$y_i = \mathbf{x}_i^T \boldsymbol{\beta} + g(z_i) + \epsilon_i$$

Some predictors enter parametrically ($\mathbf{x}$), others nonparametrically ($z$). Best when you have theory for some relationships but not others.

2. Generalized Additive Models (GAMs):

$$y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \ldots + \epsilon_i$$

Each $f_j$ is estimated nonparametrically, but the additive structure is parametric. Escapes the curse while allowing flexibility.
3. Two-Stage: Nonparametric Discovery → Parametric Fitting:

Workflow:
1. Fit LOESS to visualize the relationship
2. Identify the functional form (looks quadratic? exponential? logistic?)
3. Fit the parametric model suggested by the nonparametric fit
4. Validate with cross-validation

This uses the nonparametric fit as an exploratory tool to inform a confirmatory parametric model.

4. Nonparametric Bootstrap for Parametric Inference:

Fit a parametric model, but assess uncertainty via a nonparametric (residual) bootstrap. Robust to distributional assumptions while keeping parametric point estimates.

5. Ensemble: Blend Predictions

Final prediction = weighted average of parametric and nonparametric predictions. Weights chosen by cross-validation. Hedges against misspecification (see the sketch below).
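As a rough illustration of strategy 5, the sketch below blends a linear fit with a kernel smoother, choosing the blend weight by grid search on a held-out validation split. The `kernel_smooth` helper, the 70/30 split, and the weight grid are illustrative assumptions, not prescriptions.

```python
import numpy as np

def kernel_smooth(x_train, y_train, x_eval, bandwidth=0.6):
    """Nadaraya-Watson kernel smoother (Gaussian kernel)."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 300))
y = 2 + 0.5*x + 2*np.sin(x) + rng.normal(0, 1, 300)

# Split: fit both models on 'train', pick the blend weight on 'valid'
idx = rng.permutation(len(x))
train, valid = idx[:210], idx[210:]

coef = np.polyfit(x[train], y[train], 1)                     # parametric piece
para_valid = np.polyval(coef, x[valid])
nonpara_valid = kernel_smooth(x[train], y[train], x[valid])  # nonparametric piece

# Grid-search the blend weight w in [0, 1]
weights = np.linspace(0, 1, 21)
val_mse = [np.mean((y[valid] - (w*para_valid + (1 - w)*nonpara_valid))**2)
           for w in weights]
best_w = weights[np.argmin(val_mse)]
print(f"Best weight on the parametric component: {best_w:.2f}")
print(f"Validation MSE at best weight: {min(val_mse):.3f}")
```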
The following implementation demonstrates strategies 1 and 3: semiparametric backfitting and the discovery-to-parametric workflow.

```python
import numpy as np
from scipy import stats
from typing import Tuple

def semiparametric_regression(
    X_para: np.ndarray,
    x_nonpara: np.ndarray,
    y: np.ndarray,
    loess_span: float = 0.5,
    max_iterations: int = 20,
    tolerance: float = 1e-6
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Semiparametric model: y = X_para @ beta + g(x_nonpara) + epsilon

    Fit via backfitting:
    1. Fix g, estimate beta by OLS on y - g
    2. Fix beta, estimate g by LOESS on y - X_para @ beta
    3. Iterate until convergence
    """
    n = len(y)

    # Initialize
    g = np.zeros(n)
    beta = np.zeros(X_para.shape[1])

    def loess_1d(x, y_target, span):
        """Simple 1D LOESS (tricube-weighted local average)."""
        n = len(x)
        k = int(np.ceil(span * n))
        y_smooth = np.zeros(n)
        for j in range(n):
            x0 = x[j]
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            if np.sum(w) > 1e-10:
                y_smooth[j] = np.average(y_target[neighbors], weights=w)
            else:
                y_smooth[j] = np.mean(y_target)
        return y_smooth - np.mean(y_smooth)  # Center for identifiability

    for iteration in range(max_iterations):
        # Step 1: Fix g, estimate beta
        y_adjusted = y - g
        beta_new, _, _, _ = np.linalg.lstsq(X_para, y_adjusted, rcond=None)

        # Step 2: Fix beta, estimate g
        y_adjusted = y - X_para @ beta_new
        g_new = loess_1d(x_nonpara, y_adjusted, loess_span)

        # Check convergence, then update
        converged = (np.max(np.abs(g_new - g)) < tolerance
                     and np.max(np.abs(beta_new - beta)) < tolerance)
        beta = beta_new
        g = g_new
        if converged:
            break

    return beta, g

def discovery_to_parametric_workflow(x: np.ndarray, y: np.ndarray):
    """
    Demonstrate the two-stage workflow:
    1. Nonparametric discovery
    2. Parametric fitting based on discovered form
    """
    n = len(x)

    # Stage 1: LOESS discovery (visualize y_loess against x_sorted in practice)
    def simple_loess(x, y, x_eval, span=0.5):
        k = int(np.ceil(span * len(x)))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            w = (1 - (np.abs(x[neighbors] - x0) / h)**3)**3
            y_smooth[j] = np.average(y[neighbors], weights=w + 1e-10)
        return y_smooth

    x_sorted = np.sort(x)
    y_loess = simple_loess(x, y, x_sorted, span=0.3)

    # Stage 2: Identify curvature (heuristic: compare linear vs quadratic)
    lin_coeffs = np.polyfit(x, y, 1)
    quad_coeffs = np.polyfit(x, y, 2)
    y_lin = np.polyval(lin_coeffs, x)
    y_quad = np.polyval(quad_coeffs, x)
    mse_lin = np.mean((y - y_lin)**2)
    mse_quad = np.mean((y - y_quad)**2)

    # F-test for the quadratic term
    f_stat = (mse_lin - mse_quad) / (mse_quad / (n - 3))
    p_value = 1 - stats.f.cdf(f_stat, 1, n - 3)

    return {
        'loess_reveals': 'curvature' if mse_quad < mse_lin * 0.85 else 'linear',
        'mse_linear': mse_lin,
        'mse_quadratic': mse_quad,
        'f_statistic': f_stat,
        'p_value': p_value,
        'recommended_model': 'quadratic' if mse_quad < mse_lin * 0.85 else 'linear'
    }

# =============================================================================
# Demonstration
# =============================================================================
np.random.seed(42)

# Generate semiparametric data: y = 2*x1 - x2 + sin(z) + noise
n = 200
x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
z = np.random.uniform(0, 2*np.pi, n)
y = 2*x1 - x2 + np.sin(z) + np.random.normal(0, 0.5, n)

X_para = np.column_stack([x1, x2])
beta_est, g_est = semiparametric_regression(X_para, z, y, loess_span=0.4)

print("Semiparametric Model Results")
print("=" * 50)
print(f"True beta: [2.0, -1.0]")
print(f"Estimated beta: [{beta_est[0]:.3f}, {beta_est[1]:.3f}]")
print(f"Nonparametric g(z) captures the sine pattern")
print("\nThis correctly separates linear effects from nonlinear!")
```

In practice, experienced analysts often: (1) start with exploratory nonparametric visualization; (2) develop intuition about the functional form; (3) fit parametric models suggested by the exploration; (4) use nonparametric diagnostics to validate; (5) report both where appropriate. This iterative refinement beats dogmatic adherence to either approach.
By Application Domain:

Different fields have different conventions and requirements. Here's guidance tailored to common application areas:
| Domain | Typical Choice | Reasoning |
|---|---|---|
| Econometrics | Parametric + diagnostics | Coefficient interpretation crucial; theory-driven; large samples |
| Epidemiology | Semiparametric / Splines | Dose-response often complex; exposure adjustment common |
| Environmental Science | GAMs / LOESS | Relationships often nonlinear; seasonal patterns; uncertainty |
| Machine Learning / Prediction | Whatever cross-validates best | Prediction accuracy is the goal; interpretability secondary |
| Clinical Trials | Parametric (regulated) | Regulatory requirements favor pre-specified models |
| Finance / Quant | Both + ensembles | Regime changes; complex dependencies; prediction-critical |
| Social Science | Parametric + visualization | Theory testing; but nonparametric for exploration |
| Engineering / Physics | Parametric (theory-based) | Physical laws constrain form; data confirm parameters |
By Data Characteristics:

Time Series:
- LOESS excellent for trend extraction (STL decomposition; see the sketch below)
- Combine with parametric ARIMA for forecasting

Spatial Data:
- Kernel smoothing for spatial surfaces
- Kriging (GP) for interpolation with uncertainty

Survival/Duration Data:
- Cox model (semiparametric) is standard
- Kaplan-Meier curves are fully nonparametric

Panel/Longitudinal:
- Mixed effects with a nonparametric smooth of time
- GAMMs (Generalized Additive Mixed Models)
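For the time-series bullet above, a minimal sketch using the STL implementation in statsmodels (this assumes the statsmodels package is installed; the simulated monthly series is purely illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL  # assumes statsmodels is installed

# Simulated monthly series: trend + annual seasonality + noise
rng = np.random.default_rng(7)
n_months = 12 * 10
t = np.arange(n_months)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, n_months)
series = pd.Series(y, index=pd.date_range("2010-01", periods=n_months, freq="MS"))

# STL uses LOESS internally to separate trend, seasonal, and remainder
result = STL(series, period=12).fit()
print(result.trend.head())
print(result.seasonal.head())
# The remainder (result.resid) could then feed a parametric ARIMA model,
# combining nonparametric trend extraction with parametric forecasting.
```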
In regulated industries (pharmaceuticals, medical devices, some finance), there's often a preference for pre-specified parametric models that were defined before seeing the data. Nonparametric methods, while statistically sound, can face pushback as 'data dredging' if not pre-specified. Know your context!
The Decision Tree:

Follow this hierarchical decision process:

1. Is d > 5 and no additive structure expected?
   → YES: Use parametric or tree-based methods
   → NO: Continue

2. Is n < 50?
   → YES: Use parametric (not enough data for nonparametric)
   → NO: Continue

3. Do you have a strong theoretical basis for the functional form?
   → YES: Use parametric, validate with nonparametric diagnostics
   → NO: Continue

4. Is the primary goal interpretation of specific coefficients?
   → YES: Use parametric with a carefully chosen form
   → NO: Continue

5. Is d ≤ 3 and you want to visualize the relationship?
   → YES: Use nonparametric (LOESS/kernel smoothing)
   → NO: Continue

6. Is prediction accuracy the main goal?
   → YES: Compare both via cross-validation, use the winner
   → NO: Continue

7. Default: Use semiparametric or GAM
   → Combines flexibility with structure
This decision tree can be encoded directly:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MethodRecommendation(Enum):
    PARAMETRIC = "parametric"
    NONPARAMETRIC = "nonparametric"
    SEMIPARAMETRIC = "semiparametric"
    GAM = "gam"
    COMPARE_CV = "compare_via_cv"
    TREE_BASED = "tree_based"

@dataclass
class RegressionContext:
    """Captures the context of a regression problem."""
    n_samples: int
    n_features: int
    has_theory: bool
    need_interpretation: bool
    primary_goal: str  # 'prediction', 'inference', 'visualization'
    additive_structure_expected: bool
    domain: Optional[str] = None

def recommend_method(context: RegressionContext) -> dict:
    """
    Decision framework for choosing a regression method.
    Returns a recommendation and reasoning.
    """
    n = context.n_samples
    d = context.n_features

    # Decision tree
    if d > 5 and not context.additive_structure_expected:
        return {
            'recommendation': MethodRecommendation.TREE_BASED,
            'reasoning': f"High dimension (d={d}) without additive structure. "
                         "Use random forest, gradient boosting, or regularized linear models.",
            'alternatives': ['Ridge/Lasso regression', 'Random forest', 'XGBoost']
        }

    if n < 50:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': f"Small sample (n={n}). Nonparametric would have high variance. "
                         "Use a simple parametric model.",
            'alternatives': ['Linear model', 'Polynomial (low degree)']
        }

    if context.has_theory:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': "Strong theoretical basis available. Use theory-derived model, "
                         "validate with nonparametric diagnostics.",
            'alternatives': ['Theory-specified form', 'LOESS for residual diagnostics']
        }

    if context.need_interpretation:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Need coefficient interpretation but unknown functional form. "
                         "Use GAM or partial linear model.",
            'alternatives': ['GAM (interpretable smooth effects)', 'Spline regression']
        }

    if d <= 3 and context.primary_goal == 'visualization':
        return {
            'recommendation': MethodRecommendation.NONPARAMETRIC,
            'reasoning': f"Low dimension (d={d}) and visualization goal. "
                         "LOESS or kernel smoothing will reveal patterns without assumptions.",
            'alternatives': ['LOESS', 'Kernel regression', 'Thin plate spline']
        }

    if context.primary_goal == 'prediction':
        return {
            'recommendation': MethodRecommendation.COMPARE_CV,
            'reasoning': "Prediction is the goal. Compare parametric and nonparametric "
                         "via cross-validation; use whichever performs better.",
            'alternatives': ['All reasonable models', 'Ensemble of both']
        }

    # Default
    if d <= 4:
        return {
            'recommendation': MethodRecommendation.GAM,
            'reasoning': "Moderate dimension, general purpose. GAM provides flexibility "
                         "with additive structure to manage the curse of dimensionality.",
            'alternatives': ['GAM', 'Semiparametric', 'LOESS (if d=1)']
        }
    else:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Higher dimension (d>4). Use additive/semiparametric structure "
                         "to avoid the curse of dimensionality.",
            'alternatives': ['GAM', 'Additive model', 'Single-index model']
        }

# =============================================================================
# Example usage
# =============================================================================
print("Decision Framework Examples")
print("=" * 70)

examples = [
    RegressionContext(n_samples=50, n_features=2, has_theory=False,
                      need_interpretation=False, primary_goal='visualization',
                      additive_structure_expected=True),
    RegressionContext(n_samples=5000, n_features=1, has_theory=True,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='economics'),
    RegressionContext(n_samples=10000, n_features=8, has_theory=False,
                      need_interpretation=False, primary_goal='prediction',
                      additive_structure_expected=False, domain='ml_competition'),
    RegressionContext(n_samples=200, n_features=3, has_theory=False,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='epidemiology'),
]

for i, context in enumerate(examples, 1):
    result = recommend_method(context)
    print(f"\nExample {i}: n={context.n_samples}, d={context.n_features}, "
          f"goal={context.primary_goal}")
    print(f"  Recommendation: {result['recommendation'].value}")
    print(f"  Reasoning: {result['reasoning']}")
    print(f"  Alternatives: {result['alternatives']}")
```

We've completed our journey through nonparametric regression. Let's consolidate the key decision principles:
Module Complete:

You now have a comprehensive toolkit for nonparametric regression: the methods (LOESS, kernel smoothing), the theory (bias-variance, convergence rates, curse of dimensionality), the practical tools (bandwidth selection), and the decision frameworks for choosing when to use them.

These methods form the foundation for more advanced techniques in machine learning: Gaussian processes, support vector regression, and neural network function approximation all build on the principles developed here. The intuition about local fitting, bias-variance tradeoffs, and the curse of dimensionality will serve you throughout your machine learning journey.
Congratulations! You've mastered nonparametric regression—from the mechanics of LOESS and kernel smoothing to the deep challenge of the curse of dimensionality. You can now apply these methods appropriately, select bandwidths wisely, and make informed decisions about when nonparametric approaches are worth their computational and sample-size costs. These skills form the foundation for advanced statistical learning.