We've developed a complete toolkit for nonparametric regression: local regression (LOESS), kernel smoothing, bandwidth selection, and an understanding of the curse of dimensionality. But having tools isn't the same as knowing when to use them.

Every regression problem presents a choice: commit to a parametric form (and gain efficiency if correct), or stay nonparametric (and gain robustness to misspecification). This isn't a philosophical debate—it's a practical decision with real consequences for prediction accuracy, interpretability, computational cost, and scientific insight.

This final page synthesizes everything we've learned into actionable guidance. We'll develop decision frameworks, compare methods head-to-head on important criteria, and equip you to choose wisely for any regression problem you encounter.
By the end of this page, you will understand: (1) The fundamental tradeoffs between parametric and nonparametric approaches; (2) Diagnostic tools for choosing between approaches; (3) Hybrid strategies that combine both; (4) Application-specific recommendations; (5) A decision framework for practical use.
Parametric Models:

Assume a specific functional form: $y = f(x; \boldsymbol{\theta}) + \epsilon$

- Examples: linear regression, polynomial regression, logistic regression
- Pro: If the form is correct, statistically optimal (efficient). Fast, interpretable.
- Con: If the form is wrong (misspecification), can be badly biased. All assumptions are 'baked in'.

Nonparametric Models:

Make minimal assumptions about form: $y = f(x) + \epsilon$ where $f \in \mathcal{F}$ (some rich function class).

- Examples: LOESS, kernel smoothing, splines, Gaussian processes
- Pro: Robust to misspecification. Adapts to whatever the data show.
- Con: Less efficient when simple forms suffice. Curse of dimensionality. Harder to interpret.
The Bias-Variance Perspective:

| Aspect | Parametric (underfit) | Parametric (correct) | Nonparametric |
|--------|----------------------|---------------------|---------------|
| Bias | High (if misspecified) | Low | Low (with good bandwidth) |
| Variance | Low | Low | Higher |
| MSE | Poor | Excellent | Moderate |

The key insight: parametric wins if the assumption is right; nonparametric wins if it's wrong.

This motivates the core question of the decision framework: how confident are you in your parametric assumptions?
From a Bayesian viewpoint, choosing between parametric and nonparametric is about prior beliefs. A parametric model encodes strong prior belief about functional form. A nonparametric model encodes weak/diffuse prior. The 'right' choice depends on how much you genuinely know beforehand.
1. Sample Size:

- Small ($n < 100$): Parametric usually better. Nonparametric suffers high variance.
- Moderate ($100 < n < 1000$): Both viable. Use diagnostics to guide choice.
- Large ($n > 1000$): Nonparametric shines. Enough data to estimate complex patterns.

2. Dimensionality:

- $d = 1$: Nonparametric nearly always practical and often illuminating.
- $d = 2$–$4$: Nonparametric viable; interpret via surface plots.
- $d > 5$: Use structured nonparametric (additive models) or parametric + validation.

3. Domain Knowledge:

- Strong theory (physics, economics): Parametric is often justified. Use nonparametric to check assumptions.
- Weak theory (exploratory): Nonparametric for discovery, then fit parametric if patterns emerge.
- Moderate knowledge: Use both; compare predictions.
| Factor | Favors Parametric | Favors Nonparametric |
|---|---|---|
| Sample size | Small ($n < 100$) | Large ($n > 1000$) |
| Dimensionality | High ($d > 5$) | Low ($d \leq 3$) |
| Domain knowledge | Strong theoretical basis | Exploratory / unknown |
| Interpretability need | High (explain coefficients) | Low (prediction focus) |
| Extrapolation need | Must predict outside training range | Interpolation only |
| Computational budget | Limited | Generous |
| Confidence in form | High confidence | Low confidence or checking |
| Goal | Inference on parameters | Visualization or prediction |
4. Task Objective:

- Prediction accuracy: Use cross-validation to directly compare methods and let the winner decide (see the sketch below).
- Scientific interpretation: Parametric models are more interpretable (coefficients have meaning).
- Visualization: Nonparametric excels at revealing patterns without imposing structure.
- Inference: Parametric (usually). Nonparametric confidence intervals exist but are complex.

5. Consequences of Error:

- High cost of misspecification: Nonparametric is 'safer'—wrong, but not disastrously wrong.
- High cost of variance: Parametric is more stable if assumptions are roughly correct.
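To make the "compare via cross-validation" advice concrete, here is a minimal sketch (not from the discussion above) that pits a linear fit against a Nadaraya-Watson kernel smoother under 5-fold cross-validation. The `kernel_regression` and `cv_mse` helpers, the bandwidth, and the simulated data are illustrative assumptions.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)

def cv_mse(x, y, fit_predict, n_folds=5, seed=0):
    """K-fold cross-validated MSE for any fit/predict callable."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % n_folds
    errors = []
    for f in range(n_folds):
        train, test = folds != f, folds == f
        y_hat = fit_predict(x[train], y[train], x[test])
        errors.append(np.mean((y[test] - y_hat)**2))
    return np.mean(errors)

# Simulated data with mild nonlinearity
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 10, 200))
y = 1.0 + 0.5 * x + 2.0 * np.sin(x) + rng.normal(0, 1, 200)

linear = lambda xtr, ytr, xte: np.polyval(np.polyfit(xtr, ytr, 1), xte)
kernel = lambda xtr, ytr, xte: kernel_regression(xtr, ytr, xte, bandwidth=0.5)

print(f"Linear CV MSE: {cv_mse(x, y, linear):.3f}")
print(f"Kernel CV MSE: {cv_mse(x, y, kernel):.3f}")
# Whichever CV MSE is lower is the method to prefer for prediction.
```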
Using Nonparametric Fits to Check Parametric Models:

One of the most valuable uses of nonparametric regression is diagnostic: checking whether a parametric model is reasonable.

The Overlay Test:
1. Fit your parametric model (e.g., linear regression)
2. Fit a nonparametric smoother (LOESS) to the same data
3. Overlay both fits on a scatterplot
4. If they track closely, the parametric form is adequate
5. If they diverge systematically, consider refining the parametric model
The following code implements this overlay comparison, along with a runs test on the residuals of the parametric fit:

```python
import numpy as np

def compare_parametric_nonparametric(
    x: np.ndarray,
    y: np.ndarray,
    parametric_degree: int = 1,
    loess_span: float = 0.75
):
    """
    Compare parametric (polynomial) and nonparametric (LOESS) fits.
    Returns diagnostics to help choose.
    """
    n = len(x)
    x_sorted = np.sort(x)

    # Parametric fit (polynomial)
    poly_coeffs = np.polyfit(x, y, parametric_degree)
    poly = np.poly1d(poly_coeffs)
    y_para = poly(x_sorted)
    y_para_at_x = poly(x)

    # LOESS fit (simplified implementation)
    def loess_smooth(x_train, y_train, x_eval, span):
        n_train = len(x_train)
        k = int(np.ceil(span * n_train))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x_train - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x_train[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            # Local linear fit
            X_loc = np.column_stack([np.ones(k), x_train[neighbors] - x0])
            W = np.diag(w)
            try:
                beta = np.linalg.solve(X_loc.T @ W @ X_loc,
                                       X_loc.T @ W @ y_train[neighbors])
                y_smooth[j] = beta[0]
            except np.linalg.LinAlgError:
                y_smooth[j] = np.average(y_train[neighbors], weights=w + 1e-10)
        return y_smooth

    y_nonpara = loess_smooth(x, y, x_sorted, loess_span)
    y_nonpara_at_x = loess_smooth(x, y, x, loess_span)

    # Compute diagnostics
    residuals_para = y - y_para_at_x
    residuals_nonpara = y - y_nonpara_at_x
    mse_para = np.mean(residuals_para**2)
    mse_nonpara = np.mean(residuals_nonpara**2)

    # Discrepancy between models
    discrepancy = np.sqrt(np.mean((y_para - y_nonpara)**2))

    # Runs test on parametric residuals (tests for systematic patterns)
    signs = residuals_para > 0
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    n_pos = np.sum(signs)
    n_neg = n - n_pos
    expected_runs = 1 + 2 * n_pos * n_neg / n

    return {
        'mse_parametric': mse_para,
        'mse_nonparametric': mse_nonpara,
        'discrepancy': discrepancy,
        'runs_test': {
            'observed': runs,
            'expected': expected_runs,
            'indicates_misspec': runs < expected_runs * 0.7
        },
        'recommendation': ('nonparametric' if mse_nonpara < mse_para * 0.8
                           else 'parametric adequate')
    }

# =============================================================================
# Case 1: Linear relationship (parametric should win)
# =============================================================================
np.random.seed(42)
n = 100
x1 = np.linspace(0, 10, n)
y1 = 2 + 3*x1 + np.random.normal(0, 2, n)

result1 = compare_parametric_nonparametric(x1, y1, parametric_degree=1, loess_span=0.5)

print("Case 1: Linear Relationship")
print("=" * 60)
print(f"Parametric MSE: {result1['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result1['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result1['discrepancy']:.4f}")
print(f"Recommendation: {result1['recommendation']}")

# =============================================================================
# Case 2: Nonlinear relationship (nonparametric should reveal this)
# =============================================================================
x2 = np.linspace(0, 10, n)
y2 = 2 + 0.5*x2 + 3*np.sin(x2) + np.random.normal(0, 1, n)

result2 = compare_parametric_nonparametric(x2, y2, parametric_degree=1, loess_span=0.3)

print("\nCase 2: Nonlinear Relationship (sine + linear)")
print("=" * 60)
print(f"Parametric MSE: {result2['mse_parametric']:.4f}")
print(f"Nonparametric MSE: {result2['mse_nonparametric']:.4f}")
print(f"Model discrepancy: {result2['discrepancy']:.4f}")
print(f"Recommendation: {result2['recommendation']}")
```

Residual Analysis:

After fitting a parametric model, plot residuals against $x$ with a LOESS smooth:
- Flat at zero: the parametric form is adequate
- Systematic pattern: the form is missing something—consider nonparametric

Partial Residual Plots (for multiple regression):

Plot $e_i + \hat{\beta}_j x_{ij}$ against $x_j$ with a LOESS overlay. This reveals whether each predictor's effect is linear.
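The partial residual check can be scripted with nothing beyond NumPy and Matplotlib. The sketch below is a rough illustration on simulated data; the `moving_loess` helper is a crude stand-in for a proper LOESS smoother, and the data-generating model is an assumption made for the demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_loess(x, y, span=0.5):
    """Crude LOESS substitute: tricube-weighted local average at each x."""
    k = max(2, int(np.ceil(span * len(x))))
    y_smooth = np.empty_like(y, dtype=float)
    for j, x0 in enumerate(x):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]
        h = max(d[idx[-1]], 1e-10)
        w = (1 - (d[idx] / h)**3)**3
        y_smooth[j] = np.average(y[idx], weights=w + 1e-10)
    return y_smooth

# Multiple regression with one nonlinear effect: y = 1 + 2*x1 + x2^2 + noise
rng = np.random.default_rng(0)
n = 300
X = rng.normal(0, 1, (n, 2))
y = 1 + 2*X[:, 0] + X[:, 1]**2 + rng.normal(0, 0.5, n)

# OLS fit with intercept
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

# Partial residuals for the second predictor: e_i + beta_2 * x_i2
partial = residuals + beta[2] * X[:, 1]

order = np.argsort(X[:, 1])
plt.scatter(X[:, 1], partial, s=10, alpha=0.5)
plt.plot(X[order, 1], moving_loess(X[order, 1], partial[order]), 'r-')
plt.xlabel('x2'); plt.ylabel('partial residual')
plt.title('Curvature in the smooth flags a nonlinear effect of x2')
plt.show()
```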
Combining Parametric and Nonparametric:

You don't always have to choose exclusively. Several powerful strategies combine both approaches.

1. Semiparametric Models:

$$y_i = \mathbf{x}_i^T \boldsymbol{\beta} + g(z_i) + \epsilon_i$$

Some predictors enter parametrically ($\mathbf{x}$), others nonparametrically ($z$). Best when you have theory for some relationships but not others.

2. Generalized Additive Models (GAMs):

$$y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \ldots + \epsilon_i$$

Each $f_j$ is estimated nonparametrically, but the additive structure is parametric. Escapes the curse while allowing flexibility.
3. Two-Stage: Nonparametric Discovery → Parametric Fitting:

Workflow:
1. Fit LOESS to visualize the relationship
2. Identify the functional form (looks quadratic? exponential? logistic?)
3. Fit the parametric model suggested by the nonparametric fit
4. Validate with cross-validation

This uses the nonparametric fit as an exploratory tool to inform a confirmatory parametric model.

4. Nonparametric Bootstrap for Parametric Inference:

Fit a parametric model, but assess uncertainty via a nonparametric (residual) bootstrap. Robust to distributional assumptions while keeping parametric point estimates.

5. Ensemble: Blend Predictions

Final prediction = weighted average of parametric and nonparametric predictions. Weights chosen by cross-validation. Hedges against misspecification (see the sketch below).
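As a rough illustration of strategy 5, the sketch below blends a linear fit with a kernel smoother, choosing the blend weight by grid search on a held-out validation split. The `kernel_smooth` helper, the 70/30 split, and the weight grid are illustrative assumptions, not prescriptions.

```python
import numpy as np

def kernel_smooth(x_train, y_train, x_eval, bandwidth=0.6):
    """Nadaraya-Watson kernel smoother (Gaussian kernel)."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 300))
y = 2 + 0.5*x + 2*np.sin(x) + rng.normal(0, 1, 300)

# Split: fit both models on 'train', pick the blend weight on 'valid'
idx = rng.permutation(len(x))
train, valid = idx[:210], idx[210:]

coef = np.polyfit(x[train], y[train], 1)                     # parametric piece
para_valid = np.polyval(coef, x[valid])
nonpara_valid = kernel_smooth(x[train], y[train], x[valid])  # nonparametric piece

# Grid-search the blend weight w in [0, 1]
weights = np.linspace(0, 1, 21)
val_mse = [np.mean((y[valid] - (w*para_valid + (1 - w)*nonpara_valid))**2)
           for w in weights]
best_w = weights[np.argmin(val_mse)]
print(f"Best weight on the parametric component: {best_w:.2f}")
print(f"Validation MSE at best weight: {min(val_mse):.3f}")
```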
The following implementation demonstrates strategies 1 and 3: semiparametric backfitting and the discovery-to-parametric workflow.

```python
import numpy as np
from scipy import stats
from typing import Tuple

def semiparametric_regression(
    X_para: np.ndarray,
    x_nonpara: np.ndarray,
    y: np.ndarray,
    loess_span: float = 0.5,
    max_iterations: int = 20,
    tolerance: float = 1e-6
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Semiparametric model: y = X_para @ beta + g(x_nonpara) + epsilon

    Fit via backfitting:
    1. Fix g, estimate beta by OLS on y - g
    2. Fix beta, estimate g by LOESS on y - X_para @ beta
    3. Iterate until convergence
    """
    n = len(y)

    # Initialize
    g = np.zeros(n)
    beta = np.zeros(X_para.shape[1])

    def loess_1d(x, y_target, span):
        """Simple 1D LOESS (tricube-weighted local average)."""
        n = len(x)
        k = int(np.ceil(span * n))
        y_smooth = np.zeros(n)
        for j in range(n):
            x0 = x[j]
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            u = (x[neighbors] - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            if np.sum(w) > 1e-10:
                y_smooth[j] = np.average(y_target[neighbors], weights=w)
            else:
                y_smooth[j] = np.mean(y_target)
        return y_smooth - np.mean(y_smooth)  # Center for identifiability

    for iteration in range(max_iterations):
        # Step 1: Fix g, estimate beta
        y_adjusted = y - g
        beta_new, _, _, _ = np.linalg.lstsq(X_para, y_adjusted, rcond=None)

        # Step 2: Fix beta, estimate g
        y_adjusted = y - X_para @ beta_new
        g_new = loess_1d(x_nonpara, y_adjusted, loess_span)

        # Check convergence, then update
        converged = (np.max(np.abs(g_new - g)) < tolerance
                     and np.max(np.abs(beta_new - beta)) < tolerance)
        beta = beta_new
        g = g_new
        if converged:
            break

    return beta, g

def discovery_to_parametric_workflow(x: np.ndarray, y: np.ndarray):
    """
    Demonstrate the two-stage workflow:
    1. Nonparametric discovery
    2. Parametric fitting based on discovered form
    """
    n = len(x)

    # Stage 1: LOESS discovery (visualize y_loess against x_sorted in practice)
    def simple_loess(x, y, x_eval, span=0.5):
        k = int(np.ceil(span * len(x)))
        y_smooth = np.zeros(len(x_eval))
        for j, x0 in enumerate(x_eval):
            distances = np.abs(x - x0)
            sorted_idx = np.argsort(distances)
            neighbors = sorted_idx[:k]
            h = max(distances[sorted_idx[k - 1]], 1e-10)
            w = (1 - (np.abs(x[neighbors] - x0) / h)**3)**3
            y_smooth[j] = np.average(y[neighbors], weights=w + 1e-10)
        return y_smooth

    x_sorted = np.sort(x)
    y_loess = simple_loess(x, y, x_sorted, span=0.3)

    # Stage 2: Identify curvature (heuristic: compare linear vs quadratic)
    lin_coeffs = np.polyfit(x, y, 1)
    quad_coeffs = np.polyfit(x, y, 2)
    y_lin = np.polyval(lin_coeffs, x)
    y_quad = np.polyval(quad_coeffs, x)
    mse_lin = np.mean((y - y_lin)**2)
    mse_quad = np.mean((y - y_quad)**2)

    # F-test for the quadratic term
    f_stat = (mse_lin - mse_quad) / (mse_quad / (n - 3))
    p_value = 1 - stats.f.cdf(f_stat, 1, n - 3)

    return {
        'loess_reveals': 'curvature' if mse_quad < mse_lin * 0.85 else 'linear',
        'mse_linear': mse_lin,
        'mse_quadratic': mse_quad,
        'f_statistic': f_stat,
        'p_value': p_value,
        'recommended_model': 'quadratic' if mse_quad < mse_lin * 0.85 else 'linear'
    }

# =============================================================================
# Demonstration
# =============================================================================
np.random.seed(42)

# Generate semiparametric data: y = 2*x1 - x2 + sin(z) + noise
n = 200
x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
z = np.random.uniform(0, 2*np.pi, n)
y = 2*x1 - x2 + np.sin(z) + np.random.normal(0, 0.5, n)

X_para = np.column_stack([x1, x2])
beta_est, g_est = semiparametric_regression(X_para, z, y, loess_span=0.4)

print("Semiparametric Model Results")
print("=" * 50)
print(f"True beta: [2.0, -1.0]")
print(f"Estimated beta: [{beta_est[0]:.3f}, {beta_est[1]:.3f}]")
print(f"Nonparametric g(z) captures the sine pattern")
print("\nThis correctly separates linear effects from nonlinear!")
```

In practice, experienced analysts often: (1) start with exploratory nonparametric visualization; (2) develop intuition about the functional form; (3) fit parametric models suggested by the exploration; (4) use nonparametric diagnostics to validate; (5) report both where appropriate. This iterative refinement beats dogmatic adherence to either approach.
By Application Domain:

Different fields have different conventions and requirements. Here's guidance tailored to common application areas:
| Domain | Typical Choice | Reasoning |
|---|---|---|
| Econometrics | Parametric + diagnostics | Coefficient interpretation crucial; theory-driven; large samples |
| Epidemiology | Semiparametric / Splines | Dose-response often complex; exposure adjustment common |
| Environmental Science | GAMs / LOESS | Relationships often nonlinear; seasonal patterns; uncertainty |
| Machine Learning / Prediction | Whatever cross-validates best | Prediction accuracy is the goal; interpretability secondary |
| Clinical Trials | Parametric (regulated) | Regulatory requirements favor pre-specified models |
| Finance / Quant | Both + ensembles | Regime changes; complex dependencies; prediction-critical |
| Social Science | Parametric + visualization | Theory testing; but nonparametric for exploration |
| Engineering / Physics | Parametric (theory-based) | Physical laws constrain form; data confirm parameters |
By Data Characteristics:

Time Series:
- LOESS excellent for trend extraction (STL decomposition; see the sketch below)
- Combine with parametric ARIMA for forecasting

Spatial Data:
- Kernel smoothing for spatial surfaces
- Kriging (GP) for interpolation with uncertainty

Survival/Duration Data:
- Cox model (semiparametric) is standard
- Kaplan-Meier curves are fully nonparametric

Panel/Longitudinal:
- Mixed effects with a nonparametric smooth of time
- GAMMs (Generalized Additive Mixed Models)
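For the time-series bullet above, a minimal sketch using the STL implementation in statsmodels (this assumes the statsmodels package is installed; the simulated monthly series is purely illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL  # assumes statsmodels is installed

# Simulated monthly series: trend + annual seasonality + noise
rng = np.random.default_rng(7)
n_months = 12 * 10
t = np.arange(n_months)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, n_months)
series = pd.Series(y, index=pd.date_range("2010-01", periods=n_months, freq="MS"))

# STL uses LOESS internally to separate trend, seasonal, and remainder
result = STL(series, period=12).fit()
print(result.trend.head())
print(result.seasonal.head())
# The remainder (result.resid) could then feed a parametric ARIMA model,
# combining nonparametric trend extraction with parametric forecasting.
```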
In regulated industries (pharmaceuticals, medical devices, some finance), there's often a preference for pre-specified parametric models that were defined before seeing the data. Nonparametric methods, while statistically sound, can face pushback as 'data dredging' if not pre-specified. Know your context!
The Decision Tree:

Follow this hierarchical decision process:

1. Is d > 5 and no additive structure expected?
   → YES: Use parametric or tree-based methods
   → NO: Continue

2. Is n < 50?
   → YES: Use parametric (not enough data for nonparametric)
   → NO: Continue

3. Do you have a strong theoretical basis for the functional form?
   → YES: Use parametric, validate with nonparametric diagnostics
   → NO: Continue

4. Is the primary goal interpretation of specific coefficients?
   → YES: Use parametric with a carefully chosen form
   → NO: Continue

5. Is d ≤ 3 and you want to visualize the relationship?
   → YES: Use nonparametric (LOESS/kernel smoothing)
   → NO: Continue

6. Is prediction accuracy the main goal?
   → YES: Compare both via cross-validation, use the winner
   → NO: Continue

7. Default: Use semiparametric or GAM
   → Combines flexibility with structure
This decision tree can be encoded directly:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MethodRecommendation(Enum):
    PARAMETRIC = "parametric"
    NONPARAMETRIC = "nonparametric"
    SEMIPARAMETRIC = "semiparametric"
    GAM = "gam"
    COMPARE_CV = "compare_via_cv"
    TREE_BASED = "tree_based"

@dataclass
class RegressionContext:
    """Captures the context of a regression problem."""
    n_samples: int
    n_features: int
    has_theory: bool
    need_interpretation: bool
    primary_goal: str  # 'prediction', 'inference', 'visualization'
    additive_structure_expected: bool
    domain: Optional[str] = None

def recommend_method(context: RegressionContext) -> dict:
    """
    Decision framework for choosing a regression method.
    Returns a recommendation and reasoning.
    """
    n = context.n_samples
    d = context.n_features

    # Decision tree
    if d > 5 and not context.additive_structure_expected:
        return {
            'recommendation': MethodRecommendation.TREE_BASED,
            'reasoning': f"High dimension (d={d}) without additive structure. "
                         "Use random forest, gradient boosting, or regularized linear models.",
            'alternatives': ['Ridge/Lasso regression', 'Random forest', 'XGBoost']
        }

    if n < 50:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': f"Small sample (n={n}). Nonparametric would have high variance. "
                         "Use a simple parametric model.",
            'alternatives': ['Linear model', 'Polynomial (low degree)']
        }

    if context.has_theory:
        return {
            'recommendation': MethodRecommendation.PARAMETRIC,
            'reasoning': "Strong theoretical basis available. Use theory-derived model, "
                         "validate with nonparametric diagnostics.",
            'alternatives': ['Theory-specified form', 'LOESS for residual diagnostics']
        }

    if context.need_interpretation:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Need coefficient interpretation but unknown functional form. "
                         "Use GAM or partial linear model.",
            'alternatives': ['GAM (interpretable smooth effects)', 'Spline regression']
        }

    if d <= 3 and context.primary_goal == 'visualization':
        return {
            'recommendation': MethodRecommendation.NONPARAMETRIC,
            'reasoning': f"Low dimension (d={d}) and visualization goal. "
                         "LOESS or kernel smoothing will reveal patterns without assumptions.",
            'alternatives': ['LOESS', 'Kernel regression', 'Thin plate spline']
        }

    if context.primary_goal == 'prediction':
        return {
            'recommendation': MethodRecommendation.COMPARE_CV,
            'reasoning': "Prediction is the goal. Compare parametric and nonparametric "
                         "via cross-validation; use whichever performs better.",
            'alternatives': ['All reasonable models', 'Ensemble of both']
        }

    # Default
    if d <= 4:
        return {
            'recommendation': MethodRecommendation.GAM,
            'reasoning': "Moderate dimension, general purpose. GAM provides flexibility "
                         "with additive structure to manage the curse of dimensionality.",
            'alternatives': ['GAM', 'Semiparametric', 'LOESS (if d=1)']
        }
    else:
        return {
            'recommendation': MethodRecommendation.SEMIPARAMETRIC,
            'reasoning': "Higher dimension (d>4). Use additive/semiparametric structure "
                         "to avoid the curse of dimensionality.",
            'alternatives': ['GAM', 'Additive model', 'Single-index model']
        }

# =============================================================================
# Example usage
# =============================================================================
print("Decision Framework Examples")
print("=" * 70)

examples = [
    RegressionContext(n_samples=50, n_features=2, has_theory=False,
                      need_interpretation=False, primary_goal='visualization',
                      additive_structure_expected=True),
    RegressionContext(n_samples=5000, n_features=1, has_theory=True,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='economics'),
    RegressionContext(n_samples=10000, n_features=8, has_theory=False,
                      need_interpretation=False, primary_goal='prediction',
                      additive_structure_expected=False, domain='ml_competition'),
    RegressionContext(n_samples=200, n_features=3, has_theory=False,
                      need_interpretation=True, primary_goal='inference',
                      additive_structure_expected=True, domain='epidemiology'),
]

for i, context in enumerate(examples, 1):
    result = recommend_method(context)
    print(f"\nExample {i}: n={context.n_samples}, d={context.n_features}, "
          f"goal={context.primary_goal}")
    print(f"  Recommendation: {result['recommendation'].value}")
    print(f"  Reasoning: {result['reasoning']}")
    print(f"  Alternatives: {result['alternatives']}")
```

We've completed our journey through nonparametric regression. Let's consolidate the key decision principles:
Module Complete:

You now have a comprehensive toolkit for nonparametric regression: the methods (LOESS, kernel smoothing), the theory (bias-variance, convergence rates, curse of dimensionality), the practical tools (bandwidth selection), and the decision frameworks for choosing when to use them.

These methods form the foundation for more advanced techniques in machine learning: Gaussian processes, support vector regression, and neural network function approximation all build on the principles developed here. The intuition about local fitting, bias-variance tradeoffs, and the curse of dimensionality will serve you throughout your machine learning journey.
Congratulations! You've mastered nonparametric regression—from the mechanics of LOESS and kernel smoothing to the deep challenge of the curse of dimensionality. You can now apply these methods appropriately, select bandwidths wisely, and make informed decisions about when nonparametric approaches are worth their computational and sample-size costs. These skills form the foundation for advanced statistical learning.