Throughout our journey in regression, we've fit global models—functions defined by a single set of parameters that apply uniformly across the entire input domain. Linear regression uses one slope and intercept everywhere. Even polynomial regression, despite its flexibility, commits to a single polynomial that governs all predictions.

But what if the relationship between variables changes fundamentally across different regions of the input space? What if the slope is steep for small values of $x$ but nearly flat for large values? What if there are local patterns that no single polynomial can capture?

Local regression addresses these challenges by fitting separate models in different neighborhoods of the input space. Rather than asking 'What single function best describes all the data?', it asks 'What function best describes the data near this specific point?'

This deceptively simple shift in perspective unlocks remarkable flexibility—and introduces fascinating new challenges in bias-variance tradeoffs, computational complexity, and the curse of dimensionality.
By the end of this page, you will understand:

1. The fundamental philosophy of local regression
2. The mathematical formulation of LOESS/LOWESS
3. Weight functions and their role in locality
4. Local polynomial regression of various degrees
5. The complete algorithm with implementation details
6. Robustness extensions for outlier resistance
From Global to Local:

Consider regression as answering the question: Given a new point $x_0$, what should we predict for $y$?

Global approach: Fit a model $f(x; \boldsymbol{\beta})$ using all training data, then evaluate $\hat{y} = f(x_0; \hat{\boldsymbol{\beta}})$.

Local approach: Fit a model using only data points near $x_0$, giving more weight to closer points.

The key insight is that locally, even complex global relationships often appear simple. A sinusoidal curve looks linear when you zoom in enough. A complicated economic trend might be well-approximated by a line within any small time window.

The Classical Motivation:

Local regression emerged from exploratory data analysis in the 1970s-1980s. Researchers like Cleveland (1979) and Cleveland & Devlin (1988) developed LOESS (LOcally Estimated Scatterplot Smoothing) as a way to visualize trends in noisy data without committing to a parametric form.
You'll see both terms in the literature. LOWESS (LOcally WEighted Scatterplot Smoothing) was Cleveland's original 1979 method using local linear fits. LOESS (1988) generalized this to local polynomial fits of any degree. Today, the terms are often used interchangeably, with LOESS being more common.
Intuition Through Visualization:

Imagine you want to estimate the trend at point $x_0$. The local regression approach:

1. Define a neighborhood around $x_0$ containing some fraction of the data
2. Weight the data so nearby points count more than distant ones
3. Fit a simple model (line or low-degree polynomial) using weighted least squares
4. Extract the prediction at $x_0$ from this local model
5. Repeat for every point where you want a prediction

This sounds computationally expensive (fit a model for every prediction point!)—and it is! But the resulting flexibility is remarkable, and modern computing makes it practical for many applications.
The Weighted Least Squares Framework:

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be our training data. To predict at a target point $x_0$, we solve a weighted least squares problem:

$$\min_{\beta_0, \beta_1, \ldots, \beta_p} \sum_{i=1}^{n} w_i(x_0) \left[ y_i - \beta_0 - \beta_1 (x_i - x_0) - \ldots - \beta_p (x_i - x_0)^p \right]^2$$

where $w_i(x_0)$ is the weight assigned to observation $i$ based on its distance from $x_0$.

Key components:
- Centering at $x_0$: We use $(x_i - x_0)$ rather than $x_i$ so that $\beta_0$ directly gives us the prediction at $x_0$
- Polynomial degree $p$: Controls local model complexity (typically $p = 0, 1, \text{ or } 2$)
- Weight function $w_i(x_0)$: Determines how locality is defined

The prediction at $x_0$ is simply $\hat{y}(x_0) = \hat{\beta}_0$.
Matrix Formulation:

For local polynomial regression of degree $p$, define:

$$\mathbf{X}_0 = \begin{bmatrix} 1 & (x_1 - x_0) & (x_1 - x_0)^2 & \cdots & (x_1 - x_0)^p \\ 1 & (x_2 - x_0) & (x_2 - x_0)^2 & \cdots & (x_2 - x_0)^p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & (x_n - x_0) & (x_n - x_0)^2 & \cdots & (x_n - x_0)^p \end{bmatrix}$$

$$\mathbf{W}_0 = \text{diag}(w_1(x_0), w_2(x_0), \ldots, w_n(x_0))$$

The weighted least squares solution is:

$$\hat{\boldsymbol{\beta}}_0 = (\mathbf{X}_0^T \mathbf{W}_0 \mathbf{X}_0)^{-1} \mathbf{X}_0^T \mathbf{W}_0 \mathbf{y}$$

And the prediction at $x_0$ is:

$$\hat{y}(x_0) = \mathbf{e}_1^T \hat{\boldsymbol{\beta}}_0 = \mathbf{e}_1^T (\mathbf{X}_0^T \mathbf{W}_0 \mathbf{X}_0)^{-1} \mathbf{X}_0^T \mathbf{W}_0 \mathbf{y}$$

where $\mathbf{e}_1 = [1, 0, 0, \ldots, 0]^T$ extracts the intercept.
Notice that $\hat{y}(x_0)$ is a linear combination of the $y_i$ values:

$$\hat{y}(x_0) = \sum_{i=1}^{n} s_i(x_0) y_i$$

where $s_i(x_0) = [\mathbf{e}_1^T (\mathbf{X}_0^T \mathbf{W}_0 \mathbf{X}_0)^{-1} \mathbf{X}_0^T \mathbf{W}_0]_i$. This makes LOESS a linear smoother, sharing properties with kernel smoothers and splines. The $s_i(x_0)$ are called the equivalent kernel weights.
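To make the linear-smoother view concrete, here is a minimal sketch (synthetic data, a fixed bandwidth of 0.2, and a local linear fit, all chosen purely for illustration) that computes the equivalent kernel weights $s_i(x_0)$ at one target point and verifies that the prediction is just their dot product with $\mathbf{y}$:

```python
import numpy as np

# Synthetic data purely for illustration; x0 and h are arbitrary choices.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 50)

x0, h = 0.5, 0.2                                            # target point and a fixed bandwidth

u = (x - x0) / h
w = np.where(np.abs(u) < 1, (1 - np.abs(u)**3)**3, 0.0)     # tri-cube weights w_i(x0)
X0 = np.column_stack([np.ones_like(x), x - x0])             # centered design matrix (local linear)
W0 = np.diag(w)

# Equivalent kernel weights: s(x0) = e1^T (X0' W0 X0)^{-1} X0' W0
s = np.linalg.solve(X0.T @ W0 @ X0, X0.T @ W0)[0]

print(f"prediction at x0: {s @ y:.4f}")                     # y_hat(x0) = sum_i s_i(x0) * y_i
print(f"sum of s_i(x0):   {s.sum():.4f}")                   # equals 1
```

Because a local polynomial fit with an intercept reproduces constant functions exactly, the equivalent kernel weights always sum to one.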
The Weight Function:

The weights $w_i(x_0)$ determine how 'local' the fit is. They should satisfy:

1. Locality: Large for points near $x_0$, small or zero for distant points
2. Smoothness: Continuous decay prevents abrupt changes in predictions
3. Symmetry: $w(u) = w(-u)$ treats left and right neighbors equally

The Tri-cube Weight Function (Cleveland's standard choice):

$$W(u) = \begin{cases} (1 - |u|^3)^3 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$

This function has appealing properties:
- Compact support: Zero outside $[-1, 1]$, so distant points are completely ignored
- Smooth at boundaries: Both the function and its first derivative are continuous at $|u| = 1$
- Peaked at center: Maximum weight at $u = 0$, smooth decay outward
```python
import numpy as np
import matplotlib.pyplot as plt

def tricube_weight(u: np.ndarray) -> np.ndarray:
    """
    Tri-cube weight function: W(u) = (1 - |u|^3)^3 for |u| < 1, else 0.
    Cleveland's standard choice for LOESS.
    """
    u = np.asarray(u)
    w = np.zeros_like(u, dtype=float)
    mask = np.abs(u) < 1
    w[mask] = (1 - np.abs(u[mask])**3)**3
    return w

def epanechnikov_weight(u: np.ndarray) -> np.ndarray:
    """
    Epanechnikov (parabolic) weight: W(u) = 3/4 * (1 - u^2) for |u| < 1.
    Optimal in certain asymptotic senses.
    """
    u = np.asarray(u)
    w = np.zeros_like(u, dtype=float)
    mask = np.abs(u) < 1
    w[mask] = 0.75 * (1 - u[mask]**2)
    return w

def gaussian_weight(u: np.ndarray, sigma: float = 1/3) -> np.ndarray:
    """
    Gaussian weight: W(u) = exp(-u^2 / (2*sigma^2)).
    Infinite support but decays rapidly.
    """
    return np.exp(-u**2 / (2 * sigma**2))

def uniform_weight(u: np.ndarray) -> np.ndarray:
    """
    Uniform (box) weight: W(u) = 1 for |u| < 1, else 0.
    Simplest but causes discontinuous predictions.
    """
    return (np.abs(u) < 1).astype(float)

# Visualize the weight functions
u = np.linspace(-1.5, 1.5, 500)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(u, tricube_weight(u), 'b-', lw=2.5, label='Tri-cube (LOESS standard)')
ax.plot(u, epanechnikov_weight(u), 'g--', lw=2, label='Epanechnikov')
ax.plot(u, gaussian_weight(u), 'r-.', lw=2, label='Gaussian (σ=1/3)')
ax.plot(u, uniform_weight(u), 'm:', lw=2, label='Uniform (box)')

ax.axhline(0, color='gray', lw=0.5)
ax.axvline(0, color='gray', lw=0.5)
ax.axvline(-1, color='gray', lw=0.5, ls='--', alpha=0.5)
ax.axvline(1, color='gray', lw=0.5, ls='--', alpha=0.5)

ax.set_xlabel('Normalized distance u = (x - x₀) / h', fontsize=12)
ax.set_ylabel('Weight W(u)', fontsize=12)
ax.set_title('Weight Functions for Local Regression', fontsize=14)
ax.legend(loc='upper right', fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-0.1, 1.1)

plt.tight_layout()
plt.show()
```

Bandwidth and the Span Parameter:

The key hyperparameter in LOESS is the bandwidth $h$ (also called the span or $\alpha$), which controls the size of the local neighborhood.

Two ways to specify bandwidth:

1. Fixed bandwidth $h$: The neighborhood has a fixed width. Points within distance $h$ of $x_0$ receive non-zero weight:
$$w_i(x_0) = W\left( \frac{x_i - x_0}{h} \right)$$

2. Nearest-neighbor fraction $\alpha$ (LOESS standard): The neighborhood includes a fraction $\alpha$ of the data. The bandwidth $h(x_0)$ is the distance to the $k$-th nearest neighbor, where $k = \lfloor \alpha n \rfloor$:
$$h(x_0) = |x_{(k)} - x_0|$$
where $x_{(k)}$ is the $k$-th nearest neighbor of $x_0$.

The nearest-neighbor approach is adaptive: neighborhoods are smaller in dense regions and larger in sparse regions, promoting consistent local sample sizes.
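To see the adaptivity in action, here is a small sketch (unevenly sampled synthetic data; the helper `nn_bandwidth` is introduced here only for illustration) that computes the nearest-neighbor bandwidth $h(x_0)$ at points in dense and sparse regions:

```python
import numpy as np

# Unevenly sampled synthetic x values: dense on [0, 5], sparse on [5, 10].
rng = np.random.default_rng(1)
x = np.sort(np.concatenate([rng.uniform(0, 5, 150), rng.uniform(5, 10, 50)]))

def nn_bandwidth(x: np.ndarray, x0: float, span: float) -> float:
    """Adaptive bandwidth: distance from x0 to its k-th nearest neighbor, k = floor(span * n)."""
    k = max(int(np.floor(span * len(x))), 1)
    return float(np.sort(np.abs(x - x0))[k - 1])

# With the same span, the bandwidth adapts: small where data are dense, large where they are sparse.
for x0 in [1.0, 4.0, 8.0]:
    h = nn_bandwidth(x, x0, span=0.3)
    in_window = int(np.sum(np.abs(x - x0) <= h))
    print(f"x0 = {x0:.1f}: h(x0) = {h:.2f}, points inside the window = {in_window}")
```

Each target point ends up with roughly the same number of neighbors, which is exactly the point of the nearest-neighbor span.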
| Span (α) | Neighborhood | Bias | Variance | Use Case |
|---|---|---|---|---|
| 0.1 - 0.2 | Very local | Low | High | Complex, wiggly patterns; large datasets |
| 0.3 - 0.5 | Moderate | Moderate | Moderate | General purpose; balanced tradeoff |
| 0.5 - 0.7 | Broad | Higher | Lower | Smoother trends; noisy data |
| 0.8 - 1.0 | Very broad | High | Very low | Captures only major trends; small datasets |
Choosing the Local Polynomial Degree:

The degree $p$ of the local polynomial affects both fitting and boundary behavior.

Local constant ($p = 0$): Nadaraya-Watson Estimator

The simplest case fits a horizontal line (constant) at each point:

$$\hat{y}(x_0) = \frac{\sum_{i=1}^{n} w_i(x_0) y_i}{\sum_{i=1}^{n} w_i(x_0)}$$

This is simply a weighted average of nearby $y$ values. Fast to compute but suffers from boundary bias—the estimate is pulled toward the center of the local neighborhood.

Local linear ($p = 1$): The Standard Choice

Fits a line at each point. The key advantage: automatic boundary correction. At boundaries, the local line automatically adjusts its intercept to minimize bias.

Local quadratic ($p = 2$):

Fits a parabola at each point. Better captures curvature in the true function, but requires more data and is more variable.
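As a minimal illustration of the local constant ($p = 0$) case described above, the following sketch (synthetic data and an arbitrary fixed bandwidth, chosen only for demonstration) implements the Nadaraya-Watson weighted average at a single interior point:

```python
import numpy as np

def nadaraya_watson(x: np.ndarray, y: np.ndarray, x0: float, h: float) -> float:
    """Local constant (p = 0) estimate at x0: a tri-cube-weighted average of nearby y values."""
    u = np.abs(x - x0) / h
    w = np.where(u < 1, (1 - u**3)**3, 0.0)
    return float(np.sum(w * y) / np.sum(w))

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 100))
y = x**2 + rng.normal(0, 0.05, 100)

# At an interior point the weighted average works well (true value f(0.5) = 0.25).
print(f"Nadaraya-Watson estimate at x0 = 0.5: {nadaraya_watson(x, y, 0.5, h=0.15):.3f}")
```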
Theory suggests that odd polynomial degrees ($p = 1, 3, 5, \ldots$) provide better bias-variance tradeoffs than even degrees at the same bandwidth. This is because odd-degree polynomials can correct for asymmetric data distributions more effectively. For this reason, $p = 1$ is almost universally preferred over $p = 0$ or $p = 2$.
Boundary Bias Illustrated:

Consider estimating $f(x)$ at $x_0$ near the left boundary. With local constant fitting:
- Most data points are to the right of $x_0$
- The weighted average is pulled toward the right
- If the function is increasing, we overestimate $f(x_0)$; if it is decreasing, we underestimate it

With local linear fitting:
- We fit a line $\hat{y} = \beta_0 + \beta_1 (x - x_0)$
- The line extrapolates the local trend
- The intercept $\beta_0$ is a bias-corrected estimate at $x_0$

This boundary correction is one of the most important reasons to use local linear regression over kernel smoothing with local constant fits.
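The following numeric sketch (a synthetic increasing trend with an arbitrary bandwidth of 0.2, used only for illustration) makes the boundary effect tangible by comparing the two estimates at the left endpoint:

```python
import numpy as np

# Increasing linear trend; true value at the left boundary is f(0) = 0.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 200))
y = 2 * x + rng.normal(0, 0.1, 200)

x0, h = 0.0, 0.2                                          # estimate at the left boundary
u = (x - x0) / h
w = np.where(np.abs(u) < 1, (1 - np.abs(u)**3)**3, 0.0)   # tri-cube weights

# Local constant (p = 0): weighted average, pulled upward by the points to the right of x0
f0_constant = np.sum(w * y) / np.sum(w)

# Local linear (p = 1): intercept of the weighted line, which corrects the boundary bias
X = np.column_stack([np.ones_like(x), x - x0])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
f0_linear = beta[0]

print("true f(0):               0.000")
print(f"local constant estimate: {f0_constant:.3f}")      # biased upward
print(f"local linear estimate:   {f0_linear:.3f}")        # close to zero
```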
Algorithm: LOESS with Local Linear Regression

Given training data $(x_1, y_1), \ldots, (x_n, y_n)$, span parameter $\alpha$, and evaluation points $x_0^{(1)}, \ldots, x_0^{(m)}$:

For each evaluation point $x_0$:

1. Find $k$-nearest neighbors: $k = \lfloor \alpha n \rfloor$
2. Compute bandwidth: $h(x_0) = $ distance to $k$-th nearest neighbor
3. Compute weights: $w_i = W\left( \frac{x_i - x_0}{h(x_0)} \right)$ using the tri-cube kernel
4. Solve weighted least squares:
   $$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} w_i [y_i - \beta_0 - \beta_1(x_i - x_0)]^2$$
5. Extract prediction: $\hat{y}(x_0) = \hat{\beta}_0$
```python
import numpy as np
from typing import Tuple, Optional

def tricube(u: np.ndarray) -> np.ndarray:
    """Tri-cube weight function."""
    u = np.asarray(u)
    w = np.zeros_like(u, dtype=float)
    mask = np.abs(u) < 1
    w[mask] = (1 - np.abs(u[mask])**3)**3
    return w

def loess_fit(x: np.ndarray,
              y: np.ndarray,
              x_eval: Optional[np.ndarray] = None,
              span: float = 0.75,
              degree: int = 1) -> Tuple[np.ndarray, np.ndarray]:
    """
    LOESS (Locally Estimated Scatterplot Smoothing).

    Parameters:
        x: Input values (n,)
        y: Target values (n,)
        x_eval: Points at which to evaluate (m,). If None, use x.
        span: Fraction of data to use in each local fit (0 < span <= 1)
        degree: Local polynomial degree (0, 1, or 2)

    Returns:
        x_eval: Evaluation points
        y_hat: Fitted values at evaluation points
    """
    x = np.asarray(x).flatten()
    y = np.asarray(y).flatten()
    n = len(x)

    if x_eval is None:
        x_eval = x.copy()
    else:
        x_eval = np.asarray(x_eval).flatten()

    m = len(x_eval)
    y_hat = np.zeros(m)

    # Number of nearest neighbors
    k = int(np.ceil(span * n))
    k = max(k, degree + 1)  # Need at least degree+1 points for a fit
    k = min(k, n)           # Can't use more points than we have

    for j, x0 in enumerate(x_eval):
        # Step 1: Find distances to all points
        distances = np.abs(x - x0)

        # Step 2: Find the k-th smallest distance (bandwidth)
        sorted_distances = np.sort(distances)
        h = sorted_distances[k - 1]

        # Prevent zero bandwidth
        if h < 1e-10:
            h = 1e-10

        # Step 3: Compute weights
        u = distances / h
        w = tricube(u)

        # Step 4: Local polynomial fit
        # Build design matrix centered at x0
        X_local = np.column_stack([(x - x0)**p for p in range(degree + 1)])

        # Weighted least squares: (X'WX)^{-1} X'Wy
        W = np.diag(w)
        XtW = X_local.T @ W
        XtWX = XtW @ X_local
        XtWy = XtW @ y

        # Solve the normal equations
        try:
            beta = np.linalg.solve(XtWX, XtWy)
        except np.linalg.LinAlgError:
            # Singular matrix: fall back to lstsq on the sqrt-weighted system,
            # which minimizes the same weighted sum of squares
            sqrt_w = np.sqrt(w)
            beta = np.linalg.lstsq(X_local * sqrt_w[:, np.newaxis],
                                   y * sqrt_w, rcond=None)[0]

        # Step 5: Prediction is the intercept (since we centered at x0)
        y_hat[j] = beta[0]

    return x_eval, y_hat


# =============================================================================
# DEMONSTRATION
# =============================================================================
np.random.seed(42)

# Generate data with a complex pattern
n = 100
x = np.sort(np.random.uniform(0, 2*np.pi, n))
y_true = np.sin(x) + 0.5 * np.sin(3*x)       # True function
y = y_true + np.random.normal(0, 0.3, n)     # Add noise

# Fit LOESS with different spans
spans = [0.2, 0.4, 0.7]
x_fine = np.linspace(0, 2*np.pi, 200)

print("LOESS Demonstration")
print("=" * 50)
print(f"{'Span':>8} | {'Mean Squared Error':>20}")
print("-" * 50)

for span in spans:
    _, y_hat = loess_fit(x, y, x_eval=x_fine, span=span, degree=1)  # curve on a fine grid (for plotting)
    _, y_hat_train = loess_fit(x, y, span=span, degree=1)           # fitted values at the training points
    mse = np.mean((y - y_hat_train)**2)
    print(f"{span:>8.2f} | {mse:>20.6f}")
```

LOESS has O(n²) complexity for fitting all training points, since each of the $n$ predictions requires examining all $n$ data points. For large datasets, consider: (1) evaluating at fewer points than training points; (2) using approximate nearest neighbors; (3) switching to kernel smoothing with fixed bandwidth; or (4) using binning/interpolation strategies.
The Outlier Problem:

Standard LOESS uses least squares, which is notoriously sensitive to outliers. A single aberrant point can substantially distort the local fit, affecting predictions in its neighborhood.

Cleveland's Robustifying Procedure:

Cleveland (1979) proposed an iterative reweighting scheme that down-weights observations with large residuals:

1. Initial fit: Compute $\hat{y}_i$ using standard LOESS
2. Compute residuals: $r_i = y_i - \hat{y}_i$
3. Scale residuals: $u_i = r_i / (6 \cdot \text{median}|r_i|)$
4. Compute robustness weights: $\delta_i = B(u_i)$ where $B$ is the bisquare function
5. Refit: Multiply original weights by $\delta_i$ and recompute LOESS
6. Iterate steps 2-5 (typically 2-3 times)
The Bisquare Weight Function:

$$B(u) = \begin{cases} (1 - u^2)^2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$

This function:
- Gives weight close to 1 to residuals much smaller than $6 \cdot \text{median}|r|$
- Gradually down-weights larger residuals
- Completely ignores residuals larger than $6 \cdot \text{median}|r|$

The factor of 6 is chosen so that, under Gaussian errors, $6 \cdot \text{median}|r|$ corresponds to roughly four standard deviations; only extreme outliers receive a robustness weight of zero in the initial iteration.
```python
import numpy as np
from typing import Tuple

def bisquare(u: np.ndarray) -> np.ndarray:
    """Bisquare weight function for robustifying."""
    u = np.asarray(u)
    w = np.zeros_like(u, dtype=float)
    mask = np.abs(u) < 1
    w[mask] = (1 - u[mask]**2)**2
    return w

def robust_loess(x: np.ndarray,
                 y: np.ndarray,
                 span: float = 0.75,
                 degree: int = 1,
                 robustifying_iterations: int = 3) -> Tuple[np.ndarray, np.ndarray]:
    """
    Robust LOESS with iterative reweighting for outlier resistance.

    Parameters:
        x: Input values
        y: Target values
        span: Fraction of data used in local fits
        degree: Local polynomial degree
        robustifying_iterations: Number of robustifying iterations (0 = none)

    Returns:
        x: Input values (sorted)
        y_hat: Fitted values
    """
    x = np.asarray(x).flatten()
    y = np.asarray(y).flatten()
    n = len(x)

    # Sort by x for consistency
    order = np.argsort(x)
    x = x[order]
    y = y[order]

    k = int(np.ceil(span * n))
    k = max(k, degree + 1)
    k = min(k, n)

    # Initialize robustness weights to 1
    delta = np.ones(n)

    for iteration in range(robustifying_iterations + 1):
        y_hat = np.zeros(n)

        for j in range(n):
            x0 = x[j]

            # Distances and bandwidth
            distances = np.abs(x - x0)
            sorted_distances = np.sort(distances)
            h = max(sorted_distances[k - 1], 1e-10)

            # Combined weights: locality * robustness
            u = (x - x0) / h
            w = (1 - np.abs(u)**3)**3 * (np.abs(u) < 1)
            w = w * delta  # Apply robustness weights

            # Local polynomial fit
            X_local = np.column_stack([(x - x0)**p for p in range(degree + 1)])

            try:
                W = np.diag(w)
                XtWX = X_local.T @ W @ X_local
                XtWy = X_local.T @ W @ y
                beta = np.linalg.solve(XtWX, XtWy)
            except np.linalg.LinAlgError:
                # Fallback for singular cases: weighted mean of y
                beta = np.zeros(degree + 1)
                beta[0] = np.average(y, weights=w + 1e-10)

            y_hat[j] = beta[0]

        # Compute robustness weights for next iteration
        if iteration < robustifying_iterations:
            residuals = y - y_hat
            s = np.median(np.abs(residuals))  # Median absolute residual (Cleveland's scale estimate)
            if s > 1e-10:
                u = residuals / (6.0 * s)
                delta = bisquare(u)
            else:
                break  # Perfect fit, no need to continue

    return x, y_hat


# Demonstration with outliers
np.random.seed(42)
n = 80
x = np.sort(np.random.uniform(0, 4, n))
y_true = 2 * np.sin(x)
y = y_true + np.random.normal(0, 0.3, n)

# Add some outliers
outlier_indices = [10, 25, 50, 65]
y[outlier_indices] += np.array([3, -4, 5, -3.5])

# Compare standard and robust LOESS
_, y_standard = robust_loess(x, y, span=0.3, robustifying_iterations=0)
_, y_robust = robust_loess(x, y, span=0.3, robustifying_iterations=3)

mse_standard = np.mean((y_standard - y_true)**2)
mse_robust = np.mean((y_robust - y_true)**2)

print("Robust LOESS vs Standard LOESS (with 4 outliers)")
print("=" * 50)
print(f"Standard LOESS MSE: {mse_standard:.4f}")
print(f"Robust LOESS MSE:   {mse_robust:.4f}")
print(f"Improvement: {100*(mse_standard - mse_robust)/mse_standard:.1f}%")
```

Use robustifying iterations when: (1) Data may contain outliers or recording errors; (2) The noise distribution is heavy-tailed; (3) You want a 'safe' default that works well across scenarios. The computational overhead is typically 2-3x, but the protection against outliers is often worth it.
Bias-Variance Tradeoff in Local Regression:

For local linear regression with bandwidth $h$, the asymptotic bias and variance at a point $x_0$ are:

Bias:
$$\text{Bias}[\hat{f}(x_0)] \approx \frac{h^2}{2} f''(x_0) \mu_2$$

where $\mu_2 = \int u^2 K(u) du$ for kernel $K$.

Variance:
$$\text{Var}[\hat{f}(x_0)] \approx \frac{\sigma^2}{n h p(x_0)} \nu_0$$

where $p(x_0)$ is the density of $x$ at $x_0$, and $\nu_0 = \int K(u)^2 du$.

Key insights:
- Bias increases with $h^2$: larger bandwidth → more smoothing → more bias
- Bias depends on $f''(x_0)$: flatter regions can tolerate larger bandwidth
- Variance decreases with $nh$: larger bandwidth → more data → less variance
- Variance depends inversely on local data density $p(x_0)$
Optimal Bandwidth:

Minimizing the asymptotic mean squared error (MSE) yields the optimal bandwidth:

$$h_{\text{opt}}(x_0) \propto \left( \frac{\sigma^2}{n p(x_0) [f''(x_0)]^2} \right)^{1/5}$$

Convergence rate:

$$\text{MSE}[\hat{f}(x_0)] = O(n^{-4/5})$$

This is slower than the parametric rate of $O(n^{-1})$, but faster than the $O(n^{-2/3})$ rate that local constant (Nadaraya-Watson) fits achieve near boundaries, where their bias is of order $h$ rather than $h^2$.

Practical implication: The $n^{-4/5}$ rate means you need more data to achieve the same accuracy as a correctly specified parametric model—the price of flexibility.
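The kernel constants and the bandwidth scaling can be checked numerically. The sketch below (a simple Riemann-sum integration; the values shown are approximations for the normalized tri-cube kernel) computes $\mu_2$ and $\nu_0$ and illustrates how slowly the optimal bandwidth shrinks with $n$:

```python
import numpy as np

# Kernel constants for the normalized tri-cube kernel, via a simple Riemann sum.
u = np.linspace(-1, 1, 400001)
du = u[1] - u[0]
k_raw = (1 - np.abs(u)**3)**3
K = k_raw / (k_raw.sum() * du)            # normalize so the kernel integrates to 1

mu2 = np.sum(u**2 * K) * du               # ∫ u² K(u) du ≈ 0.144 (enters the bias term)
nu0 = np.sum(K**2) * du                   # ∫ K(u)² du  ≈ 0.708 (enters the variance term)
print(f"tri-cube constants: mu2 ≈ {mu2:.4f}, nu0 ≈ {nu0:.4f}")

# The optimal bandwidth shrinks slowly with n: h_opt ∝ n^(-1/5).
for n in [100, 1_000, 10_000]:
    print(f"n = {n:>6}: relative optimal bandwidth ∝ n^(-1/5) = {n**(-0.2):.3f}")
```

Note how increasing the sample size a hundredfold only shrinks the optimal bandwidth by a factor of about 2.5, which is another way of seeing the slow $n^{-4/5}$ convergence.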
| Method | Convergence Rate | Illustrative n for comparable accuracy |
|---|---|---|
| Parametric (correctly specified) | O(n⁻¹) | ~100 |
| Local linear regression | O(n⁻⁴ᐟ⁵) | ~3,200 |
| Local constant (Nadaraya-Watson) | O(n⁻²ᐟ³) (boundary rate) | ~10,000 |
| Nonparametric in d dimensions | O(n⁻⁴ᐟ⁽⁴⁺ᵈ⁾) | Explodes with d |
LOESS has effective degrees of freedom defined as $\text{tr}(\mathbf{S})$ where $\mathbf{S}$ is the smoothing matrix ($\hat{\mathbf{y}} = \mathbf{S} \mathbf{y}$). This quantifies the 'complexity' of the fit and is useful for model comparison. Smaller bandwidth → more degrees of freedom → more flexible fit.
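As a sketch of how $\text{tr}(\mathbf{S})$ behaves (using a fixed bandwidth and an evenly spaced grid for simplicity, rather than LOESS's nearest-neighbor span), the code below builds the smoothing matrix for a local linear fit and shows that smaller bandwidths yield more effective degrees of freedom:

```python
import numpy as np

def local_linear_smoother_matrix(x: np.ndarray, h: float) -> np.ndarray:
    """Smoothing matrix S for a local linear fit with fixed bandwidth h, so that y_hat = S @ y.
    Illustrative sketch: Cleveland's LOESS uses a nearest-neighbor span rather than a fixed h."""
    n = len(x)
    S = np.zeros((n, n))
    for j, x0 in enumerate(x):
        u = (x - x0) / h
        w = np.where(np.abs(u) < 1, (1 - np.abs(u)**3)**3, 0.0)   # tri-cube weights
        X0 = np.column_stack([np.ones(n), x - x0])
        W0 = np.diag(w)
        S[j] = np.linalg.solve(X0.T @ W0 @ X0, X0.T @ W0)[0]      # equivalent kernel weights at x_j
    return S

x = np.linspace(0, 1, 100)
for h in [0.05, 0.10, 0.30]:
    S = local_linear_smoother_matrix(x, h)
    print(f"h = {h:.2f}: effective degrees of freedom tr(S) = {np.trace(S):.2f}")
```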
We've established the complete framework for local regression, from intuition to rigorous implementation: a weighted least squares fit at each target point, tri-cube weights defined by a nearest-neighbor span, local linear fits for automatic boundary correction, and bisquare robustness iterations for outlier resistance.
What's Next:

LOESS is one approach to local fitting. The next page explores kernel smoothing—a closely related technique that uses fixed bandwidth and has a cleaner theoretical foundation. We'll see how kernel smoothers relate to LOESS and when each is preferred.
You now understand local regression from theory to implementation. LOESS provides a powerful, flexible tool for exploring nonlinear relationships without parametric assumptions. Next, we'll explore kernel smoothing and see how it complements and extends these ideas.