Global polynomials have a fundamental limitation: every coefficient affects the entire curve. Change one data point, and the fitted polynomial adjusts everywhere—sometimes wildly. This global coupling is often undesirable.

Piecewise polynomials solve this by dividing the domain into segments and fitting separate low-degree polynomials to each. This provides:

- Local control: Changes in one region don't affect others
- Flexibility: Complex patterns from simple pieces
- Stability: Low-degree polynomials avoid wild oscillations
- Interpretability: Each piece has a clear local meaning

With appropriate continuity constraints at the boundaries (knots), piecewise polynomials form splines—the workhorse of modern nonparametric regression and the foundation of countless applications from computer graphics to statistical smoothing.
By the end of this page, you will understand: (1) Basic piecewise polynomial construction and regression; (2) Continuity constraints and polynomial splines; (3) The basis function representation for efficient computation; (4) Knot selection strategies; (5) Introduction to B-splines as the practical computational basis.
The Simplest Case: Piecewise Constant (Step Function)

Divide the domain into $K$ intervals using knots $\xi_1 < \xi_2 < \ldots < \xi_{K-1}$. Fit a constant within each interval:

$$f(x) = \begin{cases} c_1 & x < \xi_1 \\ c_2 & \xi_1 \leq x < \xi_2 \\ \vdots \\ c_K & x \geq \xi_{K-1} \end{cases}$$

This has $K$ parameters, one per interval. It's called a step function or piecewise constant regression.

Piecewise Linear:

Fit a separate line in each interval:

$$f(x) = a_k + b_k x \quad \text{for } x \in [\xi_{k-1}, \xi_k)$$

This has $2K$ parameters. Without constraints, the lines don't connect at knots—creating discontinuities.
```python
import numpy as np
from typing import List, Tuple


def piecewise_constant_fit(x: np.ndarray, y: np.ndarray,
                           knots: List[float]) -> Tuple[np.ndarray, np.ndarray]:
    """
    Fit piecewise constant regression (step function).

    Parameters:
        x, y: Data
        knots: Interior knot positions

    Returns:
        constants: Fitted constant for each interval
        intervals: List of (low, high) for each interval
    """
    knots = sorted(knots)
    all_boundaries = [float('-inf')] + knots + [float('inf')]
    K = len(knots) + 1  # Number of intervals

    constants = np.zeros(K)
    intervals = []

    for k in range(K):
        low, high = all_boundaries[k], all_boundaries[k + 1]
        intervals.append((low, high))

        # Points in this interval
        mask = (x >= low) & (x < high)
        if np.any(mask):
            constants[k] = np.mean(y[mask])
        else:
            constants[k] = np.nan

    return constants, intervals


def piecewise_linear_unconstrained(x: np.ndarray, y: np.ndarray,
                                   knots: List[float]) -> dict:
    """
    Fit piecewise linear regression WITHOUT continuity constraints.
    Results in discontinuous fit at knots.
    """
    knots = sorted(knots)
    all_boundaries = [-np.inf] + knots + [np.inf]
    K = len(knots) + 1

    results = {'segments': [], 'knots': knots}

    for k in range(K):
        low, high = all_boundaries[k], all_boundaries[k + 1]
        mask = (x >= low) & (x < high) if k < K - 1 else (x >= low) & (x <= high)

        if np.sum(mask) >= 2:
            # Fit line in this segment
            x_seg, y_seg = x[mask], y[mask]
            A = np.column_stack([np.ones_like(x_seg), x_seg])
            coeffs = np.linalg.lstsq(A, y_seg, rcond=None)[0]
            intercept, slope = coeffs
        else:
            intercept, slope = np.nan, 0

        results['segments'].append({
            'interval': (low, high),
            'intercept': intercept,
            'slope': slope
        })

    return results


def predict_piecewise(x_new: np.ndarray, fit_result: dict) -> np.ndarray:
    """Predict using fitted piecewise model."""
    y_pred = np.zeros_like(x_new)

    for seg in fit_result['segments']:
        low, high = seg['interval']
        mask = (x_new >= low) & (x_new < high)
        if high == np.inf:
            mask = (x_new >= low)
        y_pred[mask] = seg['intercept'] + seg['slope'] * x_new[mask]

    return y_pred


# Example
np.random.seed(42)
x = np.linspace(0, 10, 80)
# True function with distinct regions
y_true = np.where(x < 3, 2 + x, np.where(x < 7, 8 - 0.5*x, 3 + 0.3*x))
y = y_true + np.random.normal(0, 0.4, len(x))

# Fit piecewise linear with knots at regime changes
knots = [3, 7]
result = piecewise_linear_unconstrained(x, y, knots)

print("Piecewise Linear Regression (Unconstrained)")
print("=" * 50)
print(f"Knots at: {knots}")
print("-" * 50)
for i, seg in enumerate(result['segments']):
    lo, hi = seg['interval']
    lo_str = f"{lo:.1f}" if lo > -np.inf else "-∞"
    hi_str = f"{hi:.1f}" if hi < np.inf else "∞"
    print(f"Segment {i+1} [{lo_str}, {hi_str}): "
          f"y = {seg['intercept']:.3f} + {seg['slope']:.3f}x")

# Check discontinuity at knots
print("\nDiscontinuity check at knots:")
for knot in knots:
    left = result['segments'][knots.index(knot)]['intercept'] + \
           result['segments'][knots.index(knot)]['slope'] * knot
    right = result['segments'][knots.index(knot)+1]['intercept'] + \
            result['segments'][knots.index(knot)+1]['slope'] * knot
    print(f"  At x={knot}: left limit = {left:.3f}, right limit = {right:.3f}, "
          f"jump = {abs(right - left):.3f}")
```

Unconstrained piecewise polynomials create jumps at knots. While sometimes desired (e.g., modeling discrete regime changes), usually we want smooth curves. The next section shows how to add continuity constraints.
Adding Continuity:

To eliminate discontinuities, we impose continuity constraints at each knot. The level of smoothness is characterized by the number of continuous derivatives:

| Smoothness | Constraint at Knot $\xi$ | Result |
|------------|---------------------------|--------|
| $C^0$ | $f(\xi^-) = f(\xi^+)$ | Connected (no jumps) |
| $C^1$ | + $f'(\xi^-) = f'(\xi^+)$ | Smooth (no kinks) |
| $C^2$ | + $f''(\xi^-) = f''(\xi^+)$ | Very smooth (continuous curvature) |

Definition: Polynomial Spline of Degree $d$

A function $s(x)$ is a spline of degree $d$ with knots $\xi_1, \ldots, \xi_K$ if:
1. On each interval $[\xi_k, \xi_{k+1}]$, $s(x)$ is a polynomial of degree $\leq d$
2. $s(x)$ has continuous derivatives up to order $(d-1)$ everywhere

The most common choice is the cubic spline ($d = 3$), which has $C^2$ continuity.
Degrees of Freedom:

For a degree-$d$ spline with $K$ interior knots:

- Unconstrained: Each of $(K+1)$ intervals has $(d+1)$ parameters → $(K+1)(d+1)$ total
- Constraints: At each of $K$ knots, we impose $d$ constraints (continuity of $f, f', \ldots, f^{(d-1)}$) → $Kd$ constraints
- Net parameters: $(K+1)(d+1) - Kd = K + d + 1$

For a cubic spline ($d = 3$) with $K$ knots: $K + 4$ parameters.

This is remarkably efficient! A cubic spline with 5 knots has only 9 parameters but can represent complex, smooth curves that would require degree-8 or higher global polynomials.
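As a quick numerical check of this count, the sketch below builds the cubic truncated power basis (introduced in the next code block) for $K = 5$ knots and confirms it has $K + 4 = 9$ linearly independent columns; the knot positions and sample points are arbitrary and purely illustrative.

```python
import numpy as np

# Verify the degrees-of-freedom count K + d + 1 for a cubic spline (d = 3)
K = 5                                  # number of interior knots
knots = np.linspace(1.5, 8.5, K)       # illustrative knot positions
x = np.linspace(0, 10, 200)

# Cubic truncated power basis: {1, x, x^2, x^3, (x - xi_k)^3_+}
Phi = np.column_stack(
    [x**p for p in range(4)] +
    [np.maximum(0, x - xi)**3 for xi in knots]
)

print(Phi.shape[1])                    # K + 4 = 9 columns
print(np.linalg.matrix_rank(Phi))      # rank 9: the columns are independent
```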
```python
import numpy as np
from typing import List


def linear_spline_basis(x: np.ndarray, knots: List[float]) -> np.ndarray:
    """
    Build basis functions for continuous piecewise linear regression.

    Basis: {1, x, (x - ξ₁)₊, (x - ξ₂)₊, ...}
    where (x - ξ)₊ = max(0, x - ξ) is the "positive part" or "hinge" function.

    This parameterization automatically ensures C⁰ continuity.
    """
    n = len(x)
    K = len(knots)

    # Design matrix: [1, x, (x-ξ₁)₊, (x-ξ₂)₊, ...]
    Phi = np.zeros((n, 2 + K))
    Phi[:, 0] = 1   # Intercept
    Phi[:, 1] = x   # Linear term

    for k, xi in enumerate(knots):
        # Positive part: (x - ξ)₊ = max(0, x - ξ)
        Phi[:, 2 + k] = np.maximum(0, x - xi)

    return Phi


def cubic_spline_basis(x: np.ndarray, knots: List[float]) -> np.ndarray:
    """
    Build truncated power basis for cubic spline.

    Basis: {1, x, x², x³, (x - ξ₁)³₊, (x - ξ₂)³₊, ...}

    This ensures C² continuity automatically.
    Note: Truncated power basis has numerical issues; B-splines are preferred.
    """
    n = len(x)
    K = len(knots)

    # Design matrix: [1, x, x², x³, (x-ξ₁)³₊, ...]
    Phi = np.zeros((n, 4 + K))
    Phi[:, 0] = 1
    Phi[:, 1] = x
    Phi[:, 2] = x**2
    Phi[:, 3] = x**3

    for k, xi in enumerate(knots):
        # Cubic positive part: (x - ξ)³₊
        Phi[:, 4 + k] = np.maximum(0, x - xi)**3

    return Phi


def fit_spline(x: np.ndarray, y: np.ndarray, knots: List[float],
               degree: int = 3, regularization: float = 0.0) -> dict:
    """
    Fit polynomial spline using truncated power basis.
    """
    if degree == 1:
        Phi = linear_spline_basis(x, knots)
    elif degree == 3:
        Phi = cubic_spline_basis(x, knots)
    else:
        raise ValueError("Only degree 1 and 3 implemented")

    # Fit with optional regularization
    if regularization > 0:
        reg = regularization * np.eye(Phi.shape[1])
        reg[0, 0] = 0  # Don't regularize intercept
        A = Phi.T @ Phi + reg
        beta = np.linalg.solve(A, Phi.T @ y)
    else:
        beta = np.linalg.lstsq(Phi, y, rcond=None)[0]

    return {
        'coefficients': beta,
        'knots': knots,
        'degree': degree,
        'basis': Phi
    }


def predict_spline(x_new: np.ndarray, fit_result: dict) -> np.ndarray:
    """Predict using fitted spline."""
    if fit_result['degree'] == 1:
        Phi_new = linear_spline_basis(x_new, fit_result['knots'])
    else:
        Phi_new = cubic_spline_basis(x_new, fit_result['knots'])
    return Phi_new @ fit_result['coefficients']


# Demonstration: Compare piecewise linear vs cubic spline
np.random.seed(42)
x = np.linspace(0, 10, 60)
y_true = np.sin(x * 0.8) + 0.2 * x
y = y_true + np.random.normal(0, 0.3, len(x))

knots = [2.5, 5.0, 7.5]

print("Spline Regression Comparison")
print("=" * 60)
print(f"Knots at: {knots}")
print("-" * 60)

# Linear spline
fit_linear = fit_spline(x, y, knots, degree=1)
y_pred_linear = predict_spline(x, fit_linear)
mse_linear = np.mean((y - y_pred_linear)**2)
print(f"Linear spline (C⁰): {len(fit_linear['coefficients'])} parameters, MSE={mse_linear:.4f}")

# Cubic spline
fit_cubic = fit_spline(x, y, knots, degree=3)
y_pred_cubic = predict_spline(x, fit_cubic)
mse_cubic = np.mean((y - y_pred_cubic)**2)
print(f"Cubic spline (C²): {len(fit_cubic['coefficients'])} parameters, MSE={mse_cubic:.4f}")

# Verify smoothness of cubic spline
print("\nSmoothness verification at knot x=5.0:")
eps = 0.001
x_check = np.array([5.0 - eps, 5.0 + eps])
y_check = predict_spline(x_check, fit_cubic)
print(f"  f(5-ε) = {y_check[0]:.6f}")
print(f"  f(5+ε) = {y_check[1]:.6f}")
print(f"  Difference = {abs(y_check[1] - y_check[0]):.8f} (should be ≈ 0)")
```

The functions $(x - \xi)_+^d$ are called truncated power functions. They're zero before the knot and polynomial after. This basis automatically encodes the continuity constraints!
However, truncated power functions have poor numerical properties—B-splines (next topic) are the practical choice.
The Boundary Problem:

Cubic splines can behave erratically beyond the outermost knots—they're still cubic there, so they can curve wildly. Natural splines solve this by constraining the spline to be linear beyond the boundary knots.

Natural Cubic Spline Constraints:

$$s''(x) = 0 \quad \text{for } x < \xi_1 \text{ and } x > \xi_K$$

Forcing linearity removes the quadratic and cubic terms in each boundary region, so this imposes two additional constraints at each boundary (four in total), reducing degrees of freedom from $(K + 4)$ to $K$ for $K$ interior knots, which matches the $K$ basis functions constructed in the code below.

Why Natural Splines Are Optimal:

Among all functions $g(x)$ with two continuous derivatives that interpolate the data, the natural cubic spline minimizes the roughness penalty:

$$\int_{-\infty}^{\infty} [g''(x)]^2 \, dx$$

This remarkable result shows natural cubic splines are the "smoothest" interpolants in a precise mathematical sense.
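This optimality can be checked numerically: interpolate the same points with a natural cubic spline and with scipy's default "not-a-knot" cubic spline, and compare the roughness $\int [g''(x)]^2 \, dx$ over the data range. A minimal sketch (the sample points are illustrative):

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline

# Illustrative data to interpolate
x = np.linspace(0, 10, 12)
y = np.sin(x) + 0.1 * x

natural = CubicSpline(x, y, bc_type='natural')       # s'' = 0 at both endpoints
ordinary = CubicSpline(x, y, bc_type='not-a-knot')   # default interpolating spline

# Approximate the roughness penalty: integral of [g''(x)]^2 over the data range
xs = np.linspace(x.min(), x.max(), 2001)
rough_natural = trapezoid(natural(xs, 2)**2, xs)     # second argument = derivative order
rough_ordinary = trapezoid(ordinary(xs, 2)**2, xs)

print(f"Natural spline roughness:    {rough_natural:.4f}")
print(f"Not-a-knot spline roughness: {rough_ordinary:.4f}")
# The natural spline's roughness is never larger: it is the smoothest C^2 interpolant.
```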
```python
import numpy as np
from scipy.interpolate import CubicSpline


def natural_cubic_spline_basis(x: np.ndarray, knots: np.ndarray) -> np.ndarray:
    """
    Build natural cubic spline basis functions.

    Uses the transformation that creates basis functions with
    second derivative = 0 at boundaries.

    This is more complex than truncated power; we use a formula
    that directly encodes the natural spline constraints.
    """
    n = len(x)
    K = len(knots)

    if K < 2:
        raise ValueError("Need at least 2 knots for natural spline")

    # For natural splines, we have K degrees of freedom
    # (K-2 interior + 2 for linear part)

    # Simple approach: use [1, x, N_1(x), N_2(x), ...]
    # where N_k are derived from the natural spline construction

    def d_k(x, xi_k, xi_K):
        """Helper function for natural basis."""
        return (np.maximum(0, x - xi_k)**3 - np.maximum(0, x - xi_K)**3) / (xi_K - xi_k)

    xi_K = knots[-1]
    xi_K_minus_1 = knots[-2]

    Phi = np.zeros((n, K))
    Phi[:, 0] = 1   # Intercept
    Phi[:, 1] = x   # Linear term

    # Natural spline basis functions
    for k in range(K - 2):
        d_k_val = d_k(x, knots[k], xi_K)
        d_K_minus_1_val = d_k(x, xi_K_minus_1, xi_K)
        Phi[:, k + 2] = d_k_val - d_K_minus_1_val

    return Phi


def fit_natural_spline(x: np.ndarray, y: np.ndarray, n_knots: int = 5,
                       knot_positions: np.ndarray = None) -> dict:
    """
    Fit natural cubic spline.

    Parameters:
        x, y: Data
        n_knots: Number of knots (placed at quantiles if positions not given)
        knot_positions: Explicit knot positions
    """
    if knot_positions is None:
        # Place knots at quantiles of x
        percentiles = np.linspace(0, 100, n_knots + 2)[1:-1]
        knot_positions = np.percentile(x, percentiles)

    Phi = natural_cubic_spline_basis(x, knot_positions)

    # OLS fit
    beta = np.linalg.lstsq(Phi, y, rcond=None)[0]

    return {
        'coefficients': beta,
        'knots': knot_positions,
        'basis': Phi
    }


# Compare cubic spline vs natural spline extrapolation
np.random.seed(42)
x_train = np.linspace(1, 9, 40)
y_true = np.sin(x_train * 0.5) + 0.1 * x_train
y_train = y_true + np.random.normal(0, 0.2, len(x_train))

# Extended range for extrapolation test
x_test = np.linspace(-2, 12, 100)

# Using scipy for comparison
spline_regular = CubicSpline(x_train, y_train, bc_type='not-a-knot')
spline_natural = CubicSpline(x_train, y_train, bc_type='natural')

y_regular = spline_regular(x_test)
y_natural = spline_natural(x_test)

print("Extrapolation Behavior: Regular vs Natural Cubic Spline")
print("=" * 60)
print("Training data in [1, 9], testing including extrapolation to [-2, 12]")
print("-" * 60)

# Check values at extrapolation points
extrap_points = [-2, 0, 10, 12]
print(f"{'x':>6} | {'Regular':>12} | {'Natural':>12} | {'Difference':>12}")
print("-" * 60)
for x_pt in extrap_points:
    reg = spline_regular(x_pt)
    nat = spline_natural(x_pt)
    print(f"{x_pt:>6.1f} | {reg:>12.4f} | {nat:>12.4f} | {abs(reg - nat):>12.4f}")

print("\nNote: A true natural spline is linear beyond the boundary knots; scipy")
print("      extrapolates each fit with its end cubic piece, so the differences show")
print("      how the s''=0 boundary condition changes behavior outside [1, 9]")
```

Use natural splines when: (1) you may need to extrapolate slightly beyond data range, (2) you want stable boundary behavior, or (3) you seek the "smoothest" interpolant. The cost is slightly less flexibility near boundaries (forced linearity), but this is usually beneficial.
The Knot Placement Problem:

Knot positions heavily influence spline fit quality. Too few knots → underfitting; too many → overfitting. Poor placement → wasted flexibility in some regions, insufficient in others.

Common Strategies (a short sketch contrasting uniform and quantile placement follows the table):
| Strategy | Method | Pros | Cons |
|---|---|---|---|
| Uniform | Equal spacing on x | Simple, predictable | Ignores data density |
| Quantile | Equal # of points between knots | Adapts to data density | May cluster in flat regions |
| Domain knowledge | Place at known change points | Captures true structure | Requires knowledge |
| Cross-validation | Try many placements, pick best CV | Optimal for prediction | Computationally expensive |
| Free knots | Optimize positions as parameters | Maximum flexibility | Non-convex optimization |
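To make the first two rows concrete, here is a minimal sketch on a synthetic right-skewed sample (the exponential data and the choice of four knots are purely illustrative): quantile knots track where observations concentrate, while uniform knots ignore data density.

```python
import numpy as np

# Synthetic right-skewed sample: most observations near 0, a long right tail
rng = np.random.default_rng(0)
x = np.sort(rng.exponential(scale=2.0, size=300))

n_knots = 4
# Uniform placement: equal spacing over the observed range of x
uniform_knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
# Quantile placement: equal number of observations between consecutive knots
quantile_knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])

print("Uniform knots: ", np.round(uniform_knots, 2))
print("Quantile knots:", np.round(quantile_knots, 2))
# Quantile knots crowd into the dense region near 0; uniform knots place just as
# many knots in the sparse tail as in the dense region.
```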
```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline
from typing import List


def cross_validate_knot_number(x: np.ndarray, y: np.ndarray,
                               knot_counts: List[int],
                               k_folds: int = 5) -> List[dict]:
    """
    Use cross-validation to select optimal number of knots.
    """
    np.random.seed(42)
    n = len(x)

    # Create fold indices
    indices = np.random.permutation(n)
    fold_size = n // k_folds

    results = []

    for n_knots in knot_counts:
        fold_errors = []

        for fold in range(k_folds):
            # Split data
            val_start, val_end = fold * fold_size, (fold + 1) * fold_size
            val_idx = indices[val_start:val_end]
            train_idx = np.concatenate([indices[:val_start], indices[val_end:]])

            x_train, y_train = x[train_idx], y[train_idx]
            x_val, y_val = x[val_idx], y[val_idx]

            # Sort training data (required by scipy splines)
            sort_idx = np.argsort(x_train)
            x_train_sorted = x_train[sort_idx]
            y_train_sorted = y_train[sort_idx]

            # Place knots at quantiles of training data
            if n_knots > 0:
                percentiles = np.linspace(0, 100, n_knots + 2)[1:-1]
                knots = np.percentile(x_train_sorted, percentiles)

                # Ensure knots are strictly interior
                knots = knots[(knots > x_train_sorted.min()) &
                              (knots < x_train_sorted.max())]
                if len(knots) < 2:
                    knots = np.linspace(x_train_sorted.min() + 0.1,
                                        x_train_sorted.max() - 0.1, n_knots)

                try:
                    spline = LSQUnivariateSpline(x_train_sorted, y_train_sorted,
                                                 knots, k=3)
                    y_pred = spline(x_val)
                    mse = np.mean((y_val - y_pred)**2)
                except Exception:
                    mse = np.inf
            else:
                # Global polynomial (no knots)
                coefs = np.polyfit(x_train_sorted, y_train_sorted, 3)
                y_pred = np.polyval(coefs, x_val)
                mse = np.mean((y_val - y_pred)**2)

            if np.isfinite(mse):
                fold_errors.append(mse)

        if fold_errors:
            results.append({
                'n_knots': n_knots,
                'cv_error': np.mean(fold_errors),
                'cv_std': np.std(fold_errors) / np.sqrt(len(fold_errors))
            })

    return results


def select_optimal_knots(x: np.ndarray, y: np.ndarray,
                         max_knots: int = 15) -> dict:
    """
    Automatic knot count selection using cross-validation.
    """
    knot_counts = list(range(0, max_knots + 1, 2))  # 0, 2, 4, ..., max
    results = cross_validate_knot_number(x, y, knot_counts)

    if not results:
        return {'optimal_knots': 0, 'message': 'CV failed'}

    # Find minimum CV error
    best = min(results, key=lambda r: r['cv_error'])

    # One-SE rule: simplest model within 1 SE of best
    threshold = best['cv_error'] + best['cv_std']
    one_se = min([r for r in results if r['cv_error'] <= threshold],
                 key=lambda r: r['n_knots'])

    return {
        'results': results,
        'best_cv': best['n_knots'],
        'one_se': one_se['n_knots'],
        'best_cv_error': best['cv_error']
    }


# Example: automatic knot selection
np.random.seed(42)
x = np.sort(np.random.uniform(0, 10, 100))
# Complex function requiring multiple knots
y_true = 2*np.sin(x) + 0.5*np.sin(3*x) + 0.1*x**2
y = y_true + np.random.normal(0, 0.5, len(x))

print("Automatic Knot Selection via Cross-Validation")
print("=" * 60)

selection = select_optimal_knots(x, y, max_knots=12)

print(f"{'Knots':>8} | {'CV Error':>12} | {'Std Err':>10}")
print("-" * 40)
for r in selection['results']:
    marker = ""
    if r['n_knots'] == selection['best_cv']:
        marker = " ← Best CV"
    elif r['n_knots'] == selection['one_se']:
        marker = " ← 1-SE rule"
    print(f"{r['n_knots']:>8} | {r['cv_error']:>12.4f} | {r['cv_std']:>10.4f}{marker}")

print(f"\nRecommendation: {selection['one_se']} knots (1-SE rule)")
```

Rule of thumb: Start with 3-5 knots placed at quantiles, then adjust based on residual patterns. For automatic selection, cross-validate over knot counts from 0 to n/10 (no more than 10-15 usually needed). The one-SE rule provides robustness.
The Problem with Truncated Power Basis:

While mathematically convenient, truncated power functions $(x - \xi)_+^d$ have serious numerical issues:
- They overlap extensively (high correlation)
- Values can be very large far from knots
- Condition numbers grow rapidly with number of knots

B-Splines: The Practical Solution

B-splines (basis splines) are an alternative basis with remarkable properties:
1. Local support: Each basis function is non-zero only on a small interval
2. Partition of unity: $\sum_i B_i(x) = 1$ everywhere
3. Non-negative: $B_i(x) \geq 0$ always
4. Numerically stable: Computed via stable recursion
5. Well-conditioned: Design matrix has bounded condition number
B-Spline Definition (Cox-de Boor Recursion):

For knot sequence $t_0 \leq t_1 \leq \ldots \leq t_m$, B-splines of degree $d$ are defined recursively:

Degree 0 (piecewise constant):

$$B_{i,0}(x) = \begin{cases} 1 & t_i \leq x < t_{i+1} \\ 0 & \text{otherwise} \end{cases}$$

Degree $d$ (recursive):

$$B_{i,d}(x) = \frac{x - t_i}{t_{i+d} - t_i} B_{i,d-1}(x) + \frac{t_{i+d+1} - x}{t_{i+d+1} - t_{i+1}} B_{i+1,d-1}(x)$$

Each degree-$d$ B-spline is non-zero only on the interval $[t_i, t_{i+d+1})$—it has local support.
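The recursion translates almost line-for-line into code. Here is a minimal sketch (the uniform knot vector is illustrative; terms with a zero denominator are dropped, as is conventional for repeated knots), which also checks the local-support and partition-of-unity properties numerically:

```python
import numpy as np

def cox_de_boor(x: np.ndarray, i: int, d: int, t: np.ndarray) -> np.ndarray:
    """Evaluate the B-spline basis function B_{i,d} at points x for knot vector t."""
    if d == 0:
        # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1})
        return ((t[i] <= x) & (x < t[i + 1])).astype(float)
    total = np.zeros_like(x, dtype=float)
    if t[i + d] > t[i]:              # skip term when its denominator is zero
        total += (x - t[i]) / (t[i + d] - t[i]) * cox_de_boor(x, i, d - 1, t)
    if t[i + d + 1] > t[i + 1]:
        total += (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * cox_de_boor(x, i + 1, d - 1, t)
    return total

# Uniform knot vector t_0 < ... < t_9 gives 10 - 3 - 1 = 6 cubic basis functions
t = np.arange(10, dtype=float)
degree = 3
n_basis = len(t) - degree - 1

xs = np.linspace(3.0, 6.0, 200, endpoint=False)   # interior region [t_3, t_6)
B = np.column_stack([cox_de_boor(xs, i, degree, t) for i in range(n_basis)])

print("Partition of unity on the interior:", np.allclose(B.sum(axis=1), 1.0))
print("B_{0,3} vanishes outside [t_0, t_4):",
      np.all(cox_de_boor(np.linspace(4.0, 9.0, 50), 0, degree, t) == 0))
```

In practice, libraries such as scipy evaluate the same recursion in compiled code; this sketch is only meant to make the definition concrete.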
```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline


def bspline_basis(x: np.ndarray, knots: np.ndarray, degree: int = 3) -> np.ndarray:
    """
    Evaluate B-spline basis functions at points x.

    Parameters:
        x: Evaluation points
        knots: Full knot sequence (including boundary knots)
        degree: Spline degree (default 3 = cubic)

    Returns:
        Basis matrix of shape (len(x), n_basis)
    """
    n_basis = len(knots) - degree - 1
    basis = np.zeros((len(x), n_basis))

    for i in range(n_basis):
        # Create B-spline with single coefficient = 1 at position i
        c = np.zeros(n_basis)
        c[i] = 1.0
        bspl = BSpline(knots, c, degree)
        basis[:, i] = bspl(x)

    return basis


def fit_bspline_regression(x: np.ndarray, y: np.ndarray,
                           n_knots: int = 5, degree: int = 3) -> dict:
    """
    Fit regression using B-spline basis.
    """
    # Create knot sequence
    # Need degree+1 copies at boundaries (clamped B-spline)
    internal_knots = np.percentile(x, np.linspace(0, 100, n_knots + 2)[1:-1])
    knots = np.concatenate([
        [x.min()] * (degree + 1),
        internal_knots,
        [x.max()] * (degree + 1)
    ])

    # Sort data for scipy
    sort_idx = np.argsort(x)
    x_sorted = x[sort_idx]
    y_sorted = y[sort_idx]

    # Fit using scipy's least-squares B-spline
    spline = make_lsq_spline(x_sorted, y_sorted, knots, k=degree)

    # Also compute basis matrix for analysis
    basis = bspline_basis(x, knots, degree)

    return {
        'spline': spline,
        'knots': knots,
        'degree': degree,
        'coefficients': spline.c,
        'basis': basis,
        'n_parameters': len(spline.c),
        'condition_number': np.linalg.cond(basis)
    }


# Compare truncated power vs B-spline conditioning
np.random.seed(42)
x = np.sort(np.random.uniform(0, 10, 100))
y = np.sin(x) + np.random.normal(0, 0.3, len(x))

print("Conditioning Comparison: Truncated Power vs B-Splines")
print("=" * 60)
print(f"{'Knots':>8} | {'Truncated Power':>18} | {'B-Spline':>18}")
print("-" * 60)

for n_knots in [3, 5, 8, 12]:
    # Knot positions at quantiles (shared by both bases)
    knot_pos = np.percentile(x, np.linspace(0, 100, n_knots + 2)[1:-1])

    # Truncated power basis (from earlier)
    Phi_tp = np.column_stack([np.ones_like(x), x, x**2, x**3] +
                             [np.maximum(0, x - k)**3 for k in knot_pos])
    cond_tp = np.linalg.cond(Phi_tp)

    # B-spline basis
    result = fit_bspline_regression(x, y, n_knots=n_knots)
    cond_bs = result['condition_number']

    improvement = cond_tp / cond_bs
    print(f"{n_knots:>8} | {cond_tp:>18.2e} | {cond_bs:>18.2e} ({improvement:.1f}x better)")

print("\nB-splines maintain bounded condition numbers even with many knots!")
```

B-splines are the computational workhorse for splines because: (1) they are numerically stable via the Cox-de Boor recursion, (2) local support enables efficient computation (sparse matrices), (3) bounded condition numbers prevent numerical issues, and (4) they are widely available in libraries (scipy, R, etc.). Always prefer B-splines over the truncated power basis in production code.
Piecewise polynomials connect to several important topics in regression and machine learning:
Splines with enough knots can approximate any continuous function, and so can neural networks. Splines are interpretable, come with smoothness guarantees, and typically need less data; neural networks are more flexible for high-dimensional, unstructured inputs. For one- or two-dimensional regression with moderate amounts of data, splines often outperform.
Module 2 of this chapter covers B-splines in depth. Module 3 covers Generalized Additive Models. Later chapters connect splines to kernel methods (smoothing splines can be viewed as a form of kernel regression).
We've covered the foundations of piecewise polynomial regression, from basic construction to practical implementation. Let's consolidate:
Module Complete!

You've now completed the Polynomial Regression module, covering:
1. Polynomial feature construction and the Vandermonde matrix
2. Degree selection via cross-validation, information criteria, and hypothesis testing
3. Overfitting diagnosis and mitigation through regularization
4. Orthogonal polynomial bases for numerical stability
5. Piecewise polynomials and splines for local flexibility

These tools form the foundation for nonlinear regression and prepare you for more advanced topics like spline regression, GAMs, and kernel methods.
Congratulations! You've mastered polynomial regression from first principles through advanced techniques. You can now construct polynomial features, select optimal complexity, diagnose and prevent overfitting, use numerically stable orthogonal bases, and apply piecewise methods for local flexibility. These skills transfer directly to production ML applications.