On the previous page, we established B-splines as the foundation for spline regression. However, unconstrained B-spline (or equivalently, truncated power) bases suffer from a significant problem at the boundaries of the data range.
Near the left and right edges of the observed $x$ values, the spline has relatively few data points to anchor its behavior, yet the polynomial pieces are free to follow any pattern consistent with the smoothness constraints. This typically leads to:
- high variance of the fitted curve near the boundaries,
- erratic, wiggly behavior of the fit at the edges of the data, and
- wildly unreliable (cubic) extrapolation beyond the observed range.
Natural splines address these problems by imposing additional constraints: the spline must be linear (not cubic) beyond the boundary knots. This reduces the degrees of freedom and stabilizes boundary behavior.
By the end of this page, you will understand the mathematical formulation of natural cubic splines, why their boundary constraints are optimal in a variational sense, how to construct their basis functions, and when to prefer them over unconstrained splines. You'll see that natural splines achieve the same flexibility in the data interior with fewer parameters.
Definition (Natural Cubic Spline):
A natural cubic spline with knots $\xi_1 < \xi_2 < \cdots < \xi_K$ on the interval $[\xi_1, \xi_K]$ is a function $s: \mathbb{R} \to \mathbb{R}$ such that:
1. $s$ is a cubic polynomial on each interval $[\xi_i, \xi_{i+1}]$, $i = 1, \ldots, K-1$;
2. $s$ has continuous first and second derivatives everywhere (in particular at every knot), i.e., $s \in C^2$;
3. $s$ is linear on the exterior regions $(-\infty, \xi_1]$ and $[\xi_K, \infty)$.
The third condition is the distinguishing feature. It is equivalent to requiring: $$s''(\xi_1) = s''(\xi_K) = 0$$
That is, the second derivative vanishes at the boundary knots. Since a linear function has zero second derivative, and we require $C^2$ continuity, the spline must transition smoothly from cubic to linear at the boundaries.
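As a quick sanity check of this boundary condition, here is a minimal sketch using SciPy's interpolating cubic spline with natural boundary conditions; the data values are arbitrary, chosen only for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Arbitrary example data; knots are placed at the data points
x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y = np.array([1.0, 2.2, 0.7, 1.5, 0.3])

# bc_type="natural" imposes s''(xi_1) = s''(xi_K) = 0
s = CubicSpline(x, y, bc_type="natural")

# Second derivative at the boundary knots is (numerically) zero
print(s(x[0], 2), s(x[-1], 2))   # ~0.0, ~0.0

# At an interior knot the second derivative is generally nonzero
print(s(x[2], 2))
```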
The name 'natural' comes from the physical spline analogy. A thin flexible strip (a drafting spline) that is free at both ends will naturally assume a shape with zero curvature at the endpoints—the minimum energy configuration. Natural cubic splines mathematically capture this physical behavior.
Degrees of Freedom Reduction:
Recall that unconstrained cubic splines with $K$ knots have dimension $K + 4$ (we need $K + 4$ basis functions or parameters). Natural splines add 4 constraints: in each of the two exterior regions $(-\infty, \xi_1]$ and $[\xi_K, \infty)$, the quadratic and the cubic coefficient must vanish, giving two constraints per boundary.
Note that the constraints $s'' = 0$ at the boundaries encompass the third-derivative constraints (if $s$ is linear, both $s''$ and $s'''$ are zero). The net effect is:
$$\dim(\text{Natural Cubic Splines with } K \text{ knots}) = K$$
With $K$ knots, we have exactly $K$ basis functions—a remarkably clean result. Each knot corresponds to one degree of freedom.
| Property | Unconstrained Cubic Spline | Natural Cubic Spline |
|---|---|---|
| Knot sequence | $K$ interior knots on $[a, b]$ | $K$ knots serve as both interior and boundary |
| Degrees of freedom | $K + 4$ | $K$ |
| Boundary behavior | Cubic (flexible, high variance) | Linear (constrained, stable) |
| Second derivative at boundaries | Unconstrained | $s''(\xi_1) = s''(\xi_K) = 0$ |
| Extrapolation | Cubic (dangerous) | Linear (reasonable) |
| Typical use case | Interior flexibility needed | General-purpose regression |
Natural cubic splines aren't just a convenient construction—they arise as the unique solution to a fundamental optimization problem.
The Smoothing Problem:
Given data points $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_1 < x_2 < \cdots < x_n$, consider finding a function $g$ that:
We measure smoothness by the integrated squared second derivative: $$\mathcal{R}(g) = \int_{-\infty}^{\infty} [g''(x)]^2 \, dx$$
This is called the roughness penalty or bending energy. A linear function has $\mathcal{R}(g) = 0$; wigglier functions have larger values.
The quantity $\int [g''(x)]^2 \, dx$ is proportional to the bending energy of an elastic beam. Minimizing this over all interpolating functions finds the configuration that a physical spline would naturally assume. This is why natural splines are 'natural'—they minimize bending energy.
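A quick numeric check of this (with arbitrary example functions): a straight line has zero roughness, while $g(x) = \sin(x)$ on $[0, 2\pi]$ has $\int [g''(x)]^2 \, dx = \pi$.

```python
import numpy as np
from scipy.integrate import quad

# g(x) = a + b*x has g''(x) = 0, so its roughness is exactly 0
line_roughness, _ = quad(lambda x: 0.0 ** 2, 0, 2 * np.pi)

# g(x) = sin(x) has g''(x) = -sin(x)
sine_roughness, _ = quad(lambda x: (-np.sin(x)) ** 2, 0, 2 * np.pi)

print(line_roughness)  # 0.0
print(sine_roughness)  # pi = 3.1416...
```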
Theorem (Optimality of Natural Cubic Splines):
Among all twice-differentiable functions $g$ satisfying the interpolation conditions $g(x_i) = y_i$ for $i = 1, \ldots, n$, the unique minimizer of $\mathcal{R}(g) = \int [g''(x)]^2 \, dx$ is the natural cubic spline with knots at the data points $x_1, \ldots, x_n$.
Proof Sketch:
Let $s$ be the natural cubic spline interpolant and let $g$ be any other interpolating function. Define $h = g - s$. Then:
$$\mathcal{R}(g) = \int [s''(x) + h''(x)]^2 \, dx = \int [s''(x)]^2 \, dx + 2 \int s''(x) h''(x) \, dx + \int [h''(x)]^2 \, dx$$
The key is to show the cross-term vanishes. Using integration by parts twice:
$$\int_{x_1}^{x_n} s''(x) h''(x) \, dx = [s''(x) h'(x)]_{x_1}^{x_n} - \int_{x_1}^{x_n} s'''(x) h'(x) \, dx$$
Since $s''(x_1) = s''(x_n) = 0$ (natural boundary conditions), the boundary terms vanish. Integrating by parts again:
$$= -[s'''(x) h(x)]_{x_1}^{x_n} + \int_{x_1}^{x_n} s^{(4)}(x) h(x) \, dx$$
Now, $s$ is a cubic polynomial on each interval, so $s^{(4)} = 0$ away from the knots and the remaining integral vanishes. The boundary term reduces to a sum of contributions at the knots, where $s'''$ may jump; each contribution is multiplied by $h(x_i) = 0$ (both $s$ and $g$ interpolate the same data), so these terms vanish as well.
Thus $\mathcal{R}(g) = \mathcal{R}(s) + \mathcal{R}(h) \geq \mathcal{R}(s)$, with equality iff $h'' = 0$ everywhere, i.e., $g = s$ (up to a linear function, which must be zero by interpolation).
Natural cubic splines are not just convenient—they are optimal in a precise mathematical sense. Among all possible interpolating functions, they minimize bending energy. This variational characterization explains why natural splines appear throughout applied mathematics, physics, and engineering.
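We can check the theorem numerically with SciPy: interpolate the same (arbitrary) points with a natural cubic spline and with a standard not-a-knot cubic spline, then compare their bending energies over $[x_1, x_n]$. This is a sketch; the integral is approximated on a fine grid.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import trapezoid

# Arbitrary points to interpolate
x = np.array([0.0, 0.8, 1.7, 2.4, 3.3, 4.0])
y = np.array([0.5, 1.9, 0.2, 1.1, 2.4, 0.7])

natural = CubicSpline(x, y, bc_type="natural")         # s'' = 0 at both ends
not_a_knot = CubicSpline(x, y, bc_type="not-a-knot")   # default boundary condition

# Approximate R(g) = int g''(x)^2 dx over [x_1, x_n] on a fine grid
grid = np.linspace(x[0], x[-1], 20001)

def roughness(spline):
    return trapezoid(spline(grid, 2) ** 2, grid)

print(f"Natural interpolant roughness:    {roughness(natural):.4f}")
print(f"Not-a-knot interpolant roughness: {roughness(not_a_knot):.4f}")
# The natural cubic spline attains the smaller (minimal) bending energy
```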
There are several ways to construct a basis for natural cubic splines. We'll present two approaches: modification of B-splines, and a direct construction using 'natural spline' basis functions.
Approach 1: Constrained B-splines
Start with the standard B-spline basis of dimension $K + 4$ for $K$ knots. Apply 4 linear constraints: $s''(\xi_1) = s''(\xi_K) = 0$ together with linearity of the two exterior pieces. The resulting constrained basis has dimension $K$.
In practice, we compute a null-space basis for the constraint matrix. Libraries like R's ns() function handle this automatically.
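Here is a rough sketch of the null-space idea in a slightly simplified setting: if the basis is built only on $[\xi_1, \xi_K]$ (so the unconstrained space has dimension $K + 2$ and the exterior linearity is handled at extrapolation time), the natural conditions reduce to the two constraints $s''(\xi_1) = s''(\xi_K) = 0$, and the null-space construction again yields dimension $K$. The helper names here are illustrative, not a library API.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.linalg import null_space

def constrained_natural_basis(x, knots, degree=3):
    """Natural-spline basis on [knots[0], knots[-1]] via null-space constraints."""
    a, b = knots[0], knots[-1]
    # Full cubic B-spline basis on the interval: dimension K + 2
    t = np.concatenate([np.repeat(a, degree + 1), knots[1:-1], np.repeat(b, degree + 1)])
    n_basis = len(t) - degree - 1

    def basis_matrix(pts, nu=0):
        B = np.empty((len(pts), n_basis))
        for j in range(n_basis):
            c = np.zeros(n_basis); c[j] = 1.0
            B[:, j] = BSpline(t, c, degree)(pts, nu)
        return B

    # Constraint matrix: second derivative of every basis function at the boundaries
    C = basis_matrix(np.array([a, b]), nu=2)     # shape (2, K + 2)
    Z = null_space(C)                            # shape (K + 2, K)
    return basis_matrix(np.asarray(x)) @ Z       # shape (n, K)

knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # K = 5
N = constrained_natural_basis(np.linspace(0, 1, 100), knots)
print(N.shape)                                   # (100, 5): dimension K
```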
Approach 2: The Truncated Power Natural Spline Basis
Define the following $K$ basis functions for knots $\xi_1 < \cdots < \xi_K$:
$$N_1(x) = 1, \qquad N_2(x) = x, \qquad N_{k+2}(x) = d_k(x) - d_{K-1}(x), \quad k = 1, \ldots, K - 2,$$
where
$$d_k(x) = \frac{(x - \xi_k)_+^3 - (x - \xi_K)_+^3}{\xi_K - \xi_k}.$$
These basis functions are specifically designed to satisfy the natural boundary conditions.
```python
import numpy as np

def truncated_cube(x, xi):
    """Compute (x - xi)_+^3"""
    return np.maximum(0, x - xi) ** 3

def natural_spline_basis(x, knots):
    """
    Construct the natural cubic spline basis matrix.

    Uses the truncated power representation with natural boundary conditions.

    Parameters:
    -----------
    x : array of shape (n,)
        Evaluation points
    knots : array of shape (K,)
        Knot positions (including boundary knots)

    Returns:
    --------
    N : array of shape (n, K)
        Natural spline basis matrix
    """
    x = np.asarray(x)
    knots = np.asarray(knots)
    K = len(knots)
    n = len(x)

    if K < 2:
        raise ValueError("Need at least 2 knots for natural splines")

    # Define d_k(x) helper function
    def d_k(x, k):
        xi_k = knots[k]
        xi_K = knots[-1]  # Last knot
        numerator = truncated_cube(x, xi_k) - truncated_cube(x, xi_K)
        denominator = xi_K - xi_k
        return numerator / denominator

    # Build basis matrix
    N = np.zeros((n, K))

    # First two basis functions: constant and linear
    N[:, 0] = 1.0  # N_1(x) = 1
    N[:, 1] = x    # N_2(x) = x

    # Remaining K-2 basis functions
    d_Km1 = d_k(x, K - 2)  # d_{K-1}(x), using 0-indexing
    for k in range(K - 2):
        # N_{k+3}(x) = d_{k+1}(x) - d_{K-1}(x) in 1-based notation;
        # with 0-indexing: N[:, k + 2] = d_k(x, k) - d_{K-1}(x)
        N[:, k + 2] = d_k(x, k) - d_Km1

    return N

# Example: Create natural spline basis
knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # K=5 knots
x = np.linspace(0, 1, 200)

N = natural_spline_basis(x, knots)
print(f"Basis matrix shape: {N.shape}")  # (200, 5) - exactly K columns!

# Verify at boundaries: second derivative should be zero
# For a piecewise cubic, s''(xi) involves only the coefficients of the
# quadratic and cubic terms; we can verify numerically that fitted
# curves have s'' = 0 at the boundary knots

# The condition number should be better than the raw truncated power basis
print(f"Condition number: {np.linalg.cond(N):.2f}")
```

Using Libraries for Natural Splines:
In practice, we use library implementations that handle the construction efficiently:
```python
# Using patsy (Python equivalent of R's formula interface)
import patsy

# Create natural spline basis with 5 degrees of freedom
# (5 df means 5 knots, placed at quantiles by default)
N = patsy.dmatrix("cr(x, df=5) - 1", data={"x": x})

# Or specify knots explicitly
N = patsy.dmatrix("cr(x, knots=[0.25, 0.5, 0.75]) - 1", data={"x": x})
```
Note: patsy uses cr() for natural (restricted) cubic splines, and bs() for B-splines. In R, use ns() for natural splines and bs() for B-splines.
Beware of terminology differences! In the convention used here, all $K$ knots (including the leftmost and rightmost) count toward the basis dimension, but software packages map the df argument to knot counts in different ways: patsy's cr() places boundary knots at the data extremes by default and chooses the remaining knots from quantiles of the data, and R's ns() has its own counting convention. Check the documentation before assuming a particular knot placement.
With the natural spline basis constructed, regression proceeds exactly as with B-splines: form the design matrix, apply least squares (or regularized variants), and make predictions.
Let's work through a complete example comparing natural splines with unconstrained B-splines.
```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

# =====================================================
# Generate Data with Known True Function
# =====================================================
np.random.seed(42)

def true_function(x):
    return np.sin(2 * x) + 0.3 * x

n = 80
x = np.sort(np.random.uniform(0, 2 * np.pi, n))
y_true = true_function(x)
y = y_true + np.random.normal(0, 0.4, n)

# =====================================================
# Natural Spline Basis (using our implementation)
# =====================================================
def truncated_cube(x, xi):
    return np.maximum(0, x - xi) ** 3

def natural_spline_basis(x, knots):
    K = len(knots)
    n = len(x)

    def d_k(x, k):
        return (truncated_cube(x, knots[k]) - truncated_cube(x, knots[-1])) / (knots[-1] - knots[k])

    N = np.zeros((n, K))
    N[:, 0] = 1.0
    N[:, 1] = x
    d_Km1 = d_k(x, K - 2)
    for k in range(K - 2):
        N[:, k + 2] = d_k(x, k) - d_Km1
    return N

# =====================================================
# B-spline Basis (for comparison)
# =====================================================
def bspline_basis(x, interior_knots, degree):
    t = np.concatenate([
        np.repeat(x.min(), degree + 1),
        interior_knots,
        np.repeat(x.max(), degree + 1)
    ])
    n_basis = len(t) - degree - 1
    B = np.zeros((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        spl = BSpline(t, c, degree, extrapolate=False)
        B[:, j] = np.nan_to_num(spl(x), nan=0.0)
    return B

# =====================================================
# Fit Both Models
# =====================================================
# Place knots at quantiles
K = 6                 # Number of knots for natural spline
interior_K = K - 2    # Interior knots for B-spline to get similar df

knots_ns = np.percentile(x, np.linspace(0, 100, K))
interior_knots_bs = np.percentile(x, np.linspace(0, 100, interior_K + 2)[1:-1])

# Natural spline fit
N_train = natural_spline_basis(x, knots_ns)
beta_ns = np.linalg.lstsq(N_train, y, rcond=None)[0]
print(f"Natural Spline: {N_train.shape[1]} parameters")

# B-spline fit (cubic)
B_train = bspline_basis(x, interior_knots_bs, degree=3)
beta_bs = np.linalg.lstsq(B_train, y, rcond=None)[0]
print(f"B-spline: {B_train.shape[1]} parameters")

# =====================================================
# Prediction and Extrapolation
# =====================================================
# Include extrapolation region beyond data
x_pred = np.linspace(-1, 2 * np.pi + 1, 300)

# Natural spline: safe extrapolation
N_pred = natural_spline_basis(x_pred, knots_ns)
y_ns = N_pred @ beta_ns

# B-spline: need to handle extrapolation carefully
# Build extended grid for B-spline (within data range)
x_pred_bs = np.clip(x_pred, x.min(), x.max())
B_pred = bspline_basis(x_pred_bs, interior_knots_bs, degree=3)
y_bs = B_pred @ beta_bs

# For B-spline extrapolation, use boundary derivatives
# (This is approximate; proper extrapolation requires care)
left_mask = x_pred < x.min()
right_mask = x_pred > x.max()

# =====================================================
# Compute Metrics
# =====================================================
y_ns_train = N_train @ beta_ns
y_bs_train = B_train @ beta_bs

mse_ns = np.mean((y - y_ns_train) ** 2)
mse_bs = np.mean((y - y_bs_train) ** 2)

print(f"Natural Spline MSE: {mse_ns:.4f}")
print(f"B-spline MSE: {mse_bs:.4f}")

# R² scores
ss_res_ns = np.sum((y - y_ns_train) ** 2)
ss_res_bs = np.sum((y - y_bs_train) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r2_ns = 1 - ss_res_ns / ss_tot
r2_bs = 1 - ss_res_bs / ss_tot

print(f"Natural Spline R²: {r2_ns:.4f}")
print(f"B-spline R²: {r2_bs:.4f}")
```

Key Observations:
Fewer Parameters: Natural splines achieve comparable fit quality with fewer parameters. With $K$ knots, a natural spline uses $K$ parameters, while an unconstrained cubic spline with the same knots uses $K + 4$.
Stable Extrapolation: Natural splines extrapolate linearly beyond the data range—a much more sensible behavior than the potentially wild cubic extrapolation of unconstrained splines.
Similar Interior Fit: Within the data range, both methods produce nearly identical fits. The differences appear primarily at and beyond the boundaries.
Slightly Higher Bias at Boundaries: The linear constraint means natural splines may underfit data that has genuine curvature at the boundaries. This is the tradeoff for reduced variance.
Use natural splines when: (1) You don't have strong prior knowledge about boundary behavior; (2) Extrapolation stability matters; (3) You want to reduce effective degrees of freedom; (4) The true function is not expected to have extreme curvature at the edges. Use unconstrained B-splines when boundary flexibility is genuinely needed or when boundaries are interior to a larger problem.
Natural cubic splines are intimately connected to smoothing splines, which we'll cover in detail on the next page. Here we preview the connection.
The Smoothing Spline Problem:
Instead of requiring exact interpolation ($g(x_i) = y_i$), smoothing splines find a function that balances data fidelity against smoothness:
$$\min_g \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int [g''(x)]^2 \, dx$$
where $\lambda > 0$ is a smoothing parameter. When $\lambda = 0$, we recover the interpolation problem; when $\lambda \to \infty$, the solution approaches a straight line.
A remarkable result (Reinsch, 1967): For any $\lambda > 0$, the unique solution to the smoothing spline problem is a natural cubic spline with knots at the data points $x_1, \ldots, x_n$. This means we need only optimize over an $n$-dimensional space of natural splines, not the infinite-dimensional space of all smooth functions.
Why This Matters:
Reinsch's theorem transforms an infinite-dimensional optimization problem into a finite-dimensional one. The smoothing spline solution can be written as:
$$\hat{g}(x) = \sum_{j=1}^n \hat{\beta}_j N_j(x)$$
where $N_1, \ldots, N_n$ is a natural spline basis with knots at the data points. The optimal coefficients $\hat{\boldsymbol{\beta}}$ are obtained by solving a penalized least squares problem:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{N}\boldsymbol{\beta}\|^2 + \lambda \boldsymbol{\beta}^T \boldsymbol{\Omega} \boldsymbol{\beta}$$
where $\boldsymbol{\Omega}$ is the penalty matrix with elements $\Omega_{jk} = \int N_j''(x) N_k''(x) \, dx$.
This closed-form solution enables efficient computation and connects spline smoothing to ridge regression.
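To make this concrete, here is a minimal sketch under one simplifying assumption: the spline is parameterized by its values at the knots, with a knot at every data point, so the design matrix is the identity and the fit reduces to $\hat{\mathbf{g}} = (\mathbf{I} + \lambda \boldsymbol{\Omega})^{-1} \mathbf{y}$, a ridge-style shrinkage of $\mathbf{y}$. The `gs_penalty` helper is illustrative (the same Green & Silverman construction shown later on this page), not a library function.

```python
import numpy as np

def gs_penalty(knots):
    """Green & Silverman penalty matrix (values-at-the-knots parameterization)."""
    K = len(knots)
    h = np.diff(knots)
    D = np.zeros((K - 2, K))       # second-divided-difference matrix
    W = np.zeros((K - 2, K - 2))   # tridiagonal weight matrix
    for j in range(K - 2):
        D[j, j], D[j, j + 1], D[j, j + 2] = 1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]
        W[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            W[j, j + 1] = W[j + 1, j] = h[j + 1] / 6
    return D.T @ np.linalg.solve(W, D)

# Toy data; the smoothing spline places a knot at every x_i
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 40))
y = np.sin(x) + rng.normal(0, 0.3, 40)

Omega = gs_penalty(x)
lam = 1.0

# Fitted values at the data points: ridge-style shrinkage of y
g_hat = np.linalg.solve(np.eye(len(x)) + lam * Omega, y)
print(g_hat[:5])
```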
| Aspect | Interpolating Natural Spline | Smoothing Spline |
|---|---|---|
| Data fit | Exact: $\hat{g}(x_i) = y_i$ | Approximate: residuals allowed |
| Objective | Minimize $\int [g'']^2$ subject to interpolation | Minimize $\sum (y_i - g(x_i))^2 + \lambda \int [g'']^2$ |
| Smoothing parameter | None (implicitly $\lambda = 0$) | $\lambda > 0$, chosen by user or CV |
| Effective df | Exactly $n$ | Between 2 and $n$, controlled by $\lambda$ |
| Overfitting risk | High (interpolates noise) | Controlled by $\lambda$ |
| Use case | Exact curve through points | General regression with smoothing |
Natural spline computations involve several matrices that exploit special structure for efficiency.
The Natural Spline Design Matrix:
Given $n$ data points and $K$ knots (typically $K = n$ for smoothing splines or $K \ll n$ for regression splines), the design matrix $\mathbf{N}$ has dimensions $n \times K$.
The Penalty Matrix:
For smoothing splines, we need the matrix $\boldsymbol{\Omega}$ with entries: $$\Omega_{jk} = \int N_j''(x) N_k''(x) \, dx$$
In suitable parameterizations (for example, the values-at-the-knots representation used in the closed-form construction below), $\boldsymbol{\Omega}$ is banded. This sparsity enables $O(n)$ rather than $O(n^3)$ solutions.
```python
import numpy as np
from scipy import integrate


def truncated_cube(x, xi):
    """Compute (x - xi)_+^3"""
    return np.maximum(0, x - xi) ** 3


def natural_spline_basis(x, knots):
    """Same truncated power natural spline basis as before."""
    K = len(knots)
    n = len(x)

    def d_k(x, k):
        return (truncated_cube(x, knots[k]) - truncated_cube(x, knots[-1])) / (knots[-1] - knots[k])

    N = np.zeros((n, K))
    N[:, 0] = 1.0
    N[:, 1] = x
    d_Km1 = d_k(x, K - 2)
    for k in range(K - 2):
        N[:, k + 2] = d_k(x, k) - d_Km1
    return N


def compute_penalty_matrix(knots):
    """
    Numerically compute the penalty matrix Omega for the truncated power
    natural spline basis above:

        Omega[j, k] = integral of N_j''(x) * N_k''(x) dx over [xi_1, xi_K]

    Uses finite-difference second derivatives plus quadrature; this is for
    illustration only (real implementations use exact formulas).

    Key properties of Omega:
      1. Symmetric positive semi-definite
      2. Rank K - 2 (null space: linear functions, since N_1'' = N_2'' = 0)
      3. Banded in suitable parameterizations, enabling efficient computation
    """
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    eps = 1e-4

    def second_derivs(x):
        # Finite-difference approximation of all K basis second derivatives at x
        pts = np.array([x - eps, x, x + eps])
        N = natural_spline_basis(pts, knots)
        return (N[2] - 2 * N[1] + N[0]) / eps ** 2

    Omega = np.zeros((K, K))
    breakpoints = list(knots[1:-1])
    for j in range(K):
        for k in range(j, K):
            val, _ = integrate.quad(
                lambda t: second_derivs(t)[j] * second_derivs(t)[k],
                knots[0], knots[-1], points=breakpoints, limit=200)
            Omega[j, k] = Omega[k, j] = val
    return Omega


def penalty_matrix_closed_form(knots):
    """
    Closed-form penalty matrix for natural cubic splines, based on
    Green & Silverman (1994), Chapter 2.

    Note: this version is expressed in the parameterization by the function
    values at the knots, g(xi_1), ..., g(xi_K), not in the truncated power
    basis above, so the two matrices are not directly comparable.
    """
    K = len(knots)
    h = np.diff(knots)  # K-1 knot spacings

    # Second-divided-difference matrix: (K-2) x K
    R = np.zeros((K - 2, K))
    for j in range(K - 2):
        R[j, j] = 1 / h[j]
        R[j, j + 1] = -1 / h[j] - 1 / h[j + 1]
        R[j, j + 2] = 1 / h[j + 1]

    # Tridiagonal matrix: (K-2) x (K-2)
    Q = np.zeros((K - 2, K - 2))
    for j in range(K - 2):
        Q[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            Q[j, j + 1] = h[j + 1] / 6
            Q[j + 1, j] = h[j + 1] / 6

    # Full penalty matrix (sparse, banded structure via the R and Q factors)
    Omega_full = R.T @ np.linalg.solve(Q, R)
    return Omega_full
```

With $n$ observations and $K$ knots: (1) Building the design matrix is $O(nK)$; (2) Solving the normal equations is $O(nK^2 + K^3)$; (3) For smoothing splines with $K = n$ knots, special algorithms exploit the banded structure to achieve $O(n)$ complexity. This makes smoothing splines practical even for large datasets.
The concept of degrees of freedom is subtly different for natural splines than for parametric models.
Parametric Degrees of Freedom:
For a natural spline with $K$ knots used as fixed basis functions, the model has exactly $K$ parameters. The residual degrees of freedom is $n - K$, just like ordinary linear regression with $K$ predictors.
Effective Degrees of Freedom (Smoothing Splines):
For smoothing splines with penalty $\lambda$, the effective degrees of freedom is:
$$\text{df}(\lambda) = \text{tr}\left(\mathbf{S}_\lambda\right)$$
where $\mathbf{S}_\lambda$ is the smoother matrix satisfying $\hat{\mathbf{y}} = \mathbf{S}_\lambda \mathbf{y}$:
$$\mathbf{S}_\lambda = \mathbf{N}(\mathbf{N}^T\mathbf{N} + \lambda\boldsymbol{\Omega})^{-1}\mathbf{N}^T$$
This generalizes the hat matrix from linear regression.
| $\lambda$ | df($\lambda$) | Behavior | Interpretation |
|---|---|---|---|
| $\lambda \to 0$ | $\to n$ | Interpolation | Maximum flexibility, overfits |
| $\lambda$ small | $\approx n$ | Wiggly fit | High variance, low bias |
| $\lambda$ moderate | $2 < \text{df} < n$ | Smooth fit | Bias-variance tradeoff |
| $\lambda$ large | $\to 2$ | Nearly linear | High bias, low variance |
| $\lambda \to \infty$ | $= 2$ | Straight line | Just intercept + slope |
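Continuing the values-at-the-knots sketch from earlier (with the same illustrative `gs_penalty` helper repeated so the snippet is self-contained), we can trace how $\text{df}(\lambda) = \text{tr}(\mathbf{S}_\lambda)$ moves between the extremes in the table:

```python
import numpy as np

def gs_penalty(knots):
    """Green & Silverman penalty matrix (values-at-the-knots parameterization)."""
    K = len(knots)
    h = np.diff(knots)
    D = np.zeros((K - 2, K))
    W = np.zeros((K - 2, K - 2))
    for j in range(K - 2):
        D[j, j], D[j, j + 1], D[j, j + 2] = 1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]
        W[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            W[j, j + 1] = W[j + 1, j] = h[j + 1] / 6
    return D.T @ np.linalg.solve(W, D)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 40))
n = len(x)
Omega = gs_penalty(x)

# Linear functions lie in the null space of the penalty (zero bending energy)
lin = np.column_stack([np.ones(n), x])
print(np.abs(Omega @ lin).max() / np.abs(Omega).max())  # ~0

# Effective degrees of freedom df(lambda) = tr[(I + lambda*Omega)^(-1)]
for lam in [1e-8, 1e-2, 1.0, 1e2, 1e8]:
    S = np.linalg.inv(np.eye(n) + lam * Omega)
    print(f"lambda = {lam:g}: df = {np.trace(S):.2f}")
# df decreases monotonically from n (= 40) toward 2 as lambda grows
```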
Why Minimum df = 2?
The penalty $\int [g''(x)]^2 \, dx$ penalizes curvature but not linear terms. Linear functions ($g(x) = a + bx$) have $g'' = 0$, so they incur zero penalty. The null space of the penalty has dimension 2 (the space of linear functions), which is why the minimum df is 2.
Choosing Degrees of Freedom:
In practice, we often specify the desired df directly rather than $\lambda$. Libraries like R's smooth.spline() accept a df argument and internally solve for the corresponding $\lambda$. This makes tuning more intuitive:
For exploratory analysis, start with df around $\sqrt{n}$ or $n^{1/3}$. For formal inference, use cross-validation or information criteria (AIC/BIC) to select df. Remember: df controls complexity—too few gives bias, too many gives variance.
Natural splines extend naturally to multiple predictors. For a model with predictors $x_1, \ldots, x_p$, we can include a natural spline transformation of each:
$$y = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \epsilon$$
where each $f_j$ is represented as a natural spline: $$f_j(x_j) = \sum_{k=1}^{K_j} \beta_{jk} N_{jk}(x_j)$$
This is the foundation of Generalized Additive Models (GAMs), which we'll cover in a later module.
```python
import numpy as np
import patsy
from sklearn.linear_model import LinearRegression

# =====================================================
# Example: Multiple Natural Spline Regression
# =====================================================
np.random.seed(42)

n = 200

# Generate two predictors
x1 = np.random.uniform(0, 10, n)
x2 = np.random.uniform(0, 10, n)

# True relationship: nonlinear in both
y_true = np.sin(x1) + 0.5 * (x2 - 5)**2 / 10 - 0.1 * x1 * x2 / 10
y = y_true + np.random.normal(0, 0.5, n)

# Create design matrix with natural splines for both predictors
# cr() is patsy's natural (cubic restricted) spline
data_dict = {"y": y, "x1": x1, "x2": x2}

# Natural splines with 5 df each
design_formula = "cr(x1, df=5) + cr(x2, df=5)"
design_matrix = patsy.dmatrix(design_formula, data=data_dict,
                              return_type="dataframe")

print(f"Design matrix shape: {design_matrix.shape}")
print(f"Columns: {list(design_matrix.columns)}")

# Fit regression
model = LinearRegression(fit_intercept=True)
model.fit(design_matrix, y)

y_hat = model.predict(design_matrix)
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print(f"R²: {r2:.4f}")

# =====================================================
# Visualize Partial Effects
# =====================================================
# To visualize f_1(x1), hold x2 at its mean

x1_grid = np.linspace(x1.min(), x1.max(), 100)
x2_mean = np.mean(x2)

# Create prediction data; reuse the training design_info so the same
# knots and basis functions are applied to the new data
pred_data = {"x1": x1_grid, "x2": np.repeat(x2_mean, 100)}
X_pred = patsy.build_design_matrices([design_matrix.design_info], pred_data,
                                     return_type="dataframe")[0]

y_pred = model.predict(X_pred)

# The partial effect of x1 is y_pred (with x2 held constant)
```

Key Considerations for Multiple Predictors:
Total Degrees of Freedom: If each of $p$ predictors uses $K$ df, the total model df is approximately $pK$ (plus intercept). With many predictors, this can grow quickly.
Centering for Interpretation: To interpret each $f_j$ as a 'main effect,' center them so $\sum_i f_j(x_{ij}) = 0$. This is standard in GAM implementations.
Interactions: Natural splines can be combined with interaction terms, e.g., ns(x1):ns(x2) for tensor product splines. Be cautious: this explodes the number of parameters (see the sketch after this list).
Regularization: With many spline terms, regularization (ridge or lasso on the spline coefficients) is often necessary. This is conceptually similar to penalized splines (Page 5).
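To see the parameter growth from interactions concretely, here is a small sketch with patsy (the df values and data are arbitrary): compare the number of columns produced by additive spline terms with the columns produced by a spline-by-spline interaction.

```python
import numpy as np
import patsy

rng = np.random.default_rng(0)
data = {"x1": rng.uniform(0, 10, 200), "x2": rng.uniform(0, 10, 200)}

# Additive natural spline terms
additive = patsy.dmatrix("cr(x1, df=5) + cr(x2, df=5)", data)

# Adding a spline-by-spline interaction multiplies basis columns pairwise
with_interaction = patsy.dmatrix(
    "cr(x1, df=5) + cr(x2, df=5) + cr(x1, df=5):cr(x2, df=5)", data)

print(additive.shape)          # additive terms: roughly 2 * df columns (plus intercept)
print(with_interaction.shape)  # interaction adds roughly df * df more columns
```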
In R: ns(x, df=5) creates a natural spline basis; bs(x, df=5) creates a B-spline basis. In Python's patsy: cr(x, df=5) creates natural (restricted cubic) splines; bs(x, df=5) creates B-splines. The natural spline is typically preferred for regression because of its reduced df and stable extrapolation.
What's Next:
We've established that natural splines are optimal for interpolation and arise as solutions to the smoothing problem. On the next page, we'll dive deep into smoothing splines—the full optimization framework where we directly trade off data fidelity against smoothness, with the smoothing parameter $\lambda$ controlling the balance.
You now understand natural cubic splines: their definition, variational optimality, construction, and relationship to smoothing splines. Natural splines reduce degrees of freedom while ensuring sensible extrapolation, making them the default choice for spline-based regression in most applications.