On the previous page, we established B-splines as the foundation for spline regression. However, unconstrained B-spline (or equivalently, truncated power) bases suffer from a significant problem at the boundaries of the data range.
Near the left and right edges of the observed $x$ values, the spline has relatively few data points to anchor its behavior, yet the polynomial pieces are free to follow any pattern consistent with the smoothness constraints. This typically leads to:
- high variance of the fitted curve near the boundaries,
- erratic, wiggly behavior of the fit at the edges of the data, and
- wildly unreliable (cubic) extrapolation beyond the observed range.
Natural splines address these problems by imposing additional constraints: the spline must be linear (not cubic) beyond the boundary knots. This reduces the degrees of freedom and stabilizes boundary behavior.
By the end of this page, you will understand the mathematical formulation of natural cubic splines, why their boundary constraints are optimal in a variational sense, how to construct their basis functions, and when to prefer them over unconstrained splines. You'll see that natural splines achieve the same flexibility in the data interior with fewer parameters.
Definition (Natural Cubic Spline):
A natural cubic spline with knots $\xi_1 < \xi_2 < \cdots < \xi_K$ on the interval $[\xi_1, \xi_K]$ is a function $s: \mathbb{R} \to \mathbb{R}$ such that:
1. $s$ is a cubic polynomial on each interval $[\xi_i, \xi_{i+1}]$, $i = 1, \ldots, K-1$;
2. $s$ has continuous first and second derivatives everywhere (in particular at every knot), i.e., $s \in C^2$;
3. $s$ is linear on the exterior regions $(-\infty, \xi_1]$ and $[\xi_K, \infty)$.
The third condition is the distinguishing feature. It is equivalent to requiring: $$s''(\xi_1) = s''(\xi_K) = 0$$
That is, the second derivative vanishes at the boundary knots. Since a linear function has zero second derivative, and we require $C^2$ continuity, the spline must transition smoothly from cubic to linear at the boundaries.
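As a quick sanity check of this boundary condition, here is a minimal sketch using SciPy's interpolating cubic spline with natural boundary conditions; the data values are arbitrary, chosen only for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Arbitrary example data; knots are placed at the data points
x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y = np.array([1.0, 2.2, 0.7, 1.5, 0.3])

# bc_type="natural" imposes s''(xi_1) = s''(xi_K) = 0
s = CubicSpline(x, y, bc_type="natural")

# Second derivative at the boundary knots is (numerically) zero
print(s(x[0], 2), s(x[-1], 2))   # ~0.0, ~0.0

# At an interior knot the second derivative is generally nonzero
print(s(x[2], 2))
```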
The name 'natural' comes from the physical spline analogy. A thin flexible strip (a drafting spline) that is free at both ends will naturally assume a shape with zero curvature at the endpoints—the minimum energy configuration. Natural cubic splines mathematically capture this physical behavior.
Degrees of Freedom Reduction:
Recall that unconstrained cubic splines with $K$ knots have dimension $K + 4$ (we need $K + 4$ basis functions or parameters). Natural splines add 4 constraints: in each of the two exterior regions $(-\infty, \xi_1]$ and $[\xi_K, \infty)$, the quadratic and the cubic coefficient must vanish, giving two constraints per boundary.
Note that the constraints $s'' = 0$ at the boundaries encompass the third-derivative constraints (if $s$ is linear, both $s''$ and $s'''$ are zero). The net effect is:
$$\dim(\text{Natural Cubic Splines with } K \text{ knots}) = K$$
With $K$ knots, we have exactly $K$ basis functions—a remarkably clean result. Each knot corresponds to one degree of freedom.
| Property | Unconstrained Cubic Spline | Natural Cubic Spline |
|---|---|---|
| Knot sequence | $K$ interior knots on $[a, b]$ | $K$ knots serve as both interior and boundary |
| Degrees of freedom | $K + 4$ | $K$ |
| Boundary behavior | Cubic (flexible, high variance) | Linear (constrained, stable) |
| Second derivative at boundaries | Unconstrained | $s''(\xi_1) = s''(\xi_K) = 0$ |
| Extrapolation | Cubic (dangerous) | Linear (reasonable) |
| Typical use case | Interior flexibility needed | General-purpose regression |
Natural cubic splines aren't just a convenient construction—they arise as the unique solution to a fundamental optimization problem.
The Smoothing Problem:
Given data points $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_1 < x_2 < \cdots < x_n$, consider finding a function $g$ that:
We measure smoothness by the integrated squared second derivative: $$\mathcal{R}(g) = \int_{-\infty}^{\infty} [g''(x)]^2 \, dx$$
This is called the roughness penalty or bending energy. A linear function has $\mathcal{R}(g) = 0$; wigglier functions have larger values.
The quantity $\int [g''(x)]^2 \, dx$ is proportional to the bending energy of an elastic beam. Minimizing this over all interpolating functions finds the configuration that a physical spline would naturally assume. This is why natural splines are 'natural'—they minimize bending energy.
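A quick numeric check of this (with arbitrary example functions): a straight line has zero roughness, while $g(x) = \sin(x)$ on $[0, 2\pi]$ has $\int [g''(x)]^2 \, dx = \pi$.

```python
import numpy as np
from scipy.integrate import quad

# g(x) = a + b*x has g''(x) = 0, so its roughness is exactly 0
line_roughness, _ = quad(lambda x: 0.0 ** 2, 0, 2 * np.pi)

# g(x) = sin(x) has g''(x) = -sin(x)
sine_roughness, _ = quad(lambda x: (-np.sin(x)) ** 2, 0, 2 * np.pi)

print(line_roughness)  # 0.0
print(sine_roughness)  # pi = 3.1416...
```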
Theorem (Optimality of Natural Cubic Splines):
Among all twice-differentiable functions $g$ satisfying the interpolation conditions $g(x_i) = y_i$ for $i = 1, \ldots, n$, the unique minimizer of $\mathcal{R}(g) = \int [g''(x)]^2 \, dx$ is the natural cubic spline with knots at the data points $x_1, \ldots, x_n$.
Proof Sketch:
Let $s$ be the natural cubic spline interpolant and let $g$ be any other interpolating function. Define $h = g - s$. Then:
$$\mathcal{R}(g) = \int [s''(x) + h''(x)]^2 \, dx = \int [s''(x)]^2 \, dx + 2 \int s''(x) h''(x) \, dx + \int [h''(x)]^2 \, dx$$
The key is to show the cross-term vanishes. Using integration by parts twice:
$$\int_{x_1}^{x_n} s''(x) h''(x) \, dx = [s''(x) h'(x)]_{x_1}^{x_n} - \int_{x_1}^{x_n} s'''(x) h'(x) \, dx$$
Since $s''(x_1) = s''(x_n) = 0$ (natural boundary conditions), the boundary terms vanish. Integrating by parts again:
$$= -[s'''(x) h(x)]_{x_1}^{x_n} + \int_{x_1}^{x_n} s^{(4)}(x) h(x) \, dx$$
Now, $s$ is a cubic polynomial on each interval, so $s^{(4)} = 0$ away from the knots and the remaining integral vanishes. The boundary term reduces to a sum of contributions at the knots, where $s'''$ may jump; each contribution is multiplied by $h(x_i) = 0$ (both $s$ and $g$ interpolate the same data), so these terms vanish as well.
Thus $\mathcal{R}(g) = \mathcal{R}(s) + \mathcal{R}(h) \geq \mathcal{R}(s)$, with equality iff $h'' = 0$ everywhere, i.e., $g = s$ (up to a linear function, which must be zero by interpolation).
Natural cubic splines are not just convenient—they are optimal in a precise mathematical sense. Among all possible interpolating functions, they minimize bending energy. This variational characterization explains why natural splines appear throughout applied mathematics, physics, and engineering.
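We can check the theorem numerically with SciPy: interpolate the same (arbitrary) points with a natural cubic spline and with a standard not-a-knot cubic spline, then compare their bending energies over $[x_1, x_n]$. This is a sketch; the integral is approximated on a fine grid.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.integrate import trapezoid

# Arbitrary points to interpolate
x = np.array([0.0, 0.8, 1.7, 2.4, 3.3, 4.0])
y = np.array([0.5, 1.9, 0.2, 1.1, 2.4, 0.7])

natural = CubicSpline(x, y, bc_type="natural")         # s'' = 0 at both ends
not_a_knot = CubicSpline(x, y, bc_type="not-a-knot")   # default boundary condition

# Approximate R(g) = int g''(x)^2 dx over [x_1, x_n] on a fine grid
grid = np.linspace(x[0], x[-1], 20001)

def roughness(spline):
    return trapezoid(spline(grid, 2) ** 2, grid)

print(f"Natural interpolant roughness:    {roughness(natural):.4f}")
print(f"Not-a-knot interpolant roughness: {roughness(not_a_knot):.4f}")
# The natural cubic spline attains the smaller (minimal) bending energy
```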
There are several ways to construct a basis for natural cubic splines. We'll present two approaches: modification of B-splines, and a direct construction using 'natural spline' basis functions.
Approach 1: Constrained B-splines
Start with the standard B-spline basis of dimension $K + 4$ for $K$ knots. Apply 4 linear constraints: $s''(\xi_1) = s''(\xi_K) = 0$ together with linearity of the two exterior pieces. The resulting constrained basis has dimension $K$.
In practice, we compute a null-space basis for the constraint matrix. Libraries like R's ns() function handle this automatically.
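Here is a rough sketch of the null-space idea in a slightly simplified setting: if the basis is built only on $[\xi_1, \xi_K]$ (so the unconstrained space has dimension $K + 2$ and the exterior linearity is handled at extrapolation time), the natural conditions reduce to the two constraints $s''(\xi_1) = s''(\xi_K) = 0$, and the null-space construction again yields dimension $K$. The helper names here are illustrative, not a library API.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.linalg import null_space

def constrained_natural_basis(x, knots, degree=3):
    """Natural-spline basis on [knots[0], knots[-1]] via null-space constraints."""
    a, b = knots[0], knots[-1]
    # Full cubic B-spline basis on the interval: dimension K + 2
    t = np.concatenate([np.repeat(a, degree + 1), knots[1:-1], np.repeat(b, degree + 1)])
    n_basis = len(t) - degree - 1

    def basis_matrix(pts, nu=0):
        B = np.empty((len(pts), n_basis))
        for j in range(n_basis):
            c = np.zeros(n_basis); c[j] = 1.0
            B[:, j] = BSpline(t, c, degree)(pts, nu)
        return B

    # Constraint matrix: second derivative of every basis function at the boundaries
    C = basis_matrix(np.array([a, b]), nu=2)     # shape (2, K + 2)
    Z = null_space(C)                            # shape (K + 2, K)
    return basis_matrix(np.asarray(x)) @ Z       # shape (n, K)

knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # K = 5
N = constrained_natural_basis(np.linspace(0, 1, 100), knots)
print(N.shape)                                   # (100, 5): dimension K
```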
Approach 2: The Truncated Power Natural Spline Basis
Define the following $K$ basis functions for knots $\xi_1 < \cdots < \xi_K$:
$$N_1(x) = 1, \qquad N_2(x) = x, \qquad N_{k+2}(x) = d_k(x) - d_{K-1}(x), \quad k = 1, \ldots, K - 2,$$
where
$$d_k(x) = \frac{(x - \xi_k)_+^3 - (x - \xi_K)_+^3}{\xi_K - \xi_k}.$$
These basis functions are specifically designed to satisfy the natural boundary conditions.
```python
import numpy as np

def truncated_cube(x, xi):
    """Compute (x - xi)_+^3"""
    return np.maximum(0, x - xi) ** 3

def natural_spline_basis(x, knots):
    """
    Construct the natural cubic spline basis matrix.

    Uses the truncated power representation with natural boundary conditions.

    Parameters:
    -----------
    x : array of shape (n,)
        Evaluation points
    knots : array of shape (K,)
        Knot positions (including boundary knots)

    Returns:
    --------
    N : array of shape (n, K)
        Natural spline basis matrix
    """
    x = np.asarray(x)
    knots = np.asarray(knots)
    K = len(knots)
    n = len(x)

    if K < 2:
        raise ValueError("Need at least 2 knots for natural splines")

    # Define d_k(x) helper function
    def d_k(x, k):
        xi_k = knots[k]
        xi_K = knots[-1]  # Last knot
        numerator = truncated_cube(x, xi_k) - truncated_cube(x, xi_K)
        denominator = xi_K - xi_k
        return numerator / denominator

    # Build basis matrix
    N = np.zeros((n, K))

    # First two basis functions: constant and linear
    N[:, 0] = 1.0  # N_1(x) = 1
    N[:, 1] = x    # N_2(x) = x

    # Remaining K-2 basis functions
    d_Km1 = d_k(x, K - 2)  # d_{K-1}(x), using 0-indexing
    for k in range(K - 2):
        # N_{k+3}(x) = d_{k+1}(x) - d_{K-1}(x) in 1-based notation;
        # with 0-indexing: N[:, k + 2] = d_k(x, k) - d_{K-1}(x)
        N[:, k + 2] = d_k(x, k) - d_Km1

    return N

# Example: Create natural spline basis
knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # K=5 knots
x = np.linspace(0, 1, 200)

N = natural_spline_basis(x, knots)
print(f"Basis matrix shape: {N.shape}")  # (200, 5) - exactly K columns!

# Verify at boundaries: second derivative should be zero
# For a piecewise cubic, s''(xi) involves only the coefficients of the
# quadratic and cubic terms; we can verify numerically that fitted
# curves have s'' = 0 at the boundary knots

# The condition number should be better than the raw truncated power basis
print(f"Condition number: {np.linalg.cond(N):.2f}")
```

Using Libraries for Natural Splines:
In practice, we use library implementations that handle the construction efficiently:
```python
# Using patsy (Python equivalent of R's formula interface)
import patsy

# Create natural spline basis with 5 degrees of freedom
# (5 df means 5 knots, placed at quantiles by default)
N = patsy.dmatrix("cr(x, df=5) - 1", data={"x": x})

# Or specify knots explicitly
N = patsy.dmatrix("cr(x, knots=[0.25, 0.5, 0.75]) - 1", data={"x": x})
```
Note: patsy uses cr() for natural (restricted) cubic splines, and bs() for B-splines. In R, use ns() for natural splines and bs() for B-splines.
Beware of terminology differences! In the convention used here, all $K$ knots (including the leftmost and rightmost) count toward the basis dimension, but software packages map the df argument to knot counts in different ways: patsy's cr() places boundary knots at the data extremes by default and chooses the remaining knots from quantiles of the data, and R's ns() has its own counting convention. Check the documentation before assuming a particular knot placement.
With the natural spline basis constructed, regression proceeds exactly as with B-splines: form the design matrix, apply least squares (or regularized variants), and make predictions.
Let's work through a complete example comparing natural splines with unconstrained B-splines.
```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

# =====================================================
# Generate Data with Known True Function
# =====================================================
np.random.seed(42)

def true_function(x):
    return np.sin(2 * x) + 0.3 * x

n = 80
x = np.sort(np.random.uniform(0, 2 * np.pi, n))
y_true = true_function(x)
y = y_true + np.random.normal(0, 0.4, n)

# =====================================================
# Natural Spline Basis (using our implementation)
# =====================================================
def truncated_cube(x, xi):
    return np.maximum(0, x - xi) ** 3

def natural_spline_basis(x, knots):
    K = len(knots)
    n = len(x)

    def d_k(x, k):
        return (truncated_cube(x, knots[k]) - truncated_cube(x, knots[-1])) / (knots[-1] - knots[k])

    N = np.zeros((n, K))
    N[:, 0] = 1.0
    N[:, 1] = x
    d_Km1 = d_k(x, K - 2)
    for k in range(K - 2):
        N[:, k + 2] = d_k(x, k) - d_Km1
    return N

# =====================================================
# B-spline Basis (for comparison)
# =====================================================
def bspline_basis(x, interior_knots, degree):
    t = np.concatenate([
        np.repeat(x.min(), degree + 1),
        interior_knots,
        np.repeat(x.max(), degree + 1)
    ])
    n_basis = len(t) - degree - 1
    B = np.zeros((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        spl = BSpline(t, c, degree, extrapolate=False)
        B[:, j] = np.nan_to_num(spl(x), nan=0.0)
    return B

# =====================================================
# Fit Both Models
# =====================================================
# Place knots at quantiles
K = 6                 # Number of knots for natural spline
interior_K = K - 2    # Interior knots for B-spline to get similar df

knots_ns = np.percentile(x, np.linspace(0, 100, K))
interior_knots_bs = np.percentile(x, np.linspace(0, 100, interior_K + 2)[1:-1])

# Natural spline fit
N_train = natural_spline_basis(x, knots_ns)
beta_ns = np.linalg.lstsq(N_train, y, rcond=None)[0]
print(f"Natural Spline: {N_train.shape[1]} parameters")

# B-spline fit (cubic)
B_train = bspline_basis(x, interior_knots_bs, degree=3)
beta_bs = np.linalg.lstsq(B_train, y, rcond=None)[0]
print(f"B-spline: {B_train.shape[1]} parameters")

# =====================================================
# Prediction and Extrapolation
# =====================================================
# Include extrapolation region beyond data
x_pred = np.linspace(-1, 2 * np.pi + 1, 300)

# Natural spline: safe extrapolation
N_pred = natural_spline_basis(x_pred, knots_ns)
y_ns = N_pred @ beta_ns

# B-spline: need to handle extrapolation carefully
# Build extended grid for B-spline (within data range)
x_pred_bs = np.clip(x_pred, x.min(), x.max())
B_pred = bspline_basis(x_pred_bs, interior_knots_bs, degree=3)
y_bs = B_pred @ beta_bs

# For B-spline extrapolation, use boundary derivatives
# (This is approximate; proper extrapolation requires care)
left_mask = x_pred < x.min()
right_mask = x_pred > x.max()

# =====================================================
# Compute Metrics
# =====================================================
y_ns_train = N_train @ beta_ns
y_bs_train = B_train @ beta_bs

mse_ns = np.mean((y - y_ns_train) ** 2)
mse_bs = np.mean((y - y_bs_train) ** 2)

print(f"Natural Spline MSE: {mse_ns:.4f}")
print(f"B-spline MSE: {mse_bs:.4f}")

# R² scores
ss_res_ns = np.sum((y - y_ns_train) ** 2)
ss_res_bs = np.sum((y - y_bs_train) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r2_ns = 1 - ss_res_ns / ss_tot
r2_bs = 1 - ss_res_bs / ss_tot

print(f"Natural Spline R²: {r2_ns:.4f}")
print(f"B-spline R²: {r2_bs:.4f}")
```

Key Observations:
Fewer Parameters: Natural splines achieve comparable fit quality with fewer parameters. With $K$ knots, a natural spline uses $K$ parameters, while an unconstrained cubic spline with the same knots uses $K + 4$.
Stable Extrapolation: Natural splines extrapolate linearly beyond the data range—a much more sensible behavior than the potentially wild cubic extrapolation of unconstrained splines.
Similar Interior Fit: Within the data range, both methods produce nearly identical fits. The differences appear primarily at and beyond the boundaries.
Slightly Higher Bias at Boundaries: The linear constraint means natural splines may underfit data that has genuine curvature at the boundaries. This is the tradeoff for reduced variance.
Use natural splines when: (1) You don't have strong prior knowledge about boundary behavior; (2) Extrapolation stability matters; (3) You want to reduce effective degrees of freedom; (4) The true function is not expected to have extreme curvature at the edges. Use unconstrained B-splines when boundary flexibility is genuinely needed or when boundaries are interior to a larger problem.
Natural cubic splines are intimately connected to smoothing splines, which we'll cover in detail on the next page. Here we preview the connection.
The Smoothing Spline Problem:
Instead of requiring exact interpolation ($g(x_i) = y_i$), smoothing splines find a function that balances data fidelity against smoothness:
$$\min_g \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int [g''(x)]^2 \, dx$$
where $\lambda > 0$ is a smoothing parameter. When $\lambda = 0$, we recover the interpolation problem; when $\lambda \to \infty$, the solution approaches a straight line.
A remarkable result (Reinsch, 1967): For any $\lambda > 0$, the unique solution to the smoothing spline problem is a natural cubic spline with knots at the data points $x_1, \ldots, x_n$. This means we need only optimize over an $n$-dimensional space of natural splines, not the infinite-dimensional space of all smooth functions.
Why This Matters:
Reinsch's theorem transforms an infinite-dimensional optimization problem into a finite-dimensional one. The smoothing spline solution can be written as:
$$\hat{g}(x) = \sum_{j=1}^n \hat{\beta}_j N_j(x)$$
where $N_1, \ldots, N_n$ is a natural spline basis with knots at the data points. The optimal coefficients $\hat{\boldsymbol{\beta}}$ are obtained by solving a penalized least squares problem:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{N}\boldsymbol{\beta}\|^2 + \lambda \boldsymbol{\beta}^T \boldsymbol{\Omega} \boldsymbol{\beta}$$
where $\boldsymbol{\Omega}$ is the penalty matrix with elements $\Omega_{jk} = \int N_j''(x) N_k''(x) \, dx$.
This closed-form solution enables efficient computation and connects spline smoothing to ridge regression.
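To make this concrete, here is a minimal sketch under one simplifying assumption: the spline is parameterized by its values at the knots, with a knot at every data point, so the design matrix is the identity and the fit reduces to $\hat{\mathbf{g}} = (\mathbf{I} + \lambda \boldsymbol{\Omega})^{-1} \mathbf{y}$, a ridge-style shrinkage of $\mathbf{y}$. The `gs_penalty` helper is illustrative (the same Green & Silverman construction shown later on this page), not a library function.

```python
import numpy as np

def gs_penalty(knots):
    """Green & Silverman penalty matrix (values-at-the-knots parameterization)."""
    K = len(knots)
    h = np.diff(knots)
    D = np.zeros((K - 2, K))       # second-divided-difference matrix
    W = np.zeros((K - 2, K - 2))   # tridiagonal weight matrix
    for j in range(K - 2):
        D[j, j], D[j, j + 1], D[j, j + 2] = 1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]
        W[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            W[j, j + 1] = W[j + 1, j] = h[j + 1] / 6
    return D.T @ np.linalg.solve(W, D)

# Toy data; the smoothing spline places a knot at every x_i
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 40))
y = np.sin(x) + rng.normal(0, 0.3, 40)

Omega = gs_penalty(x)
lam = 1.0

# Fitted values at the data points: ridge-style shrinkage of y
g_hat = np.linalg.solve(np.eye(len(x)) + lam * Omega, y)
print(g_hat[:5])
```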
| Aspect | Interpolating Natural Spline | Smoothing Spline |
|---|---|---|
| Data fit | Exact: $\hat{g}(x_i) = y_i$ | Approximate: residuals allowed |
| Objective | Minimize $\int [g'']^2$ subject to interpolation | Minimize $\sum (y_i - g(x_i))^2 + \lambda \int [g'']^2$ |
| Smoothing parameter | None (implicitly $\lambda = 0$) | $\lambda > 0$, chosen by user or CV |
| Effective df | Exactly $n$ | Between 2 and $n$, controlled by $\lambda$ |
| Overfitting risk | High (interpolates noise) | Controlled by $\lambda$ |
| Use case | Exact curve through points | General regression with smoothing |
Natural spline computations involve several matrices that exploit special structure for efficiency.
The Natural Spline Design Matrix:
Given $n$ data points and $K$ knots (typically $K = n$ for smoothing splines or $K \ll n$ for regression splines), the design matrix $\mathbf{N}$ has dimensions $n \times K$.
The Penalty Matrix:
For smoothing splines, we need the matrix $\boldsymbol{\Omega}$ with entries: $$\Omega_{jk} = \int N_j''(x) N_k''(x) \, dx$$
In suitable parameterizations (for example, the values-at-the-knots representation used in the closed-form construction below), $\boldsymbol{\Omega}$ is banded. This sparsity enables $O(n)$ rather than $O(n^3)$ solutions.
```python
import numpy as np
from scipy import integrate


def truncated_cube(x, xi):
    """Compute (x - xi)_+^3"""
    return np.maximum(0, x - xi) ** 3


def natural_spline_basis(x, knots):
    """Same truncated power natural spline basis as before."""
    K = len(knots)
    n = len(x)

    def d_k(x, k):
        return (truncated_cube(x, knots[k]) - truncated_cube(x, knots[-1])) / (knots[-1] - knots[k])

    N = np.zeros((n, K))
    N[:, 0] = 1.0
    N[:, 1] = x
    d_Km1 = d_k(x, K - 2)
    for k in range(K - 2):
        N[:, k + 2] = d_k(x, k) - d_Km1
    return N


def compute_penalty_matrix(knots):
    """
    Numerically compute the penalty matrix Omega for the truncated power
    natural spline basis above:

        Omega[j, k] = integral of N_j''(x) * N_k''(x) dx over [xi_1, xi_K]

    Uses finite-difference second derivatives plus quadrature; this is for
    illustration only (real implementations use exact formulas).

    Key properties of Omega:
      1. Symmetric positive semi-definite
      2. Rank K - 2 (null space: linear functions, since N_1'' = N_2'' = 0)
      3. Banded in suitable parameterizations, enabling efficient computation
    """
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    eps = 1e-4

    def second_derivs(x):
        # Finite-difference approximation of all K basis second derivatives at x
        pts = np.array([x - eps, x, x + eps])
        N = natural_spline_basis(pts, knots)
        return (N[2] - 2 * N[1] + N[0]) / eps ** 2

    Omega = np.zeros((K, K))
    breakpoints = list(knots[1:-1])
    for j in range(K):
        for k in range(j, K):
            val, _ = integrate.quad(
                lambda t: second_derivs(t)[j] * second_derivs(t)[k],
                knots[0], knots[-1], points=breakpoints, limit=200)
            Omega[j, k] = Omega[k, j] = val
    return Omega


def penalty_matrix_closed_form(knots):
    """
    Closed-form penalty matrix for natural cubic splines, based on
    Green & Silverman (1994), Chapter 2.

    Note: this version is expressed in the parameterization by the function
    values at the knots, g(xi_1), ..., g(xi_K), not in the truncated power
    basis above, so the two matrices are not directly comparable.
    """
    K = len(knots)
    h = np.diff(knots)  # K-1 knot spacings

    # Second-divided-difference matrix: (K-2) x K
    R = np.zeros((K - 2, K))
    for j in range(K - 2):
        R[j, j] = 1 / h[j]
        R[j, j + 1] = -1 / h[j] - 1 / h[j + 1]
        R[j, j + 2] = 1 / h[j + 1]

    # Tridiagonal matrix: (K-2) x (K-2)
    Q = np.zeros((K - 2, K - 2))
    for j in range(K - 2):
        Q[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            Q[j, j + 1] = h[j + 1] / 6
            Q[j + 1, j] = h[j + 1] / 6

    # Full penalty matrix (sparse, banded structure via the R and Q factors)
    Omega_full = R.T @ np.linalg.solve(Q, R)
    return Omega_full
```

With $n$ observations and $K$ knots: (1) Building the design matrix is $O(nK)$; (2) Solving the normal equations is $O(nK^2 + K^3)$; (3) For smoothing splines with $K = n$ knots, special algorithms exploit the banded structure to achieve $O(n)$ complexity. This makes smoothing splines practical even for large datasets.
The concept of degrees of freedom is subtly different for natural splines than for parametric models.
Parametric Degrees of Freedom:
For a natural spline with $K$ knots used as fixed basis functions, the model has exactly $K$ parameters. The residual degrees of freedom is $n - K$, just like ordinary linear regression with $K$ predictors.
Effective Degrees of Freedom (Smoothing Splines):
For smoothing splines with penalty $\lambda$, the effective degrees of freedom is:
$$\text{df}(\lambda) = \text{tr}\left(\mathbf{S}_\lambda\right)$$
where $\mathbf{S}_\lambda$ is the smoother matrix satisfying $\hat{\mathbf{y}} = \mathbf{S}_\lambda \mathbf{y}$:
$$\mathbf{S}_\lambda = \mathbf{N}(\mathbf{N}^T\mathbf{N} + \lambda\boldsymbol{\Omega})^{-1}\mathbf{N}^T$$
This generalizes the hat matrix from linear regression.
| $\lambda$ | df($\lambda$) | Behavior | Interpretation |
|---|---|---|---|
| $\lambda \to 0$ | $\to n$ | Interpolation | Maximum flexibility, overfits |
| $\lambda$ small | $\approx n$ | Wiggly fit | High variance, low bias |
| $\lambda$ moderate | $2 < \text{df} < n$ | Smooth fit | Bias-variance tradeoff |
| $\lambda$ large | $\to 2$ | Nearly linear | High bias, low variance |
| $\lambda \to \infty$ | $= 2$ | Straight line | Just intercept + slope |
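Continuing the values-at-the-knots sketch from earlier (with the same illustrative `gs_penalty` helper repeated so the snippet is self-contained), we can trace how $\text{df}(\lambda) = \text{tr}(\mathbf{S}_\lambda)$ moves between the extremes in the table:

```python
import numpy as np

def gs_penalty(knots):
    """Green & Silverman penalty matrix (values-at-the-knots parameterization)."""
    K = len(knots)
    h = np.diff(knots)
    D = np.zeros((K - 2, K))
    W = np.zeros((K - 2, K - 2))
    for j in range(K - 2):
        D[j, j], D[j, j + 1], D[j, j + 2] = 1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]
        W[j, j] = (h[j] + h[j + 1]) / 3
        if j < K - 3:
            W[j, j + 1] = W[j + 1, j] = h[j + 1] / 6
    return D.T @ np.linalg.solve(W, D)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 40))
n = len(x)
Omega = gs_penalty(x)

# Linear functions lie in the null space of the penalty (zero bending energy)
lin = np.column_stack([np.ones(n), x])
print(np.abs(Omega @ lin).max() / np.abs(Omega).max())  # ~0

# Effective degrees of freedom df(lambda) = tr[(I + lambda*Omega)^(-1)]
for lam in [1e-8, 1e-2, 1.0, 1e2, 1e8]:
    S = np.linalg.inv(np.eye(n) + lam * Omega)
    print(f"lambda = {lam:g}: df = {np.trace(S):.2f}")
# df decreases monotonically from n (= 40) toward 2 as lambda grows
```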
Why Minimum df = 2?
The penalty $\int [g''(x)]^2 \, dx$ penalizes curvature but not linear terms. Linear functions ($g(x) = a + bx$) have $g'' = 0$, so they incur zero penalty. The null space of the penalty has dimension 2 (the space of linear functions), which is why the minimum df is 2.
Choosing Degrees of Freedom:
In practice, we often specify the desired df directly rather than $\lambda$. Libraries like R's smooth.spline() accept a df argument and internally solve for the corresponding $\lambda$. This makes tuning more intuitive:
For exploratory analysis, start with df around $\sqrt{n}$ or $n^{1/3}$. For formal inference, use cross-validation or information criteria (AIC/BIC) to select df. Remember: df controls complexity—too few gives bias, too many gives variance.
Natural splines extend naturally to multiple predictors. For a model with predictors $x_1, \ldots, x_p$, we can include a natural spline transformation of each:
$$y = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \epsilon$$
where each $f_j$ is represented as a natural spline: $$f_j(x_j) = \sum_{k=1}^{K_j} \beta_{jk} N_{jk}(x_j)$$
This is the foundation of Generalized Additive Models (GAMs), which we'll cover in a later module.
```python
import numpy as np
import patsy
from sklearn.linear_model import LinearRegression

# =====================================================
# Example: Multiple Natural Spline Regression
# =====================================================
np.random.seed(42)

n = 200

# Generate two predictors
x1 = np.random.uniform(0, 10, n)
x2 = np.random.uniform(0, 10, n)

# True relationship: nonlinear in both
y_true = np.sin(x1) + 0.5 * (x2 - 5)**2 / 10 - 0.1 * x1 * x2 / 10
y = y_true + np.random.normal(0, 0.5, n)

# Create design matrix with natural splines for both predictors
# cr() is patsy's natural (cubic restricted) spline
data_dict = {"y": y, "x1": x1, "x2": x2}

# Natural splines with 5 df each
design_formula = "cr(x1, df=5) + cr(x2, df=5)"
design_matrix = patsy.dmatrix(design_formula, data=data_dict,
                              return_type="dataframe")

print(f"Design matrix shape: {design_matrix.shape}")
print(f"Columns: {list(design_matrix.columns)}")

# Fit regression
model = LinearRegression(fit_intercept=True)
model.fit(design_matrix, y)

y_hat = model.predict(design_matrix)
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print(f"R²: {r2:.4f}")

# =====================================================
# Visualize Partial Effects
# =====================================================
# To visualize f_1(x1), hold x2 at its mean

x1_grid = np.linspace(x1.min(), x1.max(), 100)
x2_mean = np.mean(x2)

# Create prediction data; reuse the training design_info so the same
# knots and basis functions are applied to the new data
pred_data = {"x1": x1_grid, "x2": np.repeat(x2_mean, 100)}
X_pred = patsy.build_design_matrices([design_matrix.design_info], pred_data,
                                     return_type="dataframe")[0]

y_pred = model.predict(X_pred)

# The partial effect of x1 is y_pred (with x2 held constant)
```

Key Considerations for Multiple Predictors:
Total Degrees of Freedom: If each of $p$ predictors uses $K$ df, the total model df is approximately $pK$ (plus intercept). With many predictors, this can grow quickly.
Centering for Interpretation: To interpret each $f_j$ as a 'main effect,' center them so $\sum_i f_j(x_{ij}) = 0$. This is standard in GAM implementations.
Interactions: Natural splines can be combined with interaction terms, e.g., ns(x1):ns(x2) for tensor product splines. Be cautious: this explodes the number of parameters (see the sketch after this list).
Regularization: With many spline terms, regularization (ridge or lasso on the spline coefficients) is often necessary. This is conceptually similar to penalized splines (Page 5).
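To see the parameter growth from interactions concretely, here is a small sketch with patsy (the df values and data are arbitrary): compare the number of columns produced by additive spline terms with the columns produced by a spline-by-spline interaction.

```python
import numpy as np
import patsy

rng = np.random.default_rng(0)
data = {"x1": rng.uniform(0, 10, 200), "x2": rng.uniform(0, 10, 200)}

# Additive natural spline terms
additive = patsy.dmatrix("cr(x1, df=5) + cr(x2, df=5)", data)

# Adding a spline-by-spline interaction multiplies basis columns pairwise
with_interaction = patsy.dmatrix(
    "cr(x1, df=5) + cr(x2, df=5) + cr(x1, df=5):cr(x2, df=5)", data)

print(additive.shape)          # additive terms: roughly 2 * df columns (plus intercept)
print(with_interaction.shape)  # interaction adds roughly df * df more columns
```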
In R: ns(x, df=5) creates a natural spline basis; bs(x, df=5) creates a B-spline basis. In Python's patsy: cr(x, df=5) creates natural (restricted cubic) splines; bs(x, df=5) creates B-splines. The natural spline is typically preferred for regression because of its reduced df and stable extrapolation.
What's Next:
We've established that natural splines are optimal for interpolation and arise as solutions to the smoothing problem. On the next page, we'll dive deep into smoothing splines—the full optimization framework where we directly trade off data fidelity against smoothness, with the smoothing parameter $\lambda$ controlling the balance.
You now understand natural cubic splines: their definition, variational optimality, construction, and relationship to smoothing splines. Natural splines reduce degrees of freedom while ensuring sensible extrapolation, making them the default choice for spline-based regression in most applications.