We've seen two approaches to spline regression:
Regression Splines: Use few knots ($K \ll n$), fit by OLS. Fast but requires careful knot selection.
Smoothing Splines: Use $n$ knots (one per data point), regularize via penalty. Automatic smoothing but involves $n \times n$ matrices.
Penalized Splines (P-splines) combine the advantages of both: a moderate number of equally spaced knots keeps the basis small (as in regression splines), while a roughness penalty controls smoothness (as in smoothing splines).
P-splines, introduced by Eilers and Marx (1996), have become the workhorse of flexible regression in practice.
By the end of this page, you will understand the P-spline formulation with difference penalties, why difference penalties approximate derivative penalties, how to select the number of knots and penalty strength, computational advantages of P-splines, and their connection to mixed models. You'll see P-splines as the practical method of choice for most spline regression problems.
The Objective Function:
P-splines minimize a penalized sum of squares:
$$\mathcal{L}(\boldsymbol{\beta}) = \|\mathbf{y} - \mathbf{B}\boldsymbol{\beta}\|^2 + \lambda \|\mathbf{D}_q \boldsymbol{\beta}\|^2$$
where:
- $\mathbf{y} \in \mathbb{R}^n$ is the response vector,
- $\mathbf{B} \in \mathbb{R}^{n \times p}$ is the B-spline design matrix built on equally spaced knots,
- $\boldsymbol{\beta} \in \mathbb{R}^p$ is the coefficient vector,
- $\lambda \geq 0$ is the smoothing parameter,
- $\mathbf{D}_q$ is the $q$-th order difference matrix.
The key innovation is the difference penalty $\|\mathbf{D}_q \boldsymbol{\beta}\|^2$, applied to the coefficients rather than an integral of derivatives.
Difference Matrices:
The first-order difference matrix $\mathbf{D}_1$ computes successive differences: $$\mathbf{D}_1 = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix} \in \mathbb{R}^{(p-1) \times p}$$
So $(\mathbf{D}_1 \boldsymbol{\beta})_j = \beta_{j+1} - \beta_j$.
Higher-order differences are computed by applying first-order differencing repeatedly: $$\mathbf{D}_2 = \mathbf{D}_1^{(p-1)} \mathbf{D}_1^{(p)} \in \mathbb{R}^{(p-2) \times p}$$ where $\mathbf{D}_1^{(m)}$ denotes the first-order difference matrix with $m$ columns.
Second differences are: $(\mathbf{D}_2 \boldsymbol{\beta})_j = \beta_{j+2} - 2\beta_{j+1} + \beta_j$
```python
import numpy as np

def difference_matrix(p, order=2):
    """
    Construct difference matrix of given order.

    Parameters
    ----------
    p : int
        Number of columns (B-spline coefficients)
    order : int
        Order of differencing (1 = first difference, 2 = second, etc.)

    Returns
    -------
    D : array of shape (p - order, p)
        Difference matrix
    """
    D = np.eye(p)
    for _ in range(order):
        # Apply first-order differencing
        n_rows = D.shape[0] - 1
        D_new = np.zeros((n_rows, p))
        for i in range(n_rows):
            D_new[i, :] = D[i + 1, :] - D[i, :]
        D = D_new
    return D

def build_penalty_matrix(p, order=2):
    """
    Construct the penalty matrix P = D^T @ D.

    The penalty is beta^T @ P @ beta = ||D @ beta||^2
    """
    D = difference_matrix(p, order)
    return D.T @ D

# Example: Build difference matrices
p = 8  # Number of B-spline coefficients

D1 = difference_matrix(p, order=1)
D2 = difference_matrix(p, order=2)
D3 = difference_matrix(p, order=3)

print("First-order difference matrix D1 (shape:", D1.shape, "):")
print(D1)
print()
print("Second-order difference matrix D2 (shape:", D2.shape, "):")
print(D2)

# Show what D2 does to a coefficient vector
beta = np.array([1, 2, 4, 7, 11, 16, 22, 29], dtype=float)  # Quadratic-ish
print("Beta:", beta)
print("Second differences (D2 @ beta):", D2 @ beta)
# For a perfect quadratic, second differences are constant!
```

Second-order differences penalize changes in slope (discrete curvature). This is analogous to the integral of the squared second derivative in smoothing splines. For B-splines with equally-spaced knots, the second difference of coefficients approximates the second derivative of the spline curve. Using $q=2$ (second-order penalty) is the standard choice for cubic P-splines.
Deriving the Solution:
The P-spline objective is: $$\mathcal{L}(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{B}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{B}\boldsymbol{\beta}) + \lambda \boldsymbol{\beta}^T \mathbf{P} \boldsymbol{\beta}$$
where $\mathbf{P} = \mathbf{D}_q^T \mathbf{D}_q$ is the penalty matrix.
Differentiating and setting to zero: $$\frac{\partial \mathcal{L}}{\partial \boldsymbol{\beta}} = -2\mathbf{B}^T(\mathbf{y} - \mathbf{B}\boldsymbol{\beta}) + 2\lambda \mathbf{P} \boldsymbol{\beta} = \mathbf{0}$$
Rearranging: $$\mathbf{B}^T \mathbf{B} \boldsymbol{\beta} + \lambda \mathbf{P} \boldsymbol{\beta} = \mathbf{B}^T \mathbf{y}$$ $$(\mathbf{B}^T \mathbf{B} + \lambda \mathbf{P}) \boldsymbol{\beta} = \mathbf{B}^T \mathbf{y}$$
Solution: $$\hat{\boldsymbol{\beta}} = (\mathbf{B}^T \mathbf{B} + \lambda \mathbf{P})^{-1} \mathbf{B}^T \mathbf{y}$$
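The closed form above can be checked directly. The sketch below (with illustrative names, not from the text) solves the penalized normal equations on random data and verifies a basic property of the solution: the penalized roughness $\|\mathbf{D}_q \hat{\boldsymbol{\beta}}\|$ shrinks as $\lambda$ grows.

```python
import numpy as np

# Minimal sketch: solve (B'B + lambda * P) beta = B'y for several lambda
# values and watch the roughness ||D_q beta_hat|| decrease.
rng = np.random.default_rng(0)
n, p, q = 60, 10, 2

B = rng.normal(size=(n, p))   # stand-in design matrix
y = rng.normal(size=n)

# q-th order difference matrix: q-th differences of the identity's rows
D = np.diff(np.eye(p), n=q, axis=0)   # shape (p - q, p)
P = D.T @ D

def beta_hat(lam):
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

rough = [np.linalg.norm(D @ beta_hat(lam)) for lam in (0.0, 1.0, 100.0)]
print(rough)  # non-increasing: larger lambda forces smoother coefficients
```

The monotone decrease is a general property of this quadratic objective: comparing the objectives at two penalty levels shows the optimal penalty term can never increase with $\lambda$.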
Fitted Values and Smoother Matrix:
The fitted values are: $$\hat{\mathbf{y}} = \mathbf{B} \hat{\boldsymbol{\beta}} = \mathbf{B}(\mathbf{B}^T \mathbf{B} + \lambda \mathbf{P})^{-1} \mathbf{B}^T \mathbf{y} = \mathbf{H}_\lambda \mathbf{y}$$
The smoother matrix (or hat matrix) is: $$\mathbf{H}_\lambda = \mathbf{B}(\mathbf{B}^T \mathbf{B} + \lambda \mathbf{P})^{-1} \mathbf{B}^T$$
This is an $n \times n$ matrix but involves inverting only a $p \times p$ matrix ($p$ is the number of B-spline basis functions, typically 20-40).
Effective Degrees of Freedom: $$\text{df}(\lambda) = \text{tr}(\mathbf{H}_\lambda)$$
This ranges from $p$ (when $\lambda = 0$) to $q$ (when $\lambda \to \infty$), where $q$ is the order of the difference penalty.
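A quick way to see why $\text{df} \to q$: the $q$-th order difference penalty leaves polynomials of degree $< q$ in the coefficient index unpenalized, so only those $q$ dimensions survive as $\lambda \to \infty$. A minimal check, with illustrative names:

```python
import numpy as np

# The null space of the q-th order difference penalty contains exactly the
# polynomial coefficient sequences of degree < q.
p, q = 12, 2
D = np.diff(np.eye(p), n=q, axis=0)   # q-th order difference matrix

j = np.arange(p, dtype=float)
linear = 3.0 + 0.5 * j                # degree < q: costs nothing
quadratic = j ** 2                    # degree >= q: penalized

print(np.allclose(D @ linear, 0))     # True
print(np.allclose(D @ quadratic, 0))  # False: curvature is penalized
```

Since the penalty matrix $\mathbf{P} = \mathbf{D}_q^T \mathbf{D}_q$ has rank $p - q$, exactly $q$ directions (here, constant and linear trends) remain unshrunk no matter how large $\lambda$ gets.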
```python
import numpy as np
from scipy.interpolate import BSpline

def fit_pspline(x, y, n_basis=25, degree=3, diff_order=2, lambda_val=1.0):
    """
    Fit a P-spline to data.

    Parameters
    ----------
    x, y : arrays
        Data
    n_basis : int
        Number of B-spline basis functions
    degree : int
        B-spline degree
    diff_order : int
        Order of difference penalty (typically 2)
    lambda_val : float
        Smoothing parameter

    Returns
    -------
    dict with keys:
        'beta': fitted coefficients
        'y_hat': fitted values
        'smoother': smoother matrix
        'df': effective degrees of freedom
        'knots': extended knot sequence
    """
    n = len(x)
    x_min, x_max = x.min(), x.max()

    # Build equally-spaced interior knots
    # n_basis = n_interior_knots + degree + 1
    n_interior = n_basis - degree - 1
    interior_knots = np.linspace(x_min, x_max, n_interior + 2)[1:-1]

    # Extended knot sequence
    t = np.concatenate([
        np.repeat(x_min, degree + 1),
        interior_knots,
        np.repeat(x_max, degree + 1)
    ])

    # Build B-spline design matrix
    B = np.zeros((n, n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        spl = BSpline(t, c, degree, extrapolate=False)
        B[:, j] = np.nan_to_num(spl(x), nan=0.0)

    # Build difference penalty matrix
    D = difference_matrix(n_basis, diff_order)
    P = D.T @ D

    # Solve penalized normal equations: (B'B + lambda * P) beta = B'y
    A = B.T @ B + lambda_val * P
    beta = np.linalg.solve(A, B.T @ y)

    # Fitted values
    y_hat = B @ beta

    # Smoother matrix: H = B (B'B + lambda*P)^{-1} B'
    A_inv = np.linalg.inv(A)
    H = B @ A_inv @ B.T

    # Effective degrees of freedom
    df = np.trace(H)

    return {
        'beta': beta,
        'y_hat': y_hat,
        'smoother': H,
        'df': df,
        'knots': t,
        'B': B,
        'P': P
    }

def difference_matrix(p, order=2):
    D = np.eye(p)
    for _ in range(order):
        n_rows = D.shape[0] - 1
        D_new = np.zeros((n_rows, p))
        for i in range(n_rows):
            D_new[i, :] = D[i + 1, :] - D[i, :]
        D = D_new
    return D

# Example: Fit P-spline
np.random.seed(42)
n = 200
x = np.sort(np.random.uniform(0, 2 * np.pi, n))
y_true = np.sin(x) + 0.5 * np.cos(2 * x)
y = y_true + np.random.normal(0, 0.4, n)

# Fit with different lambda values
print("Effect of lambda on P-spline fit:")
for lam in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    result = fit_pspline(x, y, n_basis=25, lambda_val=lam)
    mse = np.mean((y_true - result['y_hat']) ** 2)
    print(f"lambda = {lam:7.3f}: df = {result['df']:5.2f}, MSE = {mse:.4f}")
```

For P-splines with $p$ basis functions, we invert a $p \times p$ matrix (cost $O(p^3)$), not $n \times n$. With $p = 25$ and $n = 10000$, that is $25^3 \approx 1.6 \times 10^4$ operations versus $10000^3 = 10^{12}$, roughly $10^{-8}$ times the work of naive smoothing splines! Even for large datasets, P-spline fitting is essentially instant.
Unlike regression splines where the number of knots directly controls flexibility, P-splines use many knots (to ensure sufficient flexibility) and let the penalty control smoothness.
Key Insight: As long as there are 'enough' basis functions to capture the true curve, adding more doesn't hurt—the penalty prevents overfitting.
Practical Guidance:
| $p$ (basis) | Flexibility | Effect of $\lambda$ | Recommendation |
|---|---|---|---|
| 10 | Limited | May underfit even at $\lambda=0$ | Too few for complex functions |
| 20 | Moderate | Good control, may miss details | Minimum for most cases |
| 30-40 | High | Excellent control | Default choice |
| 50+ | Very high | Strong penalty needed | Only for very complex patterns |
| 100+ | Excessive | High computational cost | Rarely necessary |
Unlike regression splines, P-splines don't overfit when you have 'too many' basis functions—the penalty takes care of it. This means you can safely err on the side of more basis functions. The only costs are (1) slightly higher computation and (2) needing a larger penalty value.
```python
import numpy as np

def cv_score(lambda_val, x, y, n_basis, degree, diff_order):
    """Compute GCV score for given lambda."""
    result = fit_pspline(x, y, n_basis, degree, diff_order, lambda_val)
    n = len(y)
    df = result['df']
    rss = np.sum((y - result['y_hat']) ** 2)
    # GCV
    gcv = (rss / n) / ((1 - df / n) ** 2)
    return gcv

def optimal_lambda_gcv(x, y, n_basis=25, degree=3, diff_order=2):
    """Find lambda that minimizes GCV."""
    # Grid search over a wide log-spaced range
    lambdas = np.logspace(-6, 6, 50)
    gcv_scores = [cv_score(lam, x, y, n_basis, degree, diff_order)
                  for lam in lambdas]
    best_idx = np.argmin(gcv_scores)
    return lambdas[best_idx], gcv_scores[best_idx]

# Experiment: vary number of basis functions
np.random.seed(42)
n = 150
x = np.sort(np.random.uniform(0, 2 * np.pi, n))
y_true = np.sin(x) + 0.3 * np.sin(3 * x)
y = y_true + np.random.normal(0, 0.35, n)

print("Effect of number of basis functions (with GCV-optimal lambda):")
print(f"{'n_basis':>8} | {'opt_lambda':>12} | {'df':>6} | {'MSE':>8}")
print("-" * 45)

for n_basis in [10, 15, 20, 25, 30, 40, 50]:
    opt_lambda, _ = optimal_lambda_gcv(x, y, n_basis)
    result = fit_pspline(x, y, n_basis, lambda_val=opt_lambda)
    mse = np.mean((y_true - result['y_hat']) ** 2)
    print(f"{n_basis:>8} | {opt_lambda:>12.4f} | {result['df']:>6.2f} | {mse:>8.4f}")

# Key observation: Once n_basis is large enough (~20+),
# the MSE stabilizes because lambda adapts
```

As with smoothing splines, the key tuning decision is selecting $\lambda$. The same methods apply:
1. Cross-Validation (LOOCV or K-Fold): Minimize estimated out-of-sample prediction error.
2. Generalized Cross-Validation (GCV): $$\text{GCV}(\lambda) = \frac{\text{RSS}(\lambda)/n}{(1 - \text{df}(\lambda)/n)^2}$$
3. Information Criteria (AIC, BIC): $$\text{AIC}(\lambda) = n \log(\text{RSS}/n) + 2 \cdot \text{df}(\lambda)$$
4. REML (Restricted Maximum Likelihood): From the mixed model perspective (see Section 7), $\lambda$ is related to variance components, estimated via REML.
```python
import numpy as np
import matplotlib.pyplot as plt

def compute_criteria(result, y, n):
    """Compute various selection criteria."""
    y_hat = result['y_hat']
    df = result['df']
    rss = np.sum((y - y_hat) ** 2)

    # GCV
    gcv = (rss / n) / ((1 - df / n) ** 2)
    # AIC
    aic = n * np.log(rss / n) + 2 * df
    # BIC
    bic = n * np.log(rss / n) + np.log(n) * df

    # LOOCV (using shortcut via hat matrix diagonal)
    H = result['smoother']
    h_diag = np.diag(H)
    loo_residuals = (y - y_hat) / (1 - h_diag + 1e-10)
    loocv = np.mean(loo_residuals ** 2)

    return {'GCV': gcv, 'AIC': aic, 'BIC': bic, 'LOOCV': loocv, 'df': df}

def find_optimal_lambda(x, y, n_basis=25, criterion='GCV'):
    """Find optimal lambda by minimizing specified criterion."""
    lambdas = np.logspace(-6, 6, 100)
    best_lambda = None
    best_score = np.inf
    all_scores = []

    for lam in lambdas:
        result = fit_pspline(x, y, n_basis, lambda_val=lam)
        scores = compute_criteria(result, y, len(y))
        all_scores.append(scores)
        if scores[criterion] < best_score:
            best_score = scores[criterion]
            best_lambda = lam

    return best_lambda, all_scores, lambdas

# Example: Compare different selection criteria
np.random.seed(42)
n = 120
x = np.sort(np.random.uniform(0, 2 * np.pi, n))
y_true = np.sin(x) + 0.4 * np.cos(2.5 * x)
y = y_true + np.random.normal(0, 0.4, n)

print("Optimal lambda by different criteria:")
for criterion in ['GCV', 'AIC', 'BIC', 'LOOCV']:
    opt_lambda, scores, lambdas = find_optimal_lambda(x, y, criterion=criterion)
    # Evaluate at optimal
    result = fit_pspline(x, y, n_basis=25, lambda_val=opt_lambda)
    mse = np.mean((y_true - result['y_hat']) ** 2)
    print(f"{criterion:>6}: lambda = {opt_lambda:>10.4f}, "
          f"df = {result['df']:>5.2f}, MSE = {mse:.4f}")

# Plot criteria curves
_, all_scores, lambdas = find_optimal_lambda(x, y, criterion='GCV')

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
criteria_names = ['GCV', 'AIC', 'BIC', 'LOOCV']

for ax, crit in zip(axes.flatten(), criteria_names):
    values = [s[crit] for s in all_scores]
    ax.semilogx(lambdas, values, 'b-', lw=1.5)
    ax.axvline(lambdas[np.argmin(values)], color='red', linestyle='--',
               label=f'Optimal: {lambdas[np.argmin(values)]:.3f}')
    ax.set_xlabel('lambda')
    ax.set_ylabel(crit)
    ax.set_title(f'{crit} vs lambda')
    ax.legend()

plt.tight_layout()
plt.savefig('lambda_selection.png', dpi=150)
```

GCV and LOOCV typically select similar, relatively small values of $\lambda$ (favoring flexibility), while BIC is more conservative. In practice, GCV is the default choice for P-splines because it's fast to compute and well-validated. If interpretability matters more than prediction, consider BIC.
A remarkable fact: P-splines with second-order difference penalty approximate smoothing splines with integrated second derivative penalty. This is why they work so well with far fewer basis functions.
The Connection:
For equally-spaced B-splines with knot spacing $h$, the second difference of coefficients approximates the second derivative of the spline:
$$\frac{\beta_{j+1} - 2\beta_j + \beta_{j-1}}{h^2} \approx s''(x_j)$$
where $x_j$ is near the center of the $j$-th B-spline.
Summing over $j$ (a Riemann sum with grid spacing $h$): $$\frac{1}{h^3} \|\mathbf{D}_2 \boldsymbol{\beta}\|^2 \approx \int [s''(x)]^2 \, dx$$ (in practice the power of $h$ is absorbed into $\lambda$).
So the discrete penalty is a Riemann sum approximation to the integral penalty!
With 20-40 equally-spaced B-splines, the P-spline approximation to a smoothing spline is typically accurate to within 1-2 effective degrees of freedom. For practical purposes, the fitted curves are visually indistinguishable. This is why P-splines have largely replaced smoothing splines in modern practice.
Why This Matters:
Computational savings: fitting a P-spline with 30 basis functions requires inverting a $30 \times 30$ matrix (about $30^3 = 27{,}000$ operations). Smoothing splines with $n = 10000$ cost $O(n^3)$ naively (though efficient banded algorithms reduce this).
Memory savings: Store a $30 \times 30$ matrix instead of $10000 \times 10000$.
Same effective result: The fitted curve is nearly identical.
Greater flexibility: The discrete penalty extends easily to other contexts (spatial, tensor products) where integral penalties are intractable.
| Aspect | Smoothing Splines | P-splines ($p$ basis) |
|---|---|---|
| Number of parameters | $n$ | $p \ll n$ |
| Matrix to invert | $n \times n$ | $p \times p$ |
| Time complexity (naive) | $O(n^3)$ | $O(np^2 + p^3)$ |
| Time complexity (efficient) | $O(n)$ (banded) | $O(np^2 + p^3)$ |
| Memory | $O(n^2)$ or $O(n)$ banded | $O(np + p^2)$ |
| Applicable dimensions | 1D only (efficient) | Generalizes to tensor products |
| Knot positions | At data points | Equally-spaced (typically) |
P-splines extend naturally to multiple dimensions using tensor product bases. This is a major advantage over smoothing splines, which don't have efficient higher-dimensional analogues.
Tensor Product P-splines for 2D:
For a function $f(x_1, x_2)$, use: $$f(x_1, x_2) = \sum_j \sum_k \beta_{jk} B_j(x_1) B_k(x_2)$$
The penalty extends as: $$\lambda_1 \|\mathbf{D}_1 \boldsymbol{\beta}\|^2 + \lambda_2 \|\boldsymbol{\beta} \mathbf{D}_2^T\|^2$$
where the coefficients $\beta_{jk}$ are arranged as a matrix, $\mathbf{D}_1$ and $\mathbf{D}_2$ are difference matrices acting along each dimension, and the norms are Frobenius norms, penalizing roughness in each direction separately.
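When the coefficient matrix is vectorized, these directional penalties become Kronecker-product operators, which is how they are usually assembled in code. A small sketch of the identity, with illustrative names and NumPy's row-major (C-order) vec convention:

```python
import numpy as np

# Row-major vec identity: kron(D1, I) @ vec(Beta) differences down the
# columns of Beta, and kron(I, D2) @ vec(Beta) differences along its rows.
rng = np.random.default_rng(1)
p1, p2 = 5, 4
Beta = rng.normal(size=(p1, p2))      # coefficient matrix

D1 = np.diff(np.eye(p1), n=2, axis=0)  # second differences, x1 direction
D2 = np.diff(np.eye(p2), n=2, axis=0)  # second differences, x2 direction

v = Beta.ravel()                       # row-major vec(Beta)
lhs1 = np.kron(D1, np.eye(p2)) @ v     # x1-direction penalty operator
lhs2 = np.kron(np.eye(p1), D2) @ v     # x2-direction penalty operator

print(np.allclose(lhs1, (D1 @ Beta).ravel()))    # True
print(np.allclose(lhs2, (Beta @ D2.T).ravel()))  # True
```

This is why the 2D penalty matrix can be written $\lambda_1 (\mathbf{D}_1^T\mathbf{D}_1 \otimes \mathbf{I}) + \lambda_2 (\mathbf{I} \otimes \mathbf{D}_2^T\mathbf{D}_2)$ acting on the vectorized coefficients.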
```python
import numpy as np
from scipy.interpolate import BSpline

def build_2d_pspline_basis(x1, x2, n_basis_1=15, n_basis_2=15, degree=3):
    """
    Build tensor product B-spline basis for 2D P-spline.

    Parameters
    ----------
    x1, x2 : arrays of shape (n,)
        Predictor values (paired observations)
    n_basis_1, n_basis_2 : int
        Number of basis functions in each dimension
    degree : int
        B-spline degree

    Returns
    -------
    B : array of shape (n, n_basis_1 * n_basis_2)
        Tensor product design matrix
    """
    n = len(x1)

    # Build 1D bases
    def build_1d_basis(x, n_basis):
        x_min, x_max = x.min(), x.max()
        n_interior = n_basis - degree - 1
        interior_knots = np.linspace(x_min, x_max, n_interior + 2)[1:-1]
        t = np.concatenate([
            np.repeat(x_min, degree + 1),
            interior_knots,
            np.repeat(x_max, degree + 1)
        ])
        B = np.zeros((len(x), n_basis))
        for j in range(n_basis):
            c = np.zeros(n_basis)
            c[j] = 1.0
            spl = BSpline(t, c, degree, extrapolate=False)
            B[:, j] = np.nan_to_num(spl(x), nan=0.0)
        return B

    B1 = build_1d_basis(x1, n_basis_1)  # n x n_basis_1
    B2 = build_1d_basis(x2, n_basis_2)  # n x n_basis_2

    # Tensor product: B[i, j*n_basis_2 + k] = B1[i,j] * B2[i,k]
    # This is the row-wise Kronecker product
    B = np.zeros((n, n_basis_1 * n_basis_2))
    for i in range(n):
        B[i, :] = np.outer(B1[i, :], B2[i, :]).flatten()

    return B, B1, B2

def build_2d_penalty(n_basis_1, n_basis_2, diff_order=2,
                     lambda_1=1.0, lambda_2=1.0):
    """
    Build 2D penalty matrix for tensor product P-spline.

    Penalty: lambda_1 * ||(D_1 ⊗ I_2) beta||^2 + lambda_2 * ||(I_1 ⊗ D_2) beta||^2
    """
    D1 = difference_matrix(n_basis_1, diff_order)
    D2 = difference_matrix(n_basis_2, diff_order)
    I1 = np.eye(n_basis_1)
    I2 = np.eye(n_basis_2)

    # Kronecker products for penalties in each direction
    P1 = np.kron(D1.T @ D1, I2)  # Penalty in x1 direction
    P2 = np.kron(I1, D2.T @ D2)  # Penalty in x2 direction

    P = lambda_1 * P1 + lambda_2 * P2
    return P

# Example: Fit 2D P-spline
np.random.seed(42)
n = 500

# Generate 2D data
x1 = np.random.uniform(0, 1, n)
x2 = np.random.uniform(0, 1, n)

# True surface: peaks and valleys
y_true = (np.sin(4 * np.pi * x1) * np.cos(4 * np.pi * x2)
          + 0.5 * np.exp(-((x1 - 0.5)**2 + (x2 - 0.5)**2) / 0.1))
y = y_true + np.random.normal(0, 0.2, n)

# Fit 2D P-spline
n_basis_1, n_basis_2 = 12, 12
B, B1, B2 = build_2d_pspline_basis(x1, x2, n_basis_1, n_basis_2)
P = build_2d_penalty(n_basis_1, n_basis_2, lambda_1=10, lambda_2=10)

# Solve penalized normal equations
BtB = B.T @ B
Bty = B.T @ y
beta = np.linalg.solve(BtB + P, Bty)

y_hat = B @ beta
mse = np.mean((y_true - y_hat) ** 2)
print(f"2D P-spline: {n_basis_1 * n_basis_2} basis functions, MSE = {mse:.4f}")
```

Tensor product P-splines have $p_1 \times p_2 \times \cdots$ parameters. With 20 basis functions per dimension: 1D = 20, 2D = 400, 3D = 8000, 4D = 160000! This exponential growth limits tensor products to ~3 dimensions. For higher dimensions, use additive models (GAMs) or other sparse approaches.
A powerful perspective: P-splines can be reformulated as linear mixed models. This connection enables:
- automatic selection of $\lambda$ via REML,
- fitting with standard mixed-model software,
- principled standard errors and Bayesian credible intervals.
The Reparameterization:
The P-spline coefficients $\boldsymbol{\beta}$ can be decomposed as: $$\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{Z}\mathbf{u}$$
where:
- $\mathbf{X}$ is a fixed-effects design whose columns span the null space of the penalty (polynomials of degree $< q$ in the coefficient index),
- $\mathbf{Z}$ is a random-effects design spanning the penalized complement, with $\mathbf{u}$ treated as random effects.
The Mixed Model:
$$y_i = \mathbf{x}_i^T \boldsymbol{\alpha} + \mathbf{z}_i^T \mathbf{u} + \epsilon_i$$
with:
- $\mathbf{u} \sim N(\mathbf{0}, \sigma_u^2 \mathbf{I})$ (random effects),
- $\epsilon_i \sim N(0, \sigma^2)$ (noise),
- the smoothing parameter recovered as $\lambda = \sigma^2 / \sigma_u^2$.
REML Estimation:
The smoothing parameter $\lambda$ emerges as a ratio of variance components, $\lambda = \sigma^2 / \sigma_u^2$, estimated by Restricted Maximum Likelihood (REML). This provides an automatic, likelihood-based choice of $\lambda$ that avoids a grid search and tends to be more stable than GCV.
The mixed model representation is implemented in R's mgcv package (e.g. `s(x, bs = "ps")` inside `gam()`) and in Python's statsmodels (via `GLMGam`). These provide automatic smoothness selection (REML in mgcv), giving a fully automated P-spline fitting procedure. This is often the easiest way to use P-splines in practice.
```python
# In practice, use libraries that implement P-splines with automatic smoothing

# Python: using pygam
# pip install pygam
"""
from pygam import LinearGAM, s

# Fit P-spline with automatic smoothing selection
model = LinearGAM(s(0, n_splines=25)).fit(x, y)

# Get fitted values and confidence intervals
y_hat = model.predict(x)
conf_int = model.confidence_intervals(x, width=0.95)

# Get effective degrees of freedom
print(f"Effective df: {model.statistics_['edof']}")
"""

# Python: using statsmodels
"""
import statsmodels.api as sm
from patsy import dmatrix

# Create B-spline basis with df=25
basis_formula = "bs(x, df=25, degree=3) - 1"
B = dmatrix(basis_formula, {"x": x})

# For penalized estimation, use BSplines directly
from statsmodels.gam.api import GLMGam, BSplines

# Define spline term
spline = BSplines(x, df=25, degree=3)

# Fit with automatic selection (via GCV or similar)
model = GLMGam.from_formula("y ~ 1", {"y": y}, smoother=spline)
result = model.fit()

print(result.summary())
"""

# Demonstration: Mixed model view conceptually
import numpy as np

def decompose_penalty(n_basis, diff_order=2):
    """
    Decompose B-spline basis into fixed and random components.
    Returns X (null space of penalty) and Z (penalized space).
    """
    D = difference_matrix(n_basis, diff_order)
    P = D.T @ D

    # Eigendecomposition of P
    eigenvalues, eigenvectors = np.linalg.eigh(P)

    # Null space: eigenvectors with eigenvalue ~0
    null_mask = eigenvalues < 1e-10
    X = eigenvectors[:, null_mask]   # Fixed effects design

    # Range space: eigenvectors with positive eigenvalues
    range_mask = ~null_mask
    Z = eigenvectors[:, range_mask]  # Random effects design

    # Eigenvalues for scaling
    d = eigenvalues[range_mask]

    return X, Z, d

# Example
n_basis = 25
X, Z, d = decompose_penalty(n_basis, diff_order=2)

print(f"Penalty matrix rank: {len(d)}")
print(f"Null space dimension: {X.shape[1]} (= diff_order)")
print(f"Random effects dimension: {Z.shape[1]}")
print("Null space corresponds to linear functions (degree < diff_order)")
```

| Pitfall | Symptom | Solution |
|---|---|---|
| Too few basis functions | Underfit, $\lambda = 0$ still wiggly | Increase $p$ to 30+ |
| Lambda too small | Overfitting, wiggly | Let GCV/REML select $\lambda$ |
| Lambda too large | Underfitting, too smooth | Let GCV/REML select $\lambda$ |
| Non-equally-spaced knots | Penalty interpretation breaks | Use equally-spaced knots |
| Data outside knot range | Poor fit at boundaries | Extend knots slightly beyond data range |
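The boundary fix from the last row of the table can be sketched as follows; the 5% margin and the helper name `extended_knots` are illustrative choices, not a standard:

```python
import numpy as np
from scipy.interpolate import BSpline

# Sketch: place the knot range slightly beyond the data so no observation
# sits exactly on (or outside) the boundary knots.
def extended_knots(x, n_basis, degree=3, margin=0.05):
    span = x.max() - x.min()
    lo, hi = x.min() - margin * span, x.max() + margin * span
    n_interior = n_basis - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    return np.concatenate([np.repeat(lo, degree + 1),
                           interior,
                           np.repeat(hi, degree + 1)])

x = np.linspace(0.0, 1.0, 50)
n_basis, degree = 12, 3
t = extended_knots(x, n_basis, degree)

# Build the design matrix on the extended knots
B = np.column_stack([
    BSpline(t, np.eye(n_basis)[j], degree, extrapolate=False)(x)
    for j in range(n_basis)
])
B = np.nan_to_num(B)

# All data now lie strictly inside the knot range, where the clamped
# B-spline basis still forms a partition of unity.
print(np.allclose(B.sum(axis=1), 1.0))  # True
```

The difference penalty is unchanged; only the knot construction moves, so this slots directly into a `fit_pspline`-style routine.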
For 1D P-splines: 25-30 equally-spaced cubic B-splines + second-order difference penalty + GCV or REML selection. This works remarkably well across a wide range of problems with minimal tuning.
Module Complete!
You have now completed the Spline Regression module, covering:
- B-spline basis construction,
- natural splines,
- smoothing splines,
- knot selection strategies,
- penalized splines (P-splines).
These methods form the foundation for flexible nonparametric regression in machine learning and statistics. P-splines, in particular, are the workhorse behind Generalized Additive Models (GAMs), which we'll encounter in a subsequent module.
You now have a comprehensive understanding of spline regression methods: from the foundational B-spline construction through natural splines, smoothing splines, knot selection strategies, and the practical P-spline framework. These tools enable flexible, interpretable modeling of nonlinear relationships in continuous data.