In polynomial regression, we expand a single predictor $x$ into a polynomial basis $(1, x, x^2, \ldots, x^d)$ and fit a linear model in these transformed features. This approach has a fundamental limitation: polynomial bases are global. Each basis function spans the entire range of $x$, meaning that the fitted value at any point depends on all the data—including data far away from that point.
This global nature creates two severe problems:

1. No local control: perturbing a single observation changes the fitted curve everywhere, not just near that observation.
2. Erratic boundary behavior: high-degree polynomials oscillate wildly, especially near the edges of the data (the Runge phenomenon).
Splines solve both problems by replacing global polynomial bases with piecewise polynomial bases that have local support—each basis function is non-zero only over a finite interval. The resulting fits adapt locally to the data while maintaining smoothness across the entire domain.
By the end of this page, you will understand the mathematical construction of B-splines via the Cox-de Boor recurrence, their remarkable properties (local support, partition of unity, convex hull containment), and how to use them as basis functions for flexible regression modeling. You will see why B-splines are the preferred representation for spline regression across statistics, machine learning, and computer graphics.
Before diving into B-splines specifically, we need to understand what a spline is in general.
Definition (Spline Function):
A spline of degree $d$ on the interval $[a, b]$ with knots $a = \xi_0 < \xi_1 < \xi_2 < \cdots < \xi_K < \xi_{K+1} = b$ is a function $s: [a, b] \to \mathbb{R}$ such that:

1. On each subinterval $[\xi_i, \xi_{i+1}]$, $s$ is a polynomial of degree at most $d$;
2. $s$ has $d-1$ continuous derivatives on $[a, b]$, i.e., $s \in C^{d-1}[a, b]$.
The knots $\xi_1, \ldots, \xi_K$ are the points where the polynomial pieces join. The smoothness requirement ensures that, despite being piecewise, the spline appears smooth to the eye—no visible corners or jumps up to the $(d-1)$-th derivative.
The term 'spline' comes from drafting. Before computers, engineers and ship designers used thin flexible strips of wood or metal (called splines) held in place by heavy lead weights (called 'ducks'). The strip naturally assumed the shape that minimized bending energy—which turns out to be a cubic spline! This physical origin explains why cubic splines (degree 3) are the most common: they minimize curvature while allowing sufficient flexibility.
Degrees of Freedom of a Spline Space:
How many parameters are needed to specify a spline of degree $d$ with $K$ interior knots?
Consider: We have $K+1$ subintervals, each carrying a polynomial of degree $d$, which has $d+1$ coefficients. This gives $(K+1)(d+1)$ free parameters. However, at each of the $K$ interior knots, we impose $d$ continuity conditions (continuity of the function itself and its first $d-1$ derivatives). Thus:
$$\dim(\text{Spline Space}) = (K+1)(d+1) - Kd = K + d + 1$$
This is a fundamental formula. For cubic splines ($d = 3$) with $K$ interior knots, the dimension is $K + 4$: a global cubic contributes 4 degrees of freedom, and each interior knot adds exactly one more.
| Degree $d$ | Name | Continuity | Interior Knots $K$ | Dimension |
|---|---|---|---|---|
| 0 | Step function | $C^{-1}$ (discontinuous) | $K$ | $K + 1$ |
| 1 | Linear spline | $C^0$ (continuous) | $K$ | $K + 2$ |
| 2 | Quadratic spline | $C^1$ (smooth) | $K$ | $K + 3$ |
| 3 | Cubic spline | $C^2$ (very smooth) | $K$ | $K + 4$ |
| $d$ | Degree-$d$ spline | $C^{d-1}$ | $K$ | $K + d + 1$ |
The Basis Function Approach:
Since the space of splines of degree $d$ with $K$ knots has dimension $K + d + 1$, we can represent any such spline as a linear combination of $K + d + 1$ basis functions:
$$s(x) = \sum_{j=1}^{K+d+1} \beta_j B_j(x)$$
The question is: what basis should we use? The most natural-seeming basis—truncated power functions—turns out to be poorly suited for computation. Enter B-splines.
The most straightforward basis for the space of degree-$d$ splines with knots $\xi_1, \ldots, \xi_K$ is the truncated power basis:
$$\{1, x, x^2, \ldots, x^d, (x - \xi_1)_+^d, (x - \xi_2)_+^d, \ldots, (x - \xi_K)_+^d\}$$
where the truncated power function is defined as:
$$(x - \xi)_+^d = \begin{cases} (x - \xi)^d & \text{if } x > \xi \\ 0 & \text{if } x \leq \xi \end{cases}$$
This basis is intuitive: the first $d+1$ functions are the standard polynomial basis, and each additional function 'kicks in' at its corresponding knot to allow a new polynomial piece.
```python
import numpy as np
import matplotlib.pyplot as plt

def truncated_power(x, xi, d):
    """Compute (x - xi)_+^d"""
    return np.maximum(0, x - xi) ** d

def build_truncated_power_design(x, knots, degree):
    """
    Build design matrix for truncated power basis.

    For degree d with K knots, columns are
    [1, x, x^2, ..., x^d, (x-xi_1)_+^d, ..., (x-xi_K)_+^d].
    Total columns: d + 1 + K.
    """
    # Polynomial part: 1, x, x^2, ..., x^d
    X_poly = np.column_stack([x ** j for j in range(degree + 1)])

    # Truncated power part: (x - xi_j)_+^d for each knot
    X_trunc = np.column_stack([truncated_power(x, xi, degree)
                               for xi in knots])

    return np.hstack([X_poly, X_trunc])

# Example: Cubic splines with knots at 0.25, 0.5, 0.75
x = np.linspace(0, 1, 200)
knots = [0.25, 0.5, 0.75]
degree = 3

X = build_truncated_power_design(x, knots, degree)
print(f"Design matrix shape: {X.shape}")  # (200, 7) = (3+1) + 3

# Condition number reveals the problem!
print(f"Condition number: {np.linalg.cond(X):.2e}")
```

The truncated power basis suffers from severe numerical ill-conditioning. The condition number of the design matrix grows extremely large as the number of knots increases. In double-precision arithmetic (about 16 significant digits), a condition number of $10^{12}$ means you lose roughly 12 digits of accuracy in your coefficient estimates! This makes the truncated power basis essentially unusable for practical spline regression with more than a handful of knots.
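Despite the poor conditioning, the basis does span the full spline space. As a quick standalone check of the dimension formula, here is a minimal sketch (same illustrative knots as above) confirming that the design matrix on a dense grid has full column rank $K + d + 1 = 7$:

```python
import numpy as np

# Rank check for the dimension formula: with K = 3 interior knots and
# degree d = 3, the truncated power design should have rank K + d + 1 = 7,
# even though its conditioning is poor.
x = np.linspace(0, 1, 400)
knots, d = [0.25, 0.5, 0.75], 3
X = np.column_stack([x**j for j in range(d + 1)] +
                    [np.maximum(0, x - xi)**d for xi in knots])
print(np.linalg.matrix_rank(X))  # 7
```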
Why the Ill-Conditioning?
The problem is that truncated power functions are nearly collinear. Consider two adjacent knots $\xi_j$ and $\xi_{j+1}$. The functions $(x - \xi_j)_+^d$ and $(x - \xi_{j+1})_+^d$ are highly correlated: both are zero below their respective knots, and both grow polynomially above them, differing only by a small horizontal shift.
Mathematically, if the knots are closely spaced, the truncated power functions become nearly linearly dependent, causing the design matrix to be nearly singular. The closer the knots, the worse the conditioning.
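To make this concrete, here is a minimal sketch (the knot values 0.50 and 0.55 are illustrative choices) showing how correlated two adjacent truncated power functions are:

```python
import numpy as np

# Near-collinearity of adjacent truncated power functions: two cubics
# whose knots differ by only 0.05 are almost perfectly correlated.
x = np.linspace(0, 1, 200)
f1 = np.maximum(0, x - 0.50) ** 3
f2 = np.maximum(0, x - 0.55) ** 3
print(f"Correlation: {np.corrcoef(f1, f2)[0, 1]:.6f}")  # very close to 1
```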
The Solution: B-Splines
B-splines provide an alternative basis for the exact same spline space that has excellent numerical properties. Instead of functions that 'switch on' at knots, B-splines are bell-shaped functions with compact support—each is non-zero only over a finite interval spanning a few adjacent knots.
B-splines (Basis splines) are a numerically stable basis for the spline function space. Unlike the truncated power basis, B-splines have compact support and are defined via an elegant recursive formula.
Extended Knot Sequence:
To construct B-splines of degree $d$ with $K$ interior knots $\xi_1 < \cdots < \xi_K$ on $[a, b]$, we first define an extended knot sequence by adding $d+1$ copies of the boundary values:
$$\underbrace{t_0 = t_1 = \cdots = t_d = a}_{d+1 \text{ copies}}, \quad t_{d+1} = \xi_1, \quad t_{d+2} = \xi_2, \quad \ldots, \quad t_{d+K} = \xi_K, \quad \underbrace{t_{d+K+1} = \cdots = t_{2d+K+1} = b}_{d+1 \text{ copies}}$$
This extended sequence has $2d + K + 2$ knots in total; for example, with $d = 3$ and $K = 3$ interior knots on $[0, 1]$, it is $(0, 0, 0, 0, \xi_1, \xi_2, \xi_3, 1, 1, 1, 1)$, with 11 knots. The repeated boundary knots ensure that the spline interpolates its first and last coefficients exactly at the endpoints.
Repeating the boundary knots $d+1$ times is crucial. This multiplicity makes the B-splines sum exactly to 1 over the entire interval $[a,b]$ (the partition of unity property) and makes the spline pass through its first and last control points at the boundaries. Without this convention, the B-splines would not span the full spline space on $[a, b]$.
The Cox-de Boor Recurrence Formula:
B-splines are defined recursively. For notational convenience, let $B_{j,k}(x)$ denote the $j$-th B-spline of degree $k$ (so $k$ goes from 0 to $d$).
Base case (degree 0): $$B_{j,0}(x) = \begin{cases} 1 & \text{if } t_j \leq x < t_{j+1} \\ 0 & \text{otherwise} \end{cases}$$
This is simply the indicator function for the interval $[t_j, t_{j+1})$—a rectangular pulse.
Recursive case (degree $k \geq 1$): $$B_{j,k}(x) = \frac{x - t_j}{t_{j+k} - t_j} B_{j,k-1}(x) + \frac{t_{j+k+1} - x}{t_{j+k+1} - t_{j+1}} B_{j+1,k-1}(x)$$
Here, we adopt the convention that $0/0 = 0$ whenever a denominator vanishes (which happens when consecutive knots coincide).
This recurrence is sometimes also written as: $$B_{j,k}(x) = \omega_{j,k}(x) \cdot B_{j,k-1}(x) + \big(1 - \omega_{j+1,k}(x)\big) \cdot B_{j+1,k-1}(x)$$
where $\omega_{j,k}(x) = \frac{x - t_j}{t_{j+k} - t_j}$ is a local linear interpolation weight.
```python
import numpy as np

def cox_de_boor(x, j, k, t):
    """
    Compute B_{j,k}(x) using the Cox-de Boor recurrence.

    Parameters
    ----------
    x : float
        Evaluation point
    j : int
        Index of B-spline (0-indexed)
    k : int
        Degree of B-spline
    t : array
        Extended knot sequence

    Returns
    -------
    float : Value of B_{j,k}(x)
    """
    # Base case: degree 0 (indicator function)
    if k == 0:
        return 1.0 if t[j] <= x < t[j + 1] else 0.0

    # Recursive case: degree k.
    # Handle the 0/0 convention: if a denominator is 0, that term is 0.

    # First term: (x - t_j) / (t_{j+k} - t_j) * B_{j,k-1}(x)
    denom1 = t[j + k] - t[j]
    if denom1 == 0:
        term1 = 0.0
    else:
        term1 = ((x - t[j]) / denom1) * cox_de_boor(x, j, k - 1, t)

    # Second term: (t_{j+k+1} - x) / (t_{j+k+1} - t_{j+1}) * B_{j+1,k-1}(x)
    denom2 = t[j + k + 1] - t[j + 1]
    if denom2 == 0:
        term2 = 0.0
    else:
        term2 = ((t[j + k + 1] - x) / denom2) * cox_de_boor(x, j + 1, k - 1, t)

    return term1 + term2

def build_extended_knots(interior_knots, a, b, degree):
    """
    Build extended knot sequence with repeated boundary knots.

    Parameters
    ----------
    interior_knots : array
        Interior knots xi_1, ..., xi_K
    a, b : float
        Domain boundaries
    degree : int
        Spline degree

    Returns
    -------
    array : Extended knot sequence
    """
    left_boundary = [a] * (degree + 1)
    right_boundary = [b] * (degree + 1)
    return np.array(left_boundary + list(interior_knots) + right_boundary)

# Example: Cubic B-splines (degree 3) with 3 interior knots
interior_knots = [0.25, 0.5, 0.75]
a, b = 0.0, 1.0
degree = 3

t = build_extended_knots(interior_knots, a, b, degree)
print(f"Extended knot sequence ({len(t)} knots):")
print(t)

# Number of B-splines: n = len(t) - degree - 1 = K + degree + 1
n_basis = len(t) - degree - 1
print(f"Number of B-spline basis functions: {n_basis}")
```

Understanding the Recurrence Geometrically:
The Cox-de Boor formula has a beautiful geometric interpretation: each degree-$k$ B-spline is a blend of two adjacent degree-$(k-1)$ B-splines. The weight $\omega_{j,k}(x)$ ramps linearly from 0 to 1 across the support of the left parent, so the new function rises where the left parent lives and falls where the right parent lives. Each level of the recursion widens the support by one knot interval and adds one order of smoothness.
By the time we reach degree $d$, each B-spline $B_{j,d}(x)$ is:

1. Non-zero only on $(t_j, t_{j+d+1})$, an interval spanning $d+1$ knot intervals;
2. A piecewise polynomial of degree $d$ on each knot interval it touches;
3. $C^{d-1}$-continuous at every simple knot.
B-splines possess a remarkable collection of properties that make them ideal for statistical modeling:

1. Local support: $B_{j,d}(x) = 0$ outside $[t_j, t_{j+d+1}]$, so each basis function covers only $d+1$ knot intervals.
2. Non-negativity: $B_{j,d}(x) \geq 0$ for all $x$.
3. Partition of unity: $\sum_j B_{j,d}(x) = 1$ for every $x \in [a, b]$.
4. Convex hull property: on each knot interval, $s(x) = \sum_j \beta_j B_{j,d}(x)$ is a convex combination of the $d+1$ active coefficients, so it lies between their minimum and maximum (this follows from properties 2 and 3).

These properties are not just mathematically elegant; they translate directly into computational and interpretive advantages.
Proof of Partition of Unity:
The partition of unity property is crucial and follows from induction on degree. For degree 0, the B-splines are indicator functions of disjoint intervals, so exactly one equals 1 at any point.
For degree $k$, suppose the property holds for degree $k-1$. Then: $$\sum_j B_{j,k}(x) = \sum_j \left[ \omega_{j,k}(x) B_{j,k-1}(x) + (1-\omega_{j+1,k}(x)) B_{j+1,k-1}(x) \right]$$
Re-indexing the second sum and using $\omega_{j,k}(x) + (1-\omega_{j,k}(x)) = 1$: $$= \sum_j B_{j,k-1}(x) \cdot \left[ \omega_{j,k}(x) + (1-\omega_{j,k}(x)) \right] = \sum_j B_{j,k-1}(x) = 1$$
by the induction hypothesis. The repeated boundary knots ensure this works at the domain boundaries.
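The proof can also be confirmed numerically. A minimal sketch using scipy's BSpline with an illustrative clamped knot vector (the knots 0.25, 0.5, 0.75 are an arbitrary choice):

```python
import numpy as np
from scipy.interpolate import BSpline

# Numerical check of the partition of unity: cubic B-splines on a
# clamped knot sequence sum to 1 everywhere on [0, 1).
d = 3
t = np.r_[np.zeros(d + 1), [0.25, 0.5, 0.75], np.ones(d + 1)]
n_basis = len(t) - d - 1  # 7 basis functions

x = np.linspace(0, 1, 200, endpoint=False)
# Each column is one basis function: unit coefficient vector e_j
B = np.column_stack([BSpline(t, np.eye(n_basis)[j], d, extrapolate=False)(x)
                     for j in range(n_basis)])
print(np.allclose(B.sum(axis=1), 1.0))  # True
```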
Local support means the design matrix for B-splines is banded (sparse), not dense. Partition of unity enables interpretable coefficients. Non-negativity with positive coefficients guarantees non-negative fitted values (useful for counts, densities). The convex hull property enables uncertainty visualization. These aren't abstract niceties—they're practical advantages.
| Degree $d$ | Continuity at Simple Knots | Support Width | Shape Characteristics |
|---|---|---|---|
| 0 | Discontinuous | 1 interval | Rectangular pulse |
| 1 | $C^0$ (continuous) | 2 intervals | Triangular (tent function) |
| 2 | $C^1$ (smooth) | 3 intervals | Quadratic bump |
| 3 | $C^2$ (very smooth) | 4 intervals | Bell-shaped curve |
| $d$ | $C^{d-1}$ | $d+1$ intervals | Smooth, bell-shaped |
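To see these shapes concretely, the following sketch plots a full cubic B-spline basis on $[0,1]$ (the knots at 0.25, 0.5, 0.75 are an illustrative choice); note the interior bell shapes and the narrower boundary functions produced by knot repetition:

```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

# Plot a cubic B-spline basis: bell-shaped, each spanning 4 knot intervals
d = 3
t = np.r_[np.zeros(d + 1), [0.25, 0.5, 0.75], np.ones(d + 1)]
n_basis = len(t) - d - 1

xg = np.linspace(0, 1, 400, endpoint=False)
for j in range(n_basis):
    plt.plot(xg, BSpline(t, np.eye(n_basis)[j], d, extrapolate=False)(xg),
             label=f"$B_{{{j},3}}$")
for xi in [0.25, 0.5, 0.75]:
    plt.axvline(xi, color='gray', linestyle=':', lw=0.8)  # knot positions
plt.legend(ncol=4, fontsize=8)
plt.title("Cubic B-spline basis functions")
plt.show()
```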
For regression, we evaluate each B-spline basis function at each data point to form a design matrix. Given $n$ observations $x_1, \ldots, x_n$ and $p = K + d + 1$ B-spline basis functions, the design matrix $\mathbf{B}$ is:
$$\mathbf{B}_{i,j} = B_{j,d}(x_i) \quad \text{for } i = 1, \ldots, n \text{ and } j = 0, \ldots, p-1$$
The spline regression model is then: $$y_i = \sum_{j=0}^{p-1} \beta_j B_{j,d}(x_i) + \epsilon_i = \mathbf{B}_i \boldsymbol{\beta} + \epsilon_i$$
In matrix form: $\mathbf{y} = \mathbf{B} \boldsymbol{\beta} + \boldsymbol{\epsilon}$
This is simply linear regression with the B-spline design matrix! We can apply ordinary least squares, ridge regression, or any other linear model technique.
```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

def build_bspline_design(x, interior_knots, degree, include_intercept=False):
    """
    Build the B-spline design matrix using scipy.

    Parameters
    ----------
    x : array of shape (n,)
        Evaluation points
    interior_knots : array of shape (K,)
        Interior knots
    degree : int
        Spline degree (typically 3 for cubic)
    include_intercept : bool
        If True, add a column of 1s (standard in statsmodels convention)

    Returns
    -------
    B : array of shape (n, K + degree + 1 + int(include_intercept))
        B-spline design matrix
    """
    # Build extended knot sequence
    t = np.concatenate([
        np.repeat(x.min(), degree + 1),
        interior_knots,
        np.repeat(x.max(), degree + 1)
    ])

    # Number of basis functions
    n_basis = len(t) - degree - 1

    # Evaluate each basis function at all points
    B = np.zeros((len(x), n_basis))
    for j in range(n_basis):
        # Create coefficient vector: 1 at position j, 0 elsewhere
        c = np.zeros(n_basis)
        c[j] = 1.0
        # Create B-spline and evaluate
        spline = BSpline(t, c, degree, extrapolate=False)
        B[:, j] = spline(x)

    # Handle NaN at right boundary (open interval convention)
    B = np.nan_to_num(B, nan=0.0)

    if include_intercept:
        B = np.column_stack([np.ones(len(x)), B])

    return B

# Example: Generate data and fit B-spline regression
np.random.seed(42)

# True function: sinusoidal with noise
n = 100
x = np.sort(np.random.uniform(0, 2*np.pi, n))
y_true = np.sin(x) + 0.5 * np.cos(2*x)
y = y_true + np.random.normal(0, 0.3, n)

# Define knots and build design matrix
interior_knots = np.linspace(0.5, 2*np.pi - 0.5, 5)  # 5 interior knots
degree = 3  # Cubic B-splines

B = build_bspline_design(x, interior_knots, degree)
print(f"Design matrix shape: {B.shape}")  # (100, 9) = 5 + 3 + 1

# Fit via OLS
beta_hat = np.linalg.lstsq(B, y, rcond=None)[0]
y_hat = B @ beta_hat

# Check condition number - should be reasonable!
print(f"Condition number: {np.linalg.cond(B):.2f}")

# Compare with the truncated power design from earlier:
# the B-spline condition number is MUCH smaller.
```

The condition number of the B-spline design matrix is typically on the order of $10^1$ to $10^2$ (small!), compared to $10^{12}$ or worse for the truncated power basis. This allows reliable coefficient estimation even with many knots. This is the primary computational reason to prefer B-splines.
Using scipy.interpolate for B-splines:
In practice, you should use optimized library implementations rather than the recursive formula. In Python, the scipy.interpolate.BSpline class provides efficient evaluation using a variant of the de Boor algorithm. For design matrix construction specifically, patsy (the formula engine used by statsmodels) provides a bs() transform analogous to R's splines::bs():
```python
import patsy

# Full-rank B-spline basis with df = 8 columns (= K + d + 1, i.e. K = 4
# interior knots at degree 3); "- 1" drops the formula intercept
B = patsy.dmatrix("bs(x, df=8, degree=3, include_intercept=True) - 1",
                  data={"x": x})
```
The df parameter specifies the degrees of freedom (= number of basis functions), which equals $K + d + 1$ for open-boundary B-splines.
Let's work through a complete example of B-spline regression, from data generation to model fitting to interpretation. We'll compare B-spline fits with different numbers of knots to illustrate the flexibility-smoothness tradeoff.
```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

# =====================================================
# STEP 1: Generate Synthetic Data
# =====================================================
np.random.seed(42)

# True function: combination of polynomial and periodic components
def true_function(x):
    return 2 * np.sin(1.5 * x) + 0.5 * x - 0.02 * x**2

n = 150
x = np.sort(np.random.uniform(0, 10, n))
y_true = true_function(x)
sigma = 0.8
y = y_true + np.random.normal(0, sigma, n)

# =====================================================
# STEP 2: Define B-spline Basis Matrix Construction
# =====================================================
def bspline_design_matrix(x, knots, degree):
    """Build design matrix for B-splines with given interior knots."""
    # Extended knot sequence with repeated boundary knots
    t = np.concatenate([
        np.repeat(x.min(), degree + 1),
        knots,
        np.repeat(x.max(), degree + 1)
    ])
    n_basis = len(t) - degree - 1
    n_points = len(x)

    B = np.zeros((n_points, n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        spl = BSpline(t, c, degree, extrapolate=False)
        B[:, j] = np.nan_to_num(spl(x), nan=0.0)

    return B, t

# =====================================================
# STEP 3: Fit Models with Different Numbers of Knots
# =====================================================
results = {}
knot_counts = [3, 6, 10, 20]  # Interior knots
degree = 3  # Cubic splines

x_grid = np.linspace(x.min(), x.max(), 500)

for K in knot_counts:
    # Place knots at quantiles for uniform data density
    knots = np.percentile(x, np.linspace(0, 100, K + 2)[1:-1])

    # Build design matrices for data and prediction grid
    B_train, t = bspline_design_matrix(x, knots, degree)
    B_pred, _ = bspline_design_matrix(x_grid, knots, degree)

    # Fit via OLS
    beta_hat = np.linalg.lstsq(B_train, y, rcond=None)[0]

    # Predictions
    y_hat_train = B_train @ beta_hat
    y_hat_pred = B_pred @ beta_hat

    # Compute metrics
    residuals = y - y_hat_train
    rss = np.sum(residuals ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r_squared = 1 - rss / tss

    # Degrees of freedom = number of basis functions
    df = B_train.shape[1]

    # Adjusted R² and AIC for model comparison
    n = len(y)
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - df - 1)
    log_likelihood = -n/2 * np.log(2*np.pi) - n/2 * np.log(rss/n) - n/2
    aic = 2 * df - 2 * log_likelihood

    results[K] = {
        'knots': knots,
        'beta': beta_hat,
        'y_pred': y_hat_pred,
        'r_squared': r_squared,
        'adj_r_squared': adj_r_squared,
        'aic': aic,
        'df': df,
        'condition': np.linalg.cond(B_train)
    }

    print(f"K={K:2d} knots: df={df:2d}, R²={r_squared:.4f}, "
          f"Adj R²={adj_r_squared:.4f}, AIC={aic:.1f}, "
          f"Cond={results[K]['condition']:.1f}")

# =====================================================
# STEP 4: Visualize Results
# =====================================================
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for ax, K in zip(axes, knot_counts):
    ax.scatter(x, y, alpha=0.5, s=20, label='Data')
    ax.plot(x_grid, true_function(x_grid), 'g--', lw=2, label='True function')
    ax.plot(x_grid, results[K]['y_pred'], 'r-', lw=2,
            label=f'B-spline fit (K={K})')
    # Mark knot positions
    for xi in results[K]['knots']:
        ax.axvline(xi, color='gray', linestyle=':', alpha=0.5, lw=0.8)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title(f'{K} Interior Knots: df={results[K]["df"]}, '
                 f'R²={results[K]["r_squared"]:.3f}')
    ax.legend(loc='upper right', fontsize=9)

plt.tight_layout()
plt.savefig('bspline_regression_comparison.png', dpi=150)
plt.show()
```

Interpreting the Results:
| Knots | df | R² | Adj R² | Interpretation |
|---|---|---|---|---|
| 3 | 7 | 0.89 | 0.88 | Underfit: too rigid, misses curvature |
| 6 | 10 | 0.92 | 0.91 | Good balance: captures signal |
| 10 | 14 | 0.93 | 0.92 | Slightly more flexible |
| 20 | 24 | 0.94 | 0.92 | Overfit risk: wiggly, Adj R² stagnates |
Note that $R^2$ always increases with more knots (more parameters = lower training error), but Adjusted $R^2$ penalizes complexity. Beyond a certain point, adding knots hurts generalization.
The condition number remains small (under 100) even with 20 knots—B-splines maintain numerical stability where the truncated power basis would have failed catastrophically.
In this example, we placed knots at quantiles of the x values. This is a common default that ensures roughly equal numbers of observations between knots. Alternative strategies include equally-spaced knots (better for visualization) or adaptive knot placement based on residual structure (more advanced). We'll cover knot selection systematically in Page 4.
The naive recursive evaluation of B-splines via Cox-de Boor is inefficient—it recomputes the same lower-degree B-splines multiple times. de Boor's algorithm provides a much faster approach for evaluating a B-spline curve $s(x) = \sum_{j} \beta_j B_{j,d}(x)$ at a given point $x$.
The key insight is that at any point $x$, at most $d+1$ B-splines are non-zero (due to local support). De Boor's algorithm exploits this by building up the value iteratively using only the non-zero B-splines.
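This local support is easy to verify numerically. In the minimal sketch below (the knot positions and evaluation points are illustrative choices), every row of a cubic B-spline design matrix has exactly $d + 1 = 4$ non-zero entries:

```python
import numpy as np
from scipy.interpolate import BSpline

# Local support in action: each row of a cubic B-spline design matrix
# has at most d + 1 = 4 non-zero entries.
d = 3
t = np.r_[np.zeros(d + 1), [0.2, 0.4, 0.6, 0.8], np.ones(d + 1)]
n_basis = len(t) - d - 1  # 8 basis functions

x = np.array([0.05, 0.3, 0.55, 0.95])
B = np.column_stack([BSpline(t, np.eye(n_basis)[j], d, extrapolate=False)(x)
                     for j in range(n_basis)])
B = np.nan_to_num(B, nan=0.0)
print((np.abs(B) > 1e-12).sum(axis=1))  # [4 4 4 4]
```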
Algorithm Sketch:

1. Locate the knot interval $[t_r, t_{r+1})$ containing $x$.
2. Initialize working values $d_j^{[0]} = \beta_{r-d+j}$ for $j = 0, \ldots, d$ (the only coefficients whose B-splines are non-zero at $x$).
3. For $k = 1, \ldots, d$, replace each working value with a convex combination of itself and its left neighbor, with weights determined by where $x$ sits between knots.
4. After $d$ rounds, the single remaining value equals $s(x)$.
This computes $s(x)$ in $O(d^2)$ operations, compared to $O(2^d)$ for naive recursion.
```python
import numpy as np

def de_boor(x, t, c, degree):
    """
    Evaluate a B-spline curve using de Boor's algorithm.

    Parameters
    ----------
    x : float
        Evaluation point
    t : array
        Knot sequence (extended)
    c : array
        Control points / coefficients
    degree : int
        Spline degree

    Returns
    -------
    float : Value of the spline at x
    """
    d = degree

    # Find the knot interval [t_r, t_{r+1}) containing x.
    # The point x should lie in [t[d], t[len(t)-d-1]).
    r = np.searchsorted(t, x, side='right') - 1
    # Clamp to valid range
    r = max(d, min(r, len(t) - d - 2))

    # Initialize: extract relevant coefficients,
    # d_arr[j] = c[r-d+j] for j = 0, ..., d
    d_arr = np.array([c[r - d + j] for j in range(d + 1)], dtype=float)

    # Iterate: perform d rounds of convex combinations
    for k in range(1, d + 1):
        for j in range(d, k - 1, -1):  # Right to left to allow in-place update
            # Local index j corresponds to coefficient index r - d + j;
            # alpha = (x - t[r-d+j]) / (t[r+1+j-k] - t[r-d+j])
            left_idx = r - d + j
            right_idx = r + 1 + j - k
            if t[right_idx] == t[left_idx]:
                alpha = 0.0
            else:
                alpha = (x - t[left_idx]) / (t[right_idx] - t[left_idx])
            d_arr[j] = (1 - alpha) * d_arr[j - 1] + alpha * d_arr[j]

    return d_arr[d]

def evaluate_bspline_curve(x_vals, t, c, degree):
    """Evaluate a B-spline curve at multiple points."""
    return np.array([de_boor(x, t, c, degree) for x in x_vals])

# Example: Verify de Boor matches scipy.BSpline
from scipy.interpolate import BSpline

# Set up a cubic spline with known coefficients
t = np.array([0, 0, 0, 0, 0.3, 0.5, 0.7, 1, 1, 1, 1], dtype=float)
c = np.array([1.0, 2.0, 1.5, 3.0, 2.5, 1.0, 2.0])
degree = 3

x_test = np.linspace(0, 1, 20)

# scipy reference
spl = BSpline(t, c, degree)
y_scipy = spl(x_test)

# Our implementation
y_deboor = evaluate_bspline_curve(x_test, t, c, degree)

# Check agreement
print("Max difference from scipy:", np.max(np.abs(y_scipy - y_deboor)))
```

Unless you're implementing B-splines for a specialized system (embedded device, GPU kernel, etc.), use library implementations like scipy.interpolate.BSpline. These are highly optimized, handle edge cases correctly, and have been tested extensively. Understanding de Boor's algorithm is valuable for insight, but reinventing it is rarely necessary in ML practice.
B-splines are one of several ways to represent spline functions. Understanding the relationships between representations clarifies when each is appropriate.
| Representation | Basis Functions | Pros | Cons |
|---|---|---|---|
| Truncated Power | $(1, x, \ldots, x^d, (x-\xi_j)_+^d)$ | Intuitive, easy derivation | Numerically unstable |
| B-spline | Cox-de Boor basis | Numerically stable, local support | Complex construction |
| Natural Spline | Constrained B-spline | Reduces df by 4 for cubics | Boundary assumptions |
| Cardinal Spline | Interpolating basis | Direct knot value control | Less smooth at boundaries |
| Hermite | Value + derivative basis | Explicit derivative control | More parameters needed |
Key Relationships:
Truncated Power ↔ B-spline: Same spline space, different basis. Any truncated power spline can be written as a B-spline and vice versa—just with different coefficients. B-splines are preferred computationally.
B-spline → Natural Spline: Natural cubic splines are B-splines with additional constraints (zero second derivative at endpoints). We'll cover these in detail on Page 2.
B-spline Derivatives: The derivative of a degree-$d$ B-spline is a degree-$(d-1)$ B-spline: $$\frac{d}{dx} B_{j,d}(x) = d \left( \frac{B_{j,d-1}(x)}{t_{j+d} - t_j} - \frac{B_{j+1,d-1}(x)}{t_{j+d+1} - t_{j+1}} \right)$$
This is important for penalized splines (Page 5). A quick numerical check of this identity appears in the sketch below.
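As a sanity check of the derivative identity, the following sketch compares scipy's exact derivative of one cubic B-spline with the right-hand side of the formula (the knot vector and index $j = 2$ are illustrative choices):

```python
import numpy as np
from scipy.interpolate import BSpline

# Numerical check of the B-spline derivative identity for one cubic basis
# function; knots and index j are arbitrary illustrative choices.
t = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1], dtype=float)
d, j = 3, 2

def basis(t, j, k):
    """B_{j,k} as a scipy BSpline on the knot vector t."""
    c = np.zeros(len(t) - k - 1)
    c[j] = 1.0
    return BSpline(t, c, k, extrapolate=False)

x = np.linspace(0.01, 0.99, 50)  # interior points only

lhs = basis(t, j, d).derivative()(x)
rhs = d * (basis(t, j, d - 1)(x) / (t[j + d] - t[j])
           - basis(t, j + 1, d - 1)(x) / (t[j + d + 1] - t[j + 1]))

print(np.max(np.abs(lhs - rhs)))  # ~machine precision: the identity holds
```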
The 'B' in B-spline stands for 'Basis.' Carl de Boor, who developed much of modern B-spline theory, chose this name because B-splines form a particularly convenient basis for the space of spline functions. The name does not stand for 'Bell' (though they are bell-shaped) or any other word.
We've covered substantial ground in establishing B-splines as the foundation for spline regression. Let's consolidate the key ideas:

- A spline of degree $d$ with $K$ interior knots is a piecewise polynomial with $C^{d-1}$ continuity; the space of such splines has dimension $K + d + 1$.
- The truncated power basis spans this space but is numerically ill-conditioned; B-splines span the same space with excellent conditioning.
- B-splines are built from an extended knot sequence via the Cox-de Boor recurrence and enjoy local support, non-negativity, and the partition of unity.
- Spline regression is simply linear regression on the B-spline design matrix, so OLS, ridge, and related tools apply directly.
- de Boor's algorithm evaluates a B-spline curve in $O(d^2)$ operations; in practice, use library implementations such as scipy.interpolate.BSpline.
What's Next:
With B-splines as our foundation, we can now explore more specialized spline constructions: natural cubic splines, which constrain boundary behavior (Page 2); systematic strategies for choosing the number and placement of knots (Page 4); and penalized splines, which control smoothness through a roughness penalty rather than knot selection (Page 5).
You now understand B-splines: their mathematical definition via the Cox-de Boor recurrence, their remarkable properties (local support, partition of unity, numerical stability), and their application to regression modeling. This foundation underlies all modern spline-based methods in machine learning and statistics.