Throughout this module, we've used the monomial basis $\{1, x, x^2, \ldots, x^d\}$ for polynomial regression. This is the most intuitive choice—but it's often the worst choice numerically.
The problem is that high powers of $x$ become increasingly correlated: on $[0, 1]$, for example, $x^5$ and $x^6$ are nearly identical as functions, so the columns of the design matrix become close to linearly dependent and its condition number grows explosively with degree.
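To make this concrete, here is a quick check of how correlated successive monomial features are for uniformly distributed data (a small illustrative sketch, not part of the module's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)

# Correlation between successive monomial features on [0, 1]:
# it is already high for low powers and creeps toward 1 as the degree grows
for k in [1, 2, 5, 9]:
    r = np.corrcoef(x**k, x**(k + 1))[0, 1]
    print(f"corr(x^{k}, x^{k+1}) = {r:.4f}")
```

The correlation exceeds 0.99 already for $x^5$ vs $x^6$, which is exactly what destroys the conditioning of the monomial design matrix.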
Orthogonal polynomials solve these problems by construction. They're designed to be uncorrelated (orthogonal) with respect to a specific inner product, leading to well-conditioned matrices and numerically stable computations.
By the end of this page, you will understand:

- The mathematical concept of orthogonality for polynomials
- The three-term recurrence relation for generating orthogonal polynomials
- Properties and applications of Legendre, Chebyshev, and Hermite polynomials
- How to use orthogonal polynomials for numerically stable regression
- The connection between orthogonal polynomials and decorrelated regression
Inner Products for Functions:
Just as vectors can be orthogonal ($\mathbf{u} \cdot \mathbf{v} = 0$), functions can be orthogonal with respect to an inner product. For functions $f(x)$ and $g(x)$ on interval $[a, b]$ with weight function $w(x) \geq 0$:
$$\langle f, g \rangle = \int_a^b f(x) g(x) w(x) \, dx$$
Functions are orthogonal if $\langle f, g \rangle = 0$.
For Polynomials:
A sequence of polynomials $\{P_0(x), P_1(x), P_2(x), \ldots\}$ is orthogonal if: $$\langle P_m, P_n \rangle = \int_a^b P_m(x) P_n(x) w(x) \, dx = 0 \quad \text{for } m \neq n$$
Different choices of interval $[a, b]$ and weight $w(x)$ give different orthogonal polynomial families.
| Family | Interval | Weight $w(x)$ | Primary Use |
|---|---|---|---|
| Legendre $P_n(x)$ | $[-1, 1]$ | $1$ | General-purpose, uniform data |
| Chebyshev $T_n(x)$ | $[-1, 1]$ | $1/\sqrt{1-x^2}$ | Optimal approximation, minimizing max error |
| Chebyshev $U_n(x)$ | $[-1, 1]$ | $\sqrt{1-x^2}$ | Second kind, integration |
| Hermite $H_n(x)$ | $(-\infty, \infty)$ | $e^{-x^2}$ | Gaussian-distributed data |
| Laguerre $L_n(x)$ | $[0, \infty)$ | $e^{-x}$ | Exponentially decaying data |
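These orthogonality relations can be verified numerically. The sketch below (my own check, not the module's code) uses `scipy.integrate.quad`; for the Chebyshev case the substitution $x = \cos\theta$ turns the weighted integral into an ordinary cosine integral and avoids the endpoint singularity of the weight:

```python
import numpy as np
from scipy.integrate import quad

# Legendre: P_2 and P_3 with weight w(x) = 1 on [-1, 1]
P2 = lambda x: 0.5 * (3 * x**2 - 1)
P3 = lambda x: 0.5 * (5 * x**3 - 3 * x)
leg_cross, _ = quad(lambda x: P2(x) * P3(x), -1, 1)

# Chebyshev: with x = cos(theta), the integral of T_1 T_2 / sqrt(1 - x^2)
# becomes the integral of cos(theta) cos(2 theta) over [0, pi]
cheb_cross, _ = quad(lambda t: np.cos(t) * np.cos(2 * t), 0, np.pi)

print(f"<P2, P3> = {leg_cross:.2e}")   # ~0: orthogonal
print(f"<T1, T2> = {cheb_cross:.2e}")  # ~0: orthogonal
```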
Why Orthogonality Matters for Regression:
When we expand a function in an orthogonal basis, the coefficients are uncorrelated. If $f(x) = \sum_{k=0}^{d} c_k P_k(x)$, then:
$$c_k = \frac{\langle f, P_k \rangle}{\langle P_k, P_k \rangle}$$
Each coefficient can be computed independently! In matrix terms, if $\mathbf{\Phi}$ contains orthogonal polynomials evaluated at data points, then $\mathbf{\Phi}^T \mathbf{\Phi}$ is (approximately) diagonal—perfectly conditioned.
With orthogonal polynomials, adding a new term (higher degree) doesn't change the coefficients of existing terms (exactly so under the continuous inner product, and very nearly so for well-spread sample data). This is because the new term is orthogonal to all previous terms. With monomials, adding $x^5$ changes all coefficients $\beta_0, \ldots, \beta_4$; with orthogonal polynomials, they remain stable.
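This stability is easy to demonstrate. The sketch below (my own illustration; the `drifts` dictionary is just a scratch name) fits degree-4 and degree-5 models on a dense uniform grid and measures how much the shared coefficients move when the extra term is added:

```python
import numpy as np
from numpy.polynomial import polynomial, legendre

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 2001)   # dense uniform grid: sample Gram is ~diagonal
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)

drifts = {}
for name, vander in [("monomial", polynomial.polyvander),
                     ("legendre", legendre.legvander)]:
    b4 = np.linalg.lstsq(vander(x, 4), y, rcond=None)[0]
    b5 = np.linalg.lstsq(vander(x, 5), y, rcond=None)[0]
    # How much do the degree-0..4 coefficients move when degree 5 is added?
    drifts[name] = np.max(np.abs(b5[:5] - b4))
    print(f"{name:>8}: max coefficient change = {drifts[name]:.2e}")
```

The monomial coefficients shift substantially; the Legendre coefficients barely move.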
A Unifying Structure:
All classical orthogonal polynomial families satisfy a three-term recurrence relation:
$$P_{n+1}(x) = (A_n x + B_n) P_n(x) - C_n P_{n-1}(x)$$
with initial conditions $P_0(x) = 1$ and $P_1(x) = A_0 x + B_0$.
The coefficients $A_n$, $B_n$, $C_n$ are specific to each polynomial family. This recurrence provides a fast, numerically stable way to evaluate every polynomial up to degree $d$: each new polynomial is built from the previous two, with no need to expand powers of $x$ explicitly.
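A generic evaluator can be written once and specialized to any family by supplying $A_n$, $B_n$, $C_n$. This is a sketch of that idea (the helper name `recurrence_vander` is my own, not from the text); it is specialized here to Chebyshev, where $A_0 = 1$, $A_n = 2$ for $n \geq 1$, $B_n = 0$, $C_n = 1$:

```python
import numpy as np
from typing import Callable

def recurrence_vander(x: np.ndarray, n_max: int,
                      A: Callable[[int], float],
                      B: Callable[[int], float],
                      C: Callable[[int], float]) -> np.ndarray:
    """Evaluate P_0..P_{n_max} at x via P_{n+1} = (A_n x + B_n) P_n - C_n P_{n-1}."""
    x = np.asarray(x, dtype=float)
    P = np.zeros((x.size, n_max + 1))
    P[:, 0] = 1.0
    if n_max >= 1:
        P[:, 1] = A(0) * x + B(0)
    for n in range(1, n_max):
        P[:, n + 1] = (A(n) * x + B(n)) * P[:, n] - C(n) * P[:, n - 1]
    return P

# Chebyshev T_n: A_0 = 1, A_n = 2 for n >= 1, B_n = 0, C_n = 1
x = np.linspace(-1, 1, 7)
T = recurrence_vander(x, 3, lambda n: 1 if n == 0 else 2,
                      lambda n: 0, lambda n: 1)
print(np.allclose(T[:, 3], 4 * x**3 - 3 * x))  # T_3(x) = 4x^3 - 3x
```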
```python
import numpy as np
from typing import Callable, Tuple


def legendre_recurrence(x: np.ndarray, n_max: int) -> np.ndarray:
    """
    Generate Legendre polynomials P_0(x), P_1(x), ..., P_n(x) using recurrence.

    Recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)

    Parameters:
        x: Points to evaluate at (array)
        n_max: Maximum polynomial degree

    Returns:
        Array of shape (len(x), n_max+1) where column k is P_k(x)
    """
    x = np.asarray(x)
    n_points = len(x)
    P = np.zeros((n_points, n_max + 1))

    # Initial conditions
    P[:, 0] = 1.0  # P_0(x) = 1
    if n_max >= 1:
        P[:, 1] = x  # P_1(x) = x

    # Three-term recurrence
    for n in range(1, n_max):
        # (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        P[:, n+1] = ((2*n + 1) * x * P[:, n] - n * P[:, n-1]) / (n + 1)

    return P


def chebyshev_recurrence(x: np.ndarray, n_max: int) -> np.ndarray:
    """
    Generate Chebyshev polynomials T_0(x), T_1(x), ..., T_n(x) using recurrence.

    Recurrence: T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)

    These are the 'first kind' Chebyshev polynomials.
    """
    x = np.asarray(x)
    n_points = len(x)
    T = np.zeros((n_points, n_max + 1))

    # Initial conditions
    T[:, 0] = 1.0  # T_0(x) = 1
    if n_max >= 1:
        T[:, 1] = x  # T_1(x) = x

    # Three-term recurrence
    for n in range(1, n_max):
        T[:, n+1] = 2 * x * T[:, n] - T[:, n-1]

    return T


def hermite_recurrence(x: np.ndarray, n_max: int, physicist: bool = True) -> np.ndarray:
    """
    Generate Hermite polynomials using recurrence.

    Physicist's Hermite: H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)
    Probabilist's:       He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
    """
    x = np.asarray(x)
    n_points = len(x)
    H = np.zeros((n_points, n_max + 1))

    # Initial conditions
    H[:, 0] = 1.0
    if n_max >= 1:
        H[:, 1] = 2 * x if physicist else x

    # Three-term recurrence
    for n in range(1, n_max):
        if physicist:
            H[:, n+1] = 2 * x * H[:, n] - 2 * n * H[:, n-1]
        else:
            H[:, n+1] = x * H[:, n] - n * H[:, n-1]

    return H


# Demonstrate condition number improvement
def compare_conditioning(x: np.ndarray, degrees: list):
    """
    Compare condition numbers of design matrices:
    monomials vs Legendre polynomials.
    """
    print("Condition Number Comparison: Monomials vs Legendre")
    print("=" * 60)
    print(f"{'Degree':>8} | {'Monomials':>15} | {'Legendre':>15} | {'Improvement':>12}")
    print("-" * 60)

    for d in degrees:
        # Monomial basis
        Phi_mono = np.column_stack([x**k for k in range(d + 1)])
        cond_mono = np.linalg.cond(Phi_mono)

        # Legendre basis
        Phi_leg = legendre_recurrence(x, d)
        cond_leg = np.linalg.cond(Phi_leg)

        improvement = cond_mono / cond_leg
        print(f"{d:>8} | {cond_mono:>15.2e} | {cond_leg:>15.2e} | {improvement:>12.1f}x")


# Test with data in [-1, 1]
np.random.seed(42)
x = np.random.uniform(-1, 1, 100)
compare_conditioning(x, degrees=[2, 5, 10, 15, 20])
```

Orthogonal polynomials are orthogonal only on their native interval. Legendre polynomials require $x \in [-1, 1]$. For data in $[a, b]$, you must transform: $z = 2(x - a)/(b - a) - 1$. Using Legendre polynomials outside $[-1, 1]$ destroys their orthogonality and conditioning benefits!
Definition and Properties:
Legendre polynomials $P_n(x)$ are orthogonal on $[-1, 1]$ with weight $w(x) = 1$:
$$\int_{-1}^{1} P_m(x) P_n(x) \, dx = \frac{2}{2n+1} \delta_{mn}$$
where $\delta_{mn}$ is the Kronecker delta (1 if $m=n$, 0 otherwise).
First Several Legendre Polynomials:
| $n$ | $P_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $\frac{1}{2}(3x^2 - 1)$ |
| 3 | $\frac{1}{2}(5x^3 - 3x)$ |
| 4 | $\frac{1}{8}(35x^4 - 30x^2 + 3)$ |
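The closed forms in the table can be cross-checked against NumPy's Legendre basis (a small verification sketch of my own, using `numpy.polynomial.legendre.legvander`):

```python
import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1, 1, 101)
V = legendre.legvander(x, 4)  # column k holds P_k(x)

# Compare columns against the table's closed-form expressions
ok = (np.allclose(V[:, 2], 0.5 * (3 * x**2 - 1))
      and np.allclose(V[:, 3], 0.5 * (5 * x**3 - 3 * x))
      and np.allclose(V[:, 4], (35 * x**4 - 30 * x**2 + 3) / 8))
print("table matches numpy's Legendre basis:", ok)
```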
Key Properties:

- Normalization: $P_n(1) = 1$ and $P_n(-1) = (-1)^n$.
- Parity: $P_n(-x) = (-1)^n P_n(x)$, so even-degree polynomials are even functions and odd-degree ones are odd.
- Roots: $P_n$ has exactly $n$ real roots, all inside $(-1, 1)$.
- Boundedness: $|P_n(x)| \leq 1$ for $x \in [-1, 1]$.
```python
import numpy as np
from typing import Tuple


class LegendreRegression:
    """
    Polynomial regression using Legendre polynomial basis.

    Provides numerically stable regression with uncorrelated coefficients.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_min = None
        self.x_max = None

    def _transform_to_standard(self, x: np.ndarray) -> np.ndarray:
        """Transform x from [x_min, x_max] to [-1, 1]."""
        return 2 * (x - self.x_min) / (self.x_max - self.x_min) - 1

    def _legendre_basis(self, z: np.ndarray) -> np.ndarray:
        """Evaluate Legendre polynomials at points z in [-1, 1]."""
        n = len(z)
        P = np.zeros((n, self.degree + 1))
        P[:, 0] = 1.0
        if self.degree >= 1:
            P[:, 1] = z
        for k in range(1, self.degree):
            P[:, k+1] = ((2*k + 1) * z * P[:, k] - k * P[:, k-1]) / (k + 1)
        return P

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'LegendreRegression':
        """Fit Legendre polynomial regression."""
        self.x_min = np.min(x)
        self.x_max = np.max(x)

        # Transform to [-1, 1]
        z = self._transform_to_standard(x)

        # Build Legendre basis matrix
        P = self._legendre_basis(z)

        # OLS fit (now numerically stable!)
        self.coefficients = np.linalg.lstsq(P, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._transform_to_standard(x)
        P = self._legendre_basis(z)
        return P @ self.coefficients

    def get_coefficient_interpretation(self) -> dict:
        """
        Interpret Legendre coefficients.

        Unlike monomial coefficients, Legendre coefficients have nice properties:
        - c_0: Mean of the fitted function
        - c_1: Linear trend (slope-like)
        - c_2: Quadratic curvature
        - Higher terms: Higher-order oscillations
        """
        interpretation = {}
        for i, c in enumerate(self.coefficients):
            if i == 0:
                desc = "Mean/Intercept"
            elif i == 1:
                desc = "Linear trend"
            elif i == 2:
                desc = "Quadratic curvature"
            else:
                desc = f"Degree-{i} oscillation"
            interpretation[f"c_{i}"] = {
                'value': c,
                'description': desc,
                'magnitude': np.abs(c)
            }
        return interpretation


# Demonstration: Compare monomial vs Legendre
np.random.seed(42)
x = np.linspace(0, 10, 50)
y_true = np.sin(x * 0.8) + 0.3 * x
y = y_true + np.random.normal(0, 0.3, len(x))

print("Comparison: Monomial vs Legendre Basis")
print("=" * 60)

for degree in [5, 10, 15]:
    # Monomial fit
    x_scaled = (x - 5) / 5  # Scale to approx [-1, 1]
    Phi_mono = np.column_stack([x_scaled**k for k in range(degree + 1)])
    try:
        beta_mono = np.linalg.lstsq(Phi_mono, y, rcond=None)[0]
        cond_mono = np.linalg.cond(Phi_mono)
        coef_norm_mono = np.linalg.norm(beta_mono)
    except np.linalg.LinAlgError:
        cond_mono = np.inf
        coef_norm_mono = np.inf

    # Legendre fit
    model_leg = LegendreRegression(degree).fit(x, y)
    P_leg = model_leg._legendre_basis(model_leg._transform_to_standard(x))
    cond_leg = np.linalg.cond(P_leg)
    coef_norm_leg = np.linalg.norm(model_leg.coefficients)

    print(f"Degree {degree}:")
    print(f"  Condition number: Mono={cond_mono:.2e}, Legendre={cond_leg:.2e}")
    print(f"  Coefficient norm: Mono={coef_norm_mono:.2f}, Legendre={coef_norm_leg:.2f}")
```

Legendre polynomials are the default choice for general-purpose orthogonal polynomial regression when: (1) data can be rescaled to $[-1, 1]$, (2) data is roughly uniformly distributed or you're treating all regions equally, (3) you want simple, interpretable coefficients. For non-uniform data distributions, consider other families.
Definition and Properties:
Chebyshev polynomials of the first kind $T_n(x)$ are defined by:
$$T_n(x) = \cos(n \cdot \arccos(x)), \qquad x \in [-1, 1]$$
They're orthogonal on $[-1, 1]$ with weight $w(x) = 1/\sqrt{1-x^2}$:
$$\int_{-1}^{1} \frac{T_m(x) T_n(x)}{\sqrt{1-x^2}} \, dx = \begin{cases} 0 & m \neq n \\ \pi & m = n = 0 \\ \pi/2 & m = n \neq 0 \end{cases}$$
First Several Chebyshev Polynomials:
| $n$ | $T_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $2x^2 - 1$ |
| 3 | $4x^3 - 3x$ |
| 4 | $8x^4 - 8x^2 + 1$ |
The Minimax Property:
Chebyshev polynomials have a remarkable optimality property: among all degree-$n$ polynomials with leading coefficient $2^{n-1}$, $T_n(x)$ has the smallest maximum absolute value on $[-1, 1]$.
$$\max_{x \in [-1,1]} |T_n(x)| = 1$$
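The bound $\max |T_n| = 1$ is easy to confirm numerically (a quick check of my own; the `max_abs` dictionary is a scratch name). Each $T_n$ attains $\pm 1$ at its extrema, including the endpoints:

```python
import numpy as np
from numpy.polynomial import chebyshev

x = np.linspace(-1, 1, 100_001)
max_abs = {}
for n in [3, 8, 15]:
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select the single basis function T_n
    max_abs[n] = np.abs(chebyshev.chebval(x, coeffs)).max()
    print(f"max |T_{n}| on [-1, 1] = {max_abs[n]:.6f}")
```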
Why This Matters for Regression:
When approximating a function $f(x)$ with a polynomial, using Chebyshev polynomials:

- spreads the approximation error nearly evenly across $[-1, 1]$ (equioscillation) instead of letting it pile up at the endpoints;
- yields truncated expansions that are close to the best possible (minimax) polynomial of that degree;
- keeps every basis function bounded by 1, so each coefficient directly bounds its term's contribution.
This makes Chebyshev polynomials ideal when you need uniform accuracy across the entire domain.
```python
import numpy as np


class ChebyshevRegression:
    """
    Polynomial regression using Chebyshev polynomial basis.

    Optimal for minimax (uniform) approximation quality.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_min = None
        self.x_max = None

    def _transform_to_standard(self, x: np.ndarray) -> np.ndarray:
        """Transform x from [x_min, x_max] to [-1, 1]."""
        return 2 * (x - self.x_min) / (self.x_max - self.x_min) - 1

    def _chebyshev_basis(self, z: np.ndarray) -> np.ndarray:
        """Evaluate Chebyshev polynomials at points z in [-1, 1]."""
        n = len(z)
        T = np.zeros((n, self.degree + 1))
        T[:, 0] = 1.0
        if self.degree >= 1:
            T[:, 1] = z
        for k in range(1, self.degree):
            T[:, k+1] = 2 * z * T[:, k] - T[:, k-1]
        return T

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'ChebyshevRegression':
        """Fit Chebyshev polynomial regression."""
        self.x_min = np.min(x)
        self.x_max = np.max(x)
        z = self._transform_to_standard(x)
        T = self._chebyshev_basis(z)
        self.coefficients = np.linalg.lstsq(T, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._transform_to_standard(x)
        T = self._chebyshev_basis(z)
        return T @ self.coefficients


def chebyshev_nodes(n: int, a: float = -1, b: float = 1) -> np.ndarray:
    """
    Generate Chebyshev nodes for optimal polynomial interpolation.

    These nodes minimize interpolation error (avoid Runge's phenomenon).
    Nodes are: x_k = cos((2k+1)π / (2n)), k = 0, 1, ..., n-1
    """
    k = np.arange(n)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * n))
    # Transform from [-1, 1] to [a, b]
    return 0.5 * ((b - a) * nodes + (a + b))


# Demonstration: Runge's phenomenon and Chebyshev nodes
def demonstrate_runge_phenomenon():
    """Show how Chebyshev nodes avoid Runge's phenomenon."""
    # The classic Runge function
    runge = lambda x: 1 / (1 + 25 * x**2)

    print("Runge's Phenomenon: Uniform vs Chebyshev Nodes")
    print("=" * 60)
    print("f(x) = 1 / (1 + 25x²) on [-1, 1]")
    print("-" * 60)

    # Dense grid for error evaluation
    x_eval = np.linspace(-1, 1, 500)
    y_true = runge(x_eval)

    degrees = [5, 10, 15, 20]
    print(f"{'Degree':>8} | {'Max Err (Uniform)':>18} | {'Max Err (Chebyshev)':>20}")
    print("-" * 60)

    for deg in degrees:
        Phi_eval = np.column_stack([x_eval**k for k in range(deg + 1)])

        # Uniform nodes
        x_uniform = np.linspace(-1, 1, deg + 1)
        y_uniform = runge(x_uniform)
        Phi_uniform = np.column_stack([x_uniform**k for k in range(deg + 1)])
        beta_uniform = np.linalg.lstsq(Phi_uniform, y_uniform, rcond=None)[0]
        max_err_uniform = np.max(np.abs(y_true - Phi_eval @ beta_uniform))

        # Chebyshev nodes
        x_cheb = chebyshev_nodes(deg + 1, -1, 1)
        y_cheb = runge(x_cheb)
        Phi_cheb = np.column_stack([x_cheb**k for k in range(deg + 1)])
        beta_cheb = np.linalg.lstsq(Phi_cheb, y_cheb, rcond=None)[0]
        max_err_cheb = np.max(np.abs(y_true - Phi_eval @ beta_cheb))

        print(f"{deg:>8} | {max_err_uniform:>18.6f} | {max_err_cheb:>20.6f}")

    print("Note: Uniform nodes cause error to EXPLODE at high degrees")
    print("      Chebyshev nodes keep error bounded and decreasing")


demonstrate_runge_phenomenon()
```

When choosing where to sample a function for polynomial interpolation, Chebyshev nodes (roots of $T_n(x)$) are optimal. They cluster near the boundaries of $[-1, 1]$, compensating for the polynomial's tendency to oscillate wildly there. This is the practical solution to Runge's phenomenon.
Definition and Properties:
Hermite polynomials are orthogonal on $(-\infty, \infty)$ with Gaussian weight $w(x) = e^{-x^2}$:
$$\int_{-\infty}^{\infty} H_m(x) H_n(x) e^{-x^2} \, dx = \sqrt{\pi} \, 2^n \, n! \, \delta_{mn}$$
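This normalization can be checked numerically. A small verification sketch (my own, using `scipy.integrate.quad` and NumPy's physicist's-Hermite evaluator; the helper `H` is a scratch name):

```python
import numpy as np
from math import factorial, sqrt, pi
from scipy.integrate import quad
from numpy.polynomial.hermite import hermval

def H(x, n):
    """Evaluate the physicist's Hermite polynomial H_n at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c)

# <H_2, H_3> should vanish; <H_2, H_2> should equal sqrt(pi) * 2^2 * 2!
cross, _ = quad(lambda x: H(x, 2) * H(x, 3) * np.exp(-x**2), -np.inf, np.inf)
norm2, _ = quad(lambda x: H(x, 2)**2 * np.exp(-x**2), -np.inf, np.inf)
expected = sqrt(pi) * 2**2 * factorial(2)
print(f"<H2, H3> = {cross:.2e}, <H2, H2> = {norm2:.4f} (expected {expected:.4f})")
```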
Two Conventions:
| Type | Definition | Use Case |
|---|---|---|
| Physicist's $H_n(x)$ | $(-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$ | Quantum mechanics, physics |
| Probabilist's $\text{He}_n(x)$ | $(-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}$ | Statistics, ML with Gaussian data |
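The two conventions are related by $H_n(x) = 2^{n/2} \text{He}_n(\sqrt{2}\, x)$, which can be confirmed with NumPy's `hermite` (physicist's) and `hermite_e` (probabilist's) modules (a small check of my own):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-2, 2, 9)
consistent = True
for n in range(6):
    c = np.zeros(n + 1)
    c[n] = 1.0
    lhs = hermval(x, c)                              # physicist's H_n(x)
    rhs = 2**(n / 2) * hermeval(np.sqrt(2) * x, c)   # 2^{n/2} He_n(sqrt(2) x)
    consistent = consistent and bool(np.allclose(lhs, rhs))
print("H_n(x) == 2^{n/2} He_n(sqrt(2) x) for n < 6:", consistent)
```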
First Several (Probabilist's):
| $n$ | $\text{He}_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $x^2 - 1$ |
| 3 | $x^3 - 3x$ |
| 4 | $x^4 - 6x^2 + 3$ |
Why Hermite Polynomials for Gaussian Data:
When your data $x$ is approximately Gaussian (normal), Hermite polynomials are the natural choice because:

- The probabilist's weight $e^{-x^2/2}$ is, up to a constant, the standard normal density, so orthogonality under the integral matches decorrelation under your data distribution.
- After standardizing $x$, the sample Gram matrix $\mathbf{\Phi}^T \mathbf{\Phi}$ is approximately diagonal with no interval transformation required.
- The unbounded domain $(-\infty, \infty)$ means tail values need no clipping or rescaling.
In machine learning, this connects to Gaussian bases used in kernel methods and spectral features.
```python
import numpy as np
from scipy.special import legendre


class ProbabilistHermiteRegression:
    """
    Polynomial regression using probabilist's Hermite polynomial basis.

    Optimal when input features are approximately Gaussian distributed.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_mean = None
        self.x_std = None

    def _standardize(self, x: np.ndarray) -> np.ndarray:
        """Standardize to mean 0, std 1 (appropriate for Hermite)."""
        return (x - self.x_mean) / self.x_std

    def _hermite_basis(self, z: np.ndarray) -> np.ndarray:
        """
        Evaluate probabilist's Hermite polynomials.

        Recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        """
        n = len(z)
        He = np.zeros((n, self.degree + 1))
        He[:, 0] = 1.0
        if self.degree >= 1:
            He[:, 1] = z
        for k in range(1, self.degree):
            He[:, k+1] = z * He[:, k] - k * He[:, k-1]
        return He

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'ProbabilistHermiteRegression':
        """Fit Hermite polynomial regression."""
        self.x_mean = np.mean(x)
        self.x_std = np.std(x)
        if self.x_std < 1e-10:
            self.x_std = 1.0
        z = self._standardize(x)
        He = self._hermite_basis(z)
        self.coefficients = np.linalg.lstsq(He, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._standardize(x)
        He = self._hermite_basis(z)
        return He @ self.coefficients


# Compare Hermite vs Legendre for Gaussian data
np.random.seed(42)

# Gaussian-distributed x (where Hermite shines)
x_gaussian = np.random.normal(0, 1, 100)
y_true = x_gaussian**2 - 0.5 * x_gaussian**3 + 2
y = y_true + np.random.normal(0, 0.3, len(x_gaussian))

# Hold out test data
x_test = np.random.normal(0, 1, 50)
y_test = x_test**2 - 0.5 * x_test**3 + 2 + np.random.normal(0, 0.3, len(x_test))

print("Comparison on Gaussian-Distributed Data")
print("=" * 60)

for degree in [3, 5, 8]:
    # Hermite regression
    model_hermite = ProbabilistHermiteRegression(degree).fit(x_gaussian, y)
    y_pred_hermite = model_hermite.predict(x_test)
    mse_hermite = np.mean((y_test - y_pred_hermite)**2)

    # Legendre regression (for comparison):
    # need to transform to [-1, 1] for Legendre
    x_min, x_max = x_gaussian.min(), x_gaussian.max()
    z_train = 2 * (x_gaussian - x_min) / (x_max - x_min) - 1
    z_test = np.clip(2 * (x_test - x_min) / (x_max - x_min) - 1, -1, 1)

    Phi_leg_train = np.column_stack([np.polyval(legendre(k), z_train)
                                     for k in range(degree + 1)])
    Phi_leg_test = np.column_stack([np.polyval(legendre(k), z_test)
                                    for k in range(degree + 1)])
    beta_leg = np.linalg.lstsq(Phi_leg_train, y, rcond=None)[0]
    y_pred_leg = Phi_leg_test @ beta_leg
    mse_legendre = np.mean((y_test - y_pred_leg)**2)

    print(f"Degree {degree}: Hermite MSE = {mse_hermite:.4f}, "
          f"Legendre MSE = {mse_legendre:.4f}")
```

Choosing a family:

- Legendre: uniform or bounded data on a finite interval; the general-purpose default.
- Chebyshev: when you need uniform accuracy across the interval or are sensitive to boundary errors.
- Hermite: Gaussian-distributed data; unbounded domains.
- Laguerre: exponentially distributed data; $[0, \infty)$ domain.
Using Orthogonal Polynomials in Practice:
Most scientific computing libraries provide implementations of orthogonal polynomials. Here's how to use them effectively:
```python
import numpy as np
from numpy.polynomial import legendre, chebyshev
from numpy.polynomial.hermite_e import hermevander


def orthogonal_polynomial_regression(x: np.ndarray, y: np.ndarray,
                                     degree: int,
                                     family: str = 'legendre',
                                     regularization: float = 0.0) -> dict:
    """
    Production-ready orthogonal polynomial regression.

    Parameters:
        x: Input features (1D)
        y: Targets
        degree: Maximum polynomial degree
        family: 'legendre', 'chebyshev', or 'hermite'
        regularization: Ridge regularization strength

    Returns:
        Dictionary with fitted model information
    """
    # Choose family and appropriate transform
    if family == 'legendre':
        # Transform to [-1, 1]
        x_min, x_max = np.min(x), np.max(x)
        z = 2 * (x - x_min) / (x_max - x_min) - 1
        # legendre.legvander creates the Vandermonde matrix for Legendre polys
        Phi = legendre.legvander(z, degree)

    elif family == 'chebyshev':
        # Transform to [-1, 1]
        x_min, x_max = np.min(x), np.max(x)
        z = 2 * (x - x_min) / (x_max - x_min) - 1
        # Chebyshev Vandermonde matrix
        Phi = chebyshev.chebvander(z, degree)

    elif family == 'hermite':
        # Standardize (mean 0, std 1) for Hermite
        x_mean, x_std = np.mean(x), np.std(x)
        if x_std < 1e-10:
            x_std = 1.0
        z = (x - x_mean) / x_std
        # Probabilist's Hermite (hermite_e in NumPy)
        Phi = hermevander(z, degree)
        x_min, x_max = x_mean, x_std  # Store for transform

    else:
        raise ValueError(f"Unknown family: {family}")

    # Fit with optional regularization
    if regularization > 0:
        reg_matrix = regularization * np.eye(degree + 1)
        reg_matrix[0, 0] = 0  # Don't regularize intercept
        A = Phi.T @ Phi + reg_matrix
        b = Phi.T @ y
        coefficients = np.linalg.solve(A, b)
    else:
        coefficients = np.linalg.lstsq(Phi, y, rcond=None)[0]

    # Compute fit quality
    y_pred = Phi @ coefficients
    rss = np.sum((y - y_pred)**2)
    tss = np.sum((y - np.mean(y))**2)
    r_squared = 1 - rss / tss

    return {
        'family': family,
        'degree': degree,
        'coefficients': coefficients,
        'transform_params': (x_min, x_max),
        'r_squared': r_squared,
        'condition_number': np.linalg.cond(Phi)
    }


# Production example with real-world-like data
np.random.seed(42)

# Simulate some experimental data (e.g., dose-response curve)
dose = np.linspace(0.1, 10, 50)  # Drug dose
# True relationship: Hill equation (sigmoidal) approximated by polynomial
response_true = 100 * dose**2 / (5**2 + dose**2)
response = response_true + np.random.normal(0, 3, len(dose))

# Compare different orthogonal families
print("Orthogonal Polynomial Regression Comparison")
print("Dataset: Simulated dose-response curve")
print("=" * 70)
print(f"{'Family':>12} | {'Degree':>6} | {'R²':>8} | {'Cond #':>12} | {'Status':>10}")
print("-" * 70)

for family in ['legendre', 'chebyshev']:
    for degree in [3, 5, 7]:
        result = orthogonal_polynomial_regression(
            dose, response, degree, family, regularization=0.01
        )
        status = "Good" if result['condition_number'] < 1e8 else "Warning"
        print(f"{result['family']:>12} | {result['degree']:>6} | "
              f"{result['r_squared']:>8.4f} | "
              f"{result['condition_number']:>12.2e} | {status:>10}")
```

We've explored orthogonal polynomial bases as the numerically superior alternative to monomials. Let's consolidate:

- Orthogonality is defined through a weighted inner product $\langle f, g \rangle = \int_a^b f(x) g(x) w(x) \, dx$; different intervals and weights give different families.
- All classical families satisfy a three-term recurrence, which is the fast, stable way to evaluate them.
- Legendre (uniform weight), Chebyshev (minimax), and Hermite (Gaussian weight) cover most regression use cases.
- Always transform data to the family's native interval; otherwise the conditioning benefits vanish.
- The resulting design matrices are approximately orthogonal, so coefficients are stable, interpretable, and cheap to compute.
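NumPy's convenience classes can also handle the interval transform for you. A short sketch (my own, not part of the module's code) using `numpy.polynomial.Chebyshev.fit`, which records the data's domain and rescales to $[-1, 1]$ internally:

```python
import numpy as np

rng = np.random.default_rng(42)
dose = np.linspace(0.1, 10, 50)
response = 100 * dose**2 / (25 + dose**2) + rng.normal(0, 3, dose.size)

# Chebyshev.fit maps inputs into [-1, 1] internally (see `model.domain`),
# so no manual transform is needed before fitting or predicting.
model = np.polynomial.Chebyshev.fit(dose, response, deg=5)
y_hat = model(dose)  # evaluation applies the same domain mapping
r2 = 1 - np.sum((response - y_hat)**2) / np.sum((response - response.mean())**2)
print(f"domain = {model.domain}, R^2 = {r2:.4f}")
```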
What's Next:
Orthogonal polynomials solve numerical issues but still produce global fits—each polynomial term affects the entire domain. The next page introduces piecewise polynomials and splines, which provide local control: changes in one region don't affect fits elsewhere. This leads to flexible, robust regression for complex patterns.
You now command orthogonal polynomial bases: their mathematical foundations, computational methods, and practical implementation. This knowledge elevates polynomial regression from a fragile technique to a robust, numerically sound methodology.