Throughout this module, we've used the monomial basis $\{1, x, x^2, \ldots, x^d\}$ for polynomial regression. This is the most intuitive choice—but it's often the worst choice numerically.
The problem is that high powers of $x$ become increasingly correlated: on $[0, 1]$, for example, $x^5$ and $x^6$ are nearly identical as functions, so the columns of the design matrix become close to linearly dependent and its condition number grows explosively with degree.
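To make this concrete, here is a quick check of how correlated successive monomial features are for uniformly distributed data (a small illustrative sketch, not part of the module's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)

# Correlation between successive monomial features on [0, 1]:
# it is already high for low powers and creeps toward 1 as the degree grows
for k in [1, 2, 5, 9]:
    r = np.corrcoef(x**k, x**(k + 1))[0, 1]
    print(f"corr(x^{k}, x^{k+1}) = {r:.4f}")
```

The correlation exceeds 0.99 already for $x^5$ vs $x^6$, which is exactly what destroys the conditioning of the monomial design matrix.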
Orthogonal polynomials solve these problems by construction. They're designed to be uncorrelated (orthogonal) with respect to a specific inner product, leading to well-conditioned matrices and numerically stable computations.
By the end of this page, you will understand:

- The mathematical concept of orthogonality for polynomials
- The three-term recurrence relation for generating orthogonal polynomials
- Properties and applications of Legendre, Chebyshev, and Hermite polynomials
- How to use orthogonal polynomials for numerically stable regression
- The connection between orthogonal polynomials and decorrelated regression
Inner Products for Functions:
Just as vectors can be orthogonal ($\mathbf{u} \cdot \mathbf{v} = 0$), functions can be orthogonal with respect to an inner product. For functions $f(x)$ and $g(x)$ on interval $[a, b]$ with weight function $w(x) \geq 0$:
$$\langle f, g \rangle = \int_a^b f(x) g(x) w(x) \, dx$$
Functions are orthogonal if $\langle f, g \rangle = 0$.
For Polynomials:
A sequence of polynomials $\{P_0(x), P_1(x), P_2(x), \ldots\}$ is orthogonal if: $$\langle P_m, P_n \rangle = \int_a^b P_m(x) P_n(x) w(x) \, dx = 0 \quad \text{for } m \neq n$$
Different choices of interval $[a, b]$ and weight $w(x)$ give different orthogonal polynomial families.
| Family | Interval | Weight $w(x)$ | Primary Use |
|---|---|---|---|
| Legendre $P_n(x)$ | $[-1, 1]$ | $1$ | General-purpose, uniform data |
| Chebyshev $T_n(x)$ | $[-1, 1]$ | $1/\sqrt{1-x^2}$ | Optimal approximation, minimizing max error |
| Chebyshev $U_n(x)$ | $[-1, 1]$ | $\sqrt{1-x^2}$ | Second kind, integration |
| Hermite $H_n(x)$ | $(-\infty, \infty)$ | $e^{-x^2}$ | Gaussian-distributed data |
| Laguerre $L_n(x)$ | $[0, \infty)$ | $e^{-x}$ | Exponentially decaying data |
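These orthogonality relations can be verified numerically. The sketch below (my own check, not the module's code) uses `scipy.integrate.quad`; for the Chebyshev case the substitution $x = \cos\theta$ turns the weighted integral into an ordinary cosine integral and avoids the endpoint singularity of the weight:

```python
import numpy as np
from scipy.integrate import quad

# Legendre: P_2 and P_3 with weight w(x) = 1 on [-1, 1]
P2 = lambda x: 0.5 * (3 * x**2 - 1)
P3 = lambda x: 0.5 * (5 * x**3 - 3 * x)
leg_cross, _ = quad(lambda x: P2(x) * P3(x), -1, 1)

# Chebyshev: with x = cos(theta), the integral of T_1 T_2 / sqrt(1 - x^2)
# becomes the integral of cos(theta) cos(2 theta) over [0, pi]
cheb_cross, _ = quad(lambda t: np.cos(t) * np.cos(2 * t), 0, np.pi)

print(f"<P2, P3> = {leg_cross:.2e}")   # ~0: orthogonal
print(f"<T1, T2> = {cheb_cross:.2e}")  # ~0: orthogonal
```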
Why Orthogonality Matters for Regression:
When we expand a function in an orthogonal basis, the coefficients are uncorrelated. If $f(x) = \sum_{k=0}^{d} c_k P_k(x)$, then:
$$c_k = \frac{\langle f, P_k \rangle}{\langle P_k, P_k \rangle}$$
Each coefficient can be computed independently! In matrix terms, if $\mathbf{\Phi}$ contains orthogonal polynomials evaluated at data points, then $\mathbf{\Phi}^T \mathbf{\Phi}$ is (approximately) diagonal—perfectly conditioned.
With orthogonal polynomials, adding a new term (higher degree) doesn't change the coefficients of existing terms (exactly so under the continuous inner product, and very nearly so for well-spread sample data). This is because the new term is orthogonal to all previous terms. With monomials, adding $x^5$ changes all coefficients $\beta_0, \ldots, \beta_4$; with orthogonal polynomials, they remain stable.
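This stability is easy to demonstrate. The sketch below (my own illustration; the `drifts` dictionary is just a scratch name) fits degree-4 and degree-5 models on a dense uniform grid and measures how much the shared coefficients move when the extra term is added:

```python
import numpy as np
from numpy.polynomial import polynomial, legendre

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 2001)   # dense uniform grid: sample Gram is ~diagonal
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)

drifts = {}
for name, vander in [("monomial", polynomial.polyvander),
                     ("legendre", legendre.legvander)]:
    b4 = np.linalg.lstsq(vander(x, 4), y, rcond=None)[0]
    b5 = np.linalg.lstsq(vander(x, 5), y, rcond=None)[0]
    # How much do the degree-0..4 coefficients move when degree 5 is added?
    drifts[name] = np.max(np.abs(b5[:5] - b4))
    print(f"{name:>8}: max coefficient change = {drifts[name]:.2e}")
```

The monomial coefficients shift substantially; the Legendre coefficients barely move.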
A Unifying Structure:
All classical orthogonal polynomial families satisfy a three-term recurrence relation:
$$P_{n+1}(x) = (A_n x + B_n) P_n(x) - C_n P_{n-1}(x)$$
with initial conditions $P_0(x) = 1$ and $P_1(x) = A_0 x + B_0$.
The coefficients $A_n$, $B_n$, $C_n$ are specific to each polynomial family. This recurrence provides a fast, numerically stable way to evaluate every polynomial up to degree $d$: each new polynomial is built from the previous two, with no need to expand powers of $x$ explicitly.
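A generic evaluator can be written once and specialized to any family by supplying $A_n$, $B_n$, $C_n$. This is a sketch of that idea (the helper name `recurrence_vander` is my own, not from the text); it is specialized here to Chebyshev, where $A_0 = 1$, $A_n = 2$ for $n \geq 1$, $B_n = 0$, $C_n = 1$:

```python
import numpy as np
from typing import Callable

def recurrence_vander(x: np.ndarray, n_max: int,
                      A: Callable[[int], float],
                      B: Callable[[int], float],
                      C: Callable[[int], float]) -> np.ndarray:
    """Evaluate P_0..P_{n_max} at x via P_{n+1} = (A_n x + B_n) P_n - C_n P_{n-1}."""
    x = np.asarray(x, dtype=float)
    P = np.zeros((x.size, n_max + 1))
    P[:, 0] = 1.0
    if n_max >= 1:
        P[:, 1] = A(0) * x + B(0)
    for n in range(1, n_max):
        P[:, n + 1] = (A(n) * x + B(n)) * P[:, n] - C(n) * P[:, n - 1]
    return P

# Chebyshev T_n: A_0 = 1, A_n = 2 for n >= 1, B_n = 0, C_n = 1
x = np.linspace(-1, 1, 7)
T = recurrence_vander(x, 3, lambda n: 1 if n == 0 else 2,
                      lambda n: 0, lambda n: 1)
print(np.allclose(T[:, 3], 4 * x**3 - 3 * x))  # T_3(x) = 4x^3 - 3x
```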
```python
import numpy as np
from typing import Callable, Tuple


def legendre_recurrence(x: np.ndarray, n_max: int) -> np.ndarray:
    """
    Generate Legendre polynomials P_0(x), P_1(x), ..., P_n(x) using recurrence.

    Recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)

    Parameters:
        x: Points to evaluate at (array)
        n_max: Maximum polynomial degree

    Returns:
        Array of shape (len(x), n_max+1) where column k is P_k(x)
    """
    x = np.asarray(x)
    n_points = len(x)
    P = np.zeros((n_points, n_max + 1))

    # Initial conditions
    P[:, 0] = 1.0  # P_0(x) = 1
    if n_max >= 1:
        P[:, 1] = x  # P_1(x) = x

    # Three-term recurrence
    for n in range(1, n_max):
        # (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        P[:, n+1] = ((2*n + 1) * x * P[:, n] - n * P[:, n-1]) / (n + 1)

    return P


def chebyshev_recurrence(x: np.ndarray, n_max: int) -> np.ndarray:
    """
    Generate Chebyshev polynomials T_0(x), T_1(x), ..., T_n(x) using recurrence.

    Recurrence: T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)

    These are the 'first kind' Chebyshev polynomials.
    """
    x = np.asarray(x)
    n_points = len(x)
    T = np.zeros((n_points, n_max + 1))

    # Initial conditions
    T[:, 0] = 1.0  # T_0(x) = 1
    if n_max >= 1:
        T[:, 1] = x  # T_1(x) = x

    # Three-term recurrence
    for n in range(1, n_max):
        T[:, n+1] = 2 * x * T[:, n] - T[:, n-1]

    return T


def hermite_recurrence(x: np.ndarray, n_max: int, physicist: bool = True) -> np.ndarray:
    """
    Generate Hermite polynomials using recurrence.

    Physicist's Hermite: H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)
    Probabilist's:       He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
    """
    x = np.asarray(x)
    n_points = len(x)
    H = np.zeros((n_points, n_max + 1))

    # Initial conditions
    H[:, 0] = 1.0
    if n_max >= 1:
        H[:, 1] = 2 * x if physicist else x

    # Three-term recurrence
    for n in range(1, n_max):
        if physicist:
            H[:, n+1] = 2 * x * H[:, n] - 2 * n * H[:, n-1]
        else:
            H[:, n+1] = x * H[:, n] - n * H[:, n-1]

    return H


# Demonstrate condition number improvement
def compare_conditioning(x: np.ndarray, degrees: list):
    """
    Compare condition numbers of design matrices:
    monomials vs Legendre polynomials.
    """
    print("Condition Number Comparison: Monomials vs Legendre")
    print("=" * 60)
    print(f"{'Degree':>8} | {'Monomials':>15} | {'Legendre':>15} | {'Improvement':>12}")
    print("-" * 60)

    for d in degrees:
        # Monomial basis
        Phi_mono = np.column_stack([x**k for k in range(d + 1)])
        cond_mono = np.linalg.cond(Phi_mono)

        # Legendre basis
        Phi_leg = legendre_recurrence(x, d)
        cond_leg = np.linalg.cond(Phi_leg)

        improvement = cond_mono / cond_leg
        print(f"{d:>8} | {cond_mono:>15.2e} | {cond_leg:>15.2e} | {improvement:>12.1f}x")


# Test with data in [-1, 1]
np.random.seed(42)
x = np.random.uniform(-1, 1, 100)
compare_conditioning(x, degrees=[2, 5, 10, 15, 20])
```

Orthogonal polynomials are orthogonal only on their native interval. Legendre polynomials require $x \in [-1, 1]$. For data in $[a, b]$, you must transform: $z = 2(x - a)/(b - a) - 1$. Using Legendre polynomials outside $[-1, 1]$ destroys their orthogonality and conditioning benefits!
Definition and Properties:
Legendre polynomials $P_n(x)$ are orthogonal on $[-1, 1]$ with weight $w(x) = 1$:
$$\int_{-1}^{1} P_m(x) P_n(x) \, dx = \frac{2}{2n+1} \delta_{mn}$$
where $\delta_{mn}$ is the Kronecker delta (1 if $m=n$, 0 otherwise).
First Several Legendre Polynomials:
| $n$ | $P_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $\frac{1}{2}(3x^2 - 1)$ |
| 3 | $\frac{1}{2}(5x^3 - 3x)$ |
| 4 | $\frac{1}{8}(35x^4 - 30x^2 + 3)$ |
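The closed forms in the table can be cross-checked against NumPy's Legendre basis (a small verification sketch of my own, using `numpy.polynomial.legendre.legvander`):

```python
import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1, 1, 101)
V = legendre.legvander(x, 4)  # column k holds P_k(x)

# Compare columns against the table's closed-form expressions
ok = (np.allclose(V[:, 2], 0.5 * (3 * x**2 - 1))
      and np.allclose(V[:, 3], 0.5 * (5 * x**3 - 3 * x))
      and np.allclose(V[:, 4], (35 * x**4 - 30 * x**2 + 3) / 8))
print("table matches numpy's Legendre basis:", ok)
```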
Key Properties:

- Normalization: $P_n(1) = 1$ and $P_n(-1) = (-1)^n$.
- Parity: $P_n(-x) = (-1)^n P_n(x)$, so even-degree polynomials are even functions and odd-degree ones are odd.
- Roots: $P_n$ has exactly $n$ real roots, all inside $(-1, 1)$.
- Boundedness: $|P_n(x)| \leq 1$ for $x \in [-1, 1]$.
```python
import numpy as np
from typing import Tuple


class LegendreRegression:
    """
    Polynomial regression using Legendre polynomial basis.

    Provides numerically stable regression with uncorrelated coefficients.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_min = None
        self.x_max = None

    def _transform_to_standard(self, x: np.ndarray) -> np.ndarray:
        """Transform x from [x_min, x_max] to [-1, 1]."""
        return 2 * (x - self.x_min) / (self.x_max - self.x_min) - 1

    def _legendre_basis(self, z: np.ndarray) -> np.ndarray:
        """Evaluate Legendre polynomials at points z in [-1, 1]."""
        n = len(z)
        P = np.zeros((n, self.degree + 1))
        P[:, 0] = 1.0
        if self.degree >= 1:
            P[:, 1] = z
        for k in range(1, self.degree):
            P[:, k+1] = ((2*k + 1) * z * P[:, k] - k * P[:, k-1]) / (k + 1)
        return P

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'LegendreRegression':
        """Fit Legendre polynomial regression."""
        self.x_min = np.min(x)
        self.x_max = np.max(x)

        # Transform to [-1, 1]
        z = self._transform_to_standard(x)

        # Build Legendre basis matrix
        P = self._legendre_basis(z)

        # OLS fit (now numerically stable!)
        self.coefficients = np.linalg.lstsq(P, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._transform_to_standard(x)
        P = self._legendre_basis(z)
        return P @ self.coefficients

    def get_coefficient_interpretation(self) -> dict:
        """
        Interpret Legendre coefficients.

        Unlike monomial coefficients, Legendre coefficients have nice properties:
        - c_0: Mean of the fitted function
        - c_1: Linear trend (slope-like)
        - c_2: Quadratic curvature
        - Higher terms: Higher-order oscillations
        """
        interpretation = {}
        for i, c in enumerate(self.coefficients):
            if i == 0:
                desc = "Mean/Intercept"
            elif i == 1:
                desc = "Linear trend"
            elif i == 2:
                desc = "Quadratic curvature"
            else:
                desc = f"Degree-{i} oscillation"
            interpretation[f"c_{i}"] = {
                'value': c,
                'description': desc,
                'magnitude': np.abs(c)
            }
        return interpretation


# Demonstration: Compare monomial vs Legendre
np.random.seed(42)
x = np.linspace(0, 10, 50)
y_true = np.sin(x * 0.8) + 0.3 * x
y = y_true + np.random.normal(0, 0.3, len(x))

print("Comparison: Monomial vs Legendre Basis")
print("=" * 60)

for degree in [5, 10, 15]:
    # Monomial fit
    x_scaled = (x - 5) / 5  # Scale to approx [-1, 1]
    Phi_mono = np.column_stack([x_scaled**k for k in range(degree + 1)])
    try:
        beta_mono = np.linalg.lstsq(Phi_mono, y, rcond=None)[0]
        cond_mono = np.linalg.cond(Phi_mono)
        coef_norm_mono = np.linalg.norm(beta_mono)
    except np.linalg.LinAlgError:
        cond_mono = np.inf
        coef_norm_mono = np.inf

    # Legendre fit
    model_leg = LegendreRegression(degree).fit(x, y)
    P_leg = model_leg._legendre_basis(model_leg._transform_to_standard(x))
    cond_leg = np.linalg.cond(P_leg)
    coef_norm_leg = np.linalg.norm(model_leg.coefficients)

    print(f"Degree {degree}:")
    print(f"  Condition number: Mono={cond_mono:.2e}, Legendre={cond_leg:.2e}")
    print(f"  Coefficient norm: Mono={coef_norm_mono:.2f}, Legendre={coef_norm_leg:.2f}")
```

Legendre polynomials are the default choice for general-purpose orthogonal polynomial regression when: (1) data can be rescaled to $[-1, 1]$, (2) data is roughly uniformly distributed or you're treating all regions equally, (3) you want simple, interpretable coefficients. For non-uniform data distributions, consider other families.
Definition and Properties:
Chebyshev polynomials of the first kind $T_n(x)$ are defined by:
$$T_n(x) = \cos(n \cdot \arccos(x)), \qquad x \in [-1, 1]$$
They're orthogonal on $[-1, 1]$ with weight $w(x) = 1/\sqrt{1-x^2}$:
$$\int_{-1}^{1} \frac{T_m(x) T_n(x)}{\sqrt{1-x^2}} \, dx = \begin{cases} 0 & m \neq n \\ \pi & m = n = 0 \\ \pi/2 & m = n \neq 0 \end{cases}$$
First Several Chebyshev Polynomials:
| $n$ | $T_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $2x^2 - 1$ |
| 3 | $4x^3 - 3x$ |
| 4 | $8x^4 - 8x^2 + 1$ |
The Minimax Property:
Chebyshev polynomials have a remarkable optimality property: among all degree-$n$ polynomials with leading coefficient $2^{n-1}$, $T_n(x)$ has the smallest maximum absolute value on $[-1, 1]$.
$$\max_{x \in [-1,1]} |T_n(x)| = 1$$
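The bound $\max |T_n| = 1$ is easy to confirm numerically (a quick check of my own; the `max_abs` dictionary is a scratch name). Each $T_n$ attains $\pm 1$ at its extrema, including the endpoints:

```python
import numpy as np
from numpy.polynomial import chebyshev

x = np.linspace(-1, 1, 100_001)
max_abs = {}
for n in [3, 8, 15]:
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select the single basis function T_n
    max_abs[n] = np.abs(chebyshev.chebval(x, coeffs)).max()
    print(f"max |T_{n}| on [-1, 1] = {max_abs[n]:.6f}")
```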
Why This Matters for Regression:
When approximating a function $f(x)$ with a polynomial, using Chebyshev polynomials:

- spreads the approximation error nearly evenly across $[-1, 1]$ (equioscillation) instead of letting it pile up at the endpoints;
- yields truncated expansions that are close to the best possible (minimax) polynomial of that degree;
- keeps every basis function bounded by 1, so each coefficient directly bounds its term's contribution.
This makes Chebyshev polynomials ideal when you need uniform accuracy across the entire domain.
```python
import numpy as np


class ChebyshevRegression:
    """
    Polynomial regression using Chebyshev polynomial basis.

    Optimal for minimax (uniform) approximation quality.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_min = None
        self.x_max = None

    def _transform_to_standard(self, x: np.ndarray) -> np.ndarray:
        """Transform x from [x_min, x_max] to [-1, 1]."""
        return 2 * (x - self.x_min) / (self.x_max - self.x_min) - 1

    def _chebyshev_basis(self, z: np.ndarray) -> np.ndarray:
        """Evaluate Chebyshev polynomials at points z in [-1, 1]."""
        n = len(z)
        T = np.zeros((n, self.degree + 1))
        T[:, 0] = 1.0
        if self.degree >= 1:
            T[:, 1] = z
        for k in range(1, self.degree):
            T[:, k+1] = 2 * z * T[:, k] - T[:, k-1]
        return T

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'ChebyshevRegression':
        """Fit Chebyshev polynomial regression."""
        self.x_min = np.min(x)
        self.x_max = np.max(x)
        z = self._transform_to_standard(x)
        T = self._chebyshev_basis(z)
        self.coefficients = np.linalg.lstsq(T, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._transform_to_standard(x)
        T = self._chebyshev_basis(z)
        return T @ self.coefficients


def chebyshev_nodes(n: int, a: float = -1, b: float = 1) -> np.ndarray:
    """
    Generate Chebyshev nodes for optimal polynomial interpolation.

    These nodes minimize interpolation error (avoid Runge's phenomenon).
    Nodes are: x_k = cos((2k+1)π / (2n)), k = 0, 1, ..., n-1
    """
    k = np.arange(n)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * n))
    # Transform from [-1, 1] to [a, b]
    return 0.5 * ((b - a) * nodes + (a + b))


# Demonstration: Runge's phenomenon and Chebyshev nodes
def demonstrate_runge_phenomenon():
    """Show how Chebyshev nodes avoid Runge's phenomenon."""
    # The classic Runge function
    runge = lambda x: 1 / (1 + 25 * x**2)

    print("Runge's Phenomenon: Uniform vs Chebyshev Nodes")
    print("=" * 60)
    print("f(x) = 1 / (1 + 25x²) on [-1, 1]")
    print("-" * 60)

    # Dense grid for error evaluation
    x_eval = np.linspace(-1, 1, 500)
    y_true = runge(x_eval)

    degrees = [5, 10, 15, 20]
    print(f"{'Degree':>8} | {'Max Err (Uniform)':>18} | {'Max Err (Chebyshev)':>20}")
    print("-" * 60)

    for deg in degrees:
        Phi_eval = np.column_stack([x_eval**k for k in range(deg + 1)])

        # Uniform nodes
        x_uniform = np.linspace(-1, 1, deg + 1)
        y_uniform = runge(x_uniform)
        Phi_uniform = np.column_stack([x_uniform**k for k in range(deg + 1)])
        beta_uniform = np.linalg.lstsq(Phi_uniform, y_uniform, rcond=None)[0]
        max_err_uniform = np.max(np.abs(y_true - Phi_eval @ beta_uniform))

        # Chebyshev nodes
        x_cheb = chebyshev_nodes(deg + 1, -1, 1)
        y_cheb = runge(x_cheb)
        Phi_cheb = np.column_stack([x_cheb**k for k in range(deg + 1)])
        beta_cheb = np.linalg.lstsq(Phi_cheb, y_cheb, rcond=None)[0]
        max_err_cheb = np.max(np.abs(y_true - Phi_eval @ beta_cheb))

        print(f"{deg:>8} | {max_err_uniform:>18.6f} | {max_err_cheb:>20.6f}")

    print("Note: Uniform nodes cause error to EXPLODE at high degrees")
    print("      Chebyshev nodes keep error bounded and decreasing")


demonstrate_runge_phenomenon()
```

When choosing where to sample a function for polynomial interpolation, Chebyshev nodes (roots of $T_n(x)$) are optimal. They cluster near the boundaries of $[-1, 1]$, compensating for the polynomial's tendency to oscillate wildly there. This is the practical solution to Runge's phenomenon.
Definition and Properties:
Hermite polynomials are orthogonal on $(-\infty, \infty)$ with Gaussian weight $w(x) = e^{-x^2}$:
$$\int_{-\infty}^{\infty} H_m(x) H_n(x) e^{-x^2} \, dx = \sqrt{\pi} \, 2^n \, n! \, \delta_{mn}$$
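This normalization can be checked numerically. A small verification sketch (my own, using `scipy.integrate.quad` and NumPy's physicist's-Hermite evaluator; the helper `H` is a scratch name):

```python
import numpy as np
from math import factorial, sqrt, pi
from scipy.integrate import quad
from numpy.polynomial.hermite import hermval

def H(x, n):
    """Evaluate the physicist's Hermite polynomial H_n at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c)

# <H_2, H_3> should vanish; <H_2, H_2> should equal sqrt(pi) * 2^2 * 2!
cross, _ = quad(lambda x: H(x, 2) * H(x, 3) * np.exp(-x**2), -np.inf, np.inf)
norm2, _ = quad(lambda x: H(x, 2)**2 * np.exp(-x**2), -np.inf, np.inf)
expected = sqrt(pi) * 2**2 * factorial(2)
print(f"<H2, H3> = {cross:.2e}, <H2, H2> = {norm2:.4f} (expected {expected:.4f})")
```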
Two Conventions:
| Type | Definition | Use Case |
|---|---|---|
| Physicist's $H_n(x)$ | $(-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$ | Quantum mechanics, physics |
| Probabilist's $\text{He}_n(x)$ | $(-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}$ | Statistics, ML with Gaussian data |
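The two conventions are related by $H_n(x) = 2^{n/2} \text{He}_n(\sqrt{2}\, x)$, which can be confirmed with NumPy's `hermite` (physicist's) and `hermite_e` (probabilist's) modules (a small check of my own):

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-2, 2, 9)
consistent = True
for n in range(6):
    c = np.zeros(n + 1)
    c[n] = 1.0
    lhs = hermval(x, c)                              # physicist's H_n(x)
    rhs = 2**(n / 2) * hermeval(np.sqrt(2) * x, c)   # 2^{n/2} He_n(sqrt(2) x)
    consistent = consistent and bool(np.allclose(lhs, rhs))
print("H_n(x) == 2^{n/2} He_n(sqrt(2) x) for n < 6:", consistent)
```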
First Several (Probabilist's):
| $n$ | $\text{He}_n(x)$ |
|---|---|
| 0 | $1$ |
| 1 | $x$ |
| 2 | $x^2 - 1$ |
| 3 | $x^3 - 3x$ |
| 4 | $x^4 - 6x^2 + 3$ |
Why Hermite Polynomials for Gaussian Data:
When your data $x$ is approximately Gaussian (normal), Hermite polynomials are the natural choice because:

- The probabilist's weight $e^{-x^2/2}$ is, up to a constant, the standard normal density, so orthogonality under the integral matches decorrelation under your data distribution.
- After standardizing $x$, the sample Gram matrix $\mathbf{\Phi}^T \mathbf{\Phi}$ is approximately diagonal with no interval transformation required.
- The unbounded domain $(-\infty, \infty)$ means tail values need no clipping or rescaling.
In machine learning, this connects to Gaussian bases used in kernel methods and spectral features.
```python
import numpy as np
from scipy.special import legendre


class ProbabilistHermiteRegression:
    """
    Polynomial regression using probabilist's Hermite polynomial basis.

    Optimal when input features are approximately Gaussian distributed.
    """

    def __init__(self, degree: int):
        self.degree = degree
        self.coefficients = None
        self.x_mean = None
        self.x_std = None

    def _standardize(self, x: np.ndarray) -> np.ndarray:
        """Standardize to mean 0, std 1 (appropriate for Hermite)."""
        return (x - self.x_mean) / self.x_std

    def _hermite_basis(self, z: np.ndarray) -> np.ndarray:
        """
        Evaluate probabilist's Hermite polynomials.

        Recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        """
        n = len(z)
        He = np.zeros((n, self.degree + 1))
        He[:, 0] = 1.0
        if self.degree >= 1:
            He[:, 1] = z
        for k in range(1, self.degree):
            He[:, k+1] = z * He[:, k] - k * He[:, k-1]
        return He

    def fit(self, x: np.ndarray, y: np.ndarray) -> 'ProbabilistHermiteRegression':
        """Fit Hermite polynomial regression."""
        self.x_mean = np.mean(x)
        self.x_std = np.std(x)
        if self.x_std < 1e-10:
            self.x_std = 1.0
        z = self._standardize(x)
        He = self._hermite_basis(z)
        self.coefficients = np.linalg.lstsq(He, y, rcond=None)[0]
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Predict at new points."""
        z = self._standardize(x)
        He = self._hermite_basis(z)
        return He @ self.coefficients


# Compare Hermite vs Legendre for Gaussian data
np.random.seed(42)

# Gaussian-distributed x (where Hermite shines)
x_gaussian = np.random.normal(0, 1, 100)
y_true = x_gaussian**2 - 0.5 * x_gaussian**3 + 2
y = y_true + np.random.normal(0, 0.3, len(x_gaussian))

# Hold out test data
x_test = np.random.normal(0, 1, 50)
y_test = x_test**2 - 0.5 * x_test**3 + 2 + np.random.normal(0, 0.3, len(x_test))

print("Comparison on Gaussian-Distributed Data")
print("=" * 60)

for degree in [3, 5, 8]:
    # Hermite regression
    model_hermite = ProbabilistHermiteRegression(degree).fit(x_gaussian, y)
    y_pred_hermite = model_hermite.predict(x_test)
    mse_hermite = np.mean((y_test - y_pred_hermite)**2)

    # Legendre regression (for comparison):
    # need to transform to [-1, 1] for Legendre
    x_min, x_max = x_gaussian.min(), x_gaussian.max()
    z_train = 2 * (x_gaussian - x_min) / (x_max - x_min) - 1
    z_test = np.clip(2 * (x_test - x_min) / (x_max - x_min) - 1, -1, 1)

    Phi_leg_train = np.column_stack([np.polyval(legendre(k), z_train)
                                     for k in range(degree + 1)])
    Phi_leg_test = np.column_stack([np.polyval(legendre(k), z_test)
                                    for k in range(degree + 1)])
    beta_leg = np.linalg.lstsq(Phi_leg_train, y, rcond=None)[0]
    y_pred_leg = Phi_leg_test @ beta_leg
    mse_legendre = np.mean((y_test - y_pred_leg)**2)

    print(f"Degree {degree}: Hermite MSE = {mse_hermite:.4f}, "
          f"Legendre MSE = {mse_legendre:.4f}")
```

Choosing a family:

- Legendre: uniform or bounded data on a finite interval; the general-purpose default.
- Chebyshev: when you need uniform accuracy across the interval or are sensitive to boundary errors.
- Hermite: Gaussian-distributed data; unbounded domains.
- Laguerre: exponentially distributed data; $[0, \infty)$ domain.
Using Orthogonal Polynomials in Practice:
Most scientific computing libraries provide implementations of orthogonal polynomials. Here's how to use them effectively:
```python
import numpy as np
from numpy.polynomial import legendre, chebyshev
from numpy.polynomial.hermite_e import hermevander


def orthogonal_polynomial_regression(x: np.ndarray, y: np.ndarray,
                                     degree: int,
                                     family: str = 'legendre',
                                     regularization: float = 0.0) -> dict:
    """
    Production-ready orthogonal polynomial regression.

    Parameters:
        x: Input features (1D)
        y: Targets
        degree: Maximum polynomial degree
        family: 'legendre', 'chebyshev', or 'hermite'
        regularization: Ridge regularization strength

    Returns:
        Dictionary with fitted model information
    """
    # Choose family and appropriate transform
    if family == 'legendre':
        # Transform to [-1, 1]
        x_min, x_max = np.min(x), np.max(x)
        z = 2 * (x - x_min) / (x_max - x_min) - 1
        # legendre.legvander creates the Vandermonde matrix for Legendre polys
        Phi = legendre.legvander(z, degree)

    elif family == 'chebyshev':
        # Transform to [-1, 1]
        x_min, x_max = np.min(x), np.max(x)
        z = 2 * (x - x_min) / (x_max - x_min) - 1
        # Chebyshev Vandermonde matrix
        Phi = chebyshev.chebvander(z, degree)

    elif family == 'hermite':
        # Standardize (mean 0, std 1) for Hermite
        x_mean, x_std = np.mean(x), np.std(x)
        if x_std < 1e-10:
            x_std = 1.0
        z = (x - x_mean) / x_std
        # Probabilist's Hermite (hermite_e in NumPy)
        Phi = hermevander(z, degree)
        x_min, x_max = x_mean, x_std  # Store for transform

    else:
        raise ValueError(f"Unknown family: {family}")

    # Fit with optional regularization
    if regularization > 0:
        reg_matrix = regularization * np.eye(degree + 1)
        reg_matrix[0, 0] = 0  # Don't regularize intercept
        A = Phi.T @ Phi + reg_matrix
        b = Phi.T @ y
        coefficients = np.linalg.solve(A, b)
    else:
        coefficients = np.linalg.lstsq(Phi, y, rcond=None)[0]

    # Compute fit quality
    y_pred = Phi @ coefficients
    rss = np.sum((y - y_pred)**2)
    tss = np.sum((y - np.mean(y))**2)
    r_squared = 1 - rss / tss

    return {
        'family': family,
        'degree': degree,
        'coefficients': coefficients,
        'transform_params': (x_min, x_max),
        'r_squared': r_squared,
        'condition_number': np.linalg.cond(Phi)
    }


# Production example with real-world-like data
np.random.seed(42)

# Simulate some experimental data (e.g., dose-response curve)
dose = np.linspace(0.1, 10, 50)  # Drug dose
# True relationship: Hill equation (sigmoidal) approximated by polynomial
response_true = 100 * dose**2 / (5**2 + dose**2)
response = response_true + np.random.normal(0, 3, len(dose))

# Compare different orthogonal families
print("Orthogonal Polynomial Regression Comparison")
print("Dataset: Simulated dose-response curve")
print("=" * 70)
print(f"{'Family':>12} | {'Degree':>6} | {'R²':>8} | {'Cond #':>12} | {'Status':>10}")
print("-" * 70)

for family in ['legendre', 'chebyshev']:
    for degree in [3, 5, 7]:
        result = orthogonal_polynomial_regression(
            dose, response, degree, family, regularization=0.01
        )
        status = "Good" if result['condition_number'] < 1e8 else "Warning"
        print(f"{result['family']:>12} | {result['degree']:>6} | "
              f"{result['r_squared']:>8.4f} | "
              f"{result['condition_number']:>12.2e} | {status:>10}")
```

We've explored orthogonal polynomial bases as the numerically superior alternative to monomials. Let's consolidate:

- Orthogonality is defined through a weighted inner product $\langle f, g \rangle = \int_a^b f(x) g(x) w(x) \, dx$; different intervals and weights give different families.
- All classical families satisfy a three-term recurrence, which is the fast, stable way to evaluate them.
- Legendre (uniform weight), Chebyshev (minimax), and Hermite (Gaussian weight) cover most regression use cases.
- Always transform data to the family's native interval; otherwise the conditioning benefits vanish.
- The resulting design matrices are approximately orthogonal, so coefficients are stable, interpretable, and cheap to compute.
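NumPy's convenience classes can also handle the interval transform for you. A short sketch (my own, not part of the module's code) using `numpy.polynomial.Chebyshev.fit`, which records the data's domain and rescales to $[-1, 1]$ internally:

```python
import numpy as np

rng = np.random.default_rng(42)
dose = np.linspace(0.1, 10, 50)
response = 100 * dose**2 / (25 + dose**2) + rng.normal(0, 3, dose.size)

# Chebyshev.fit maps inputs into [-1, 1] internally (see `model.domain`),
# so no manual transform is needed before fitting or predicting.
model = np.polynomial.Chebyshev.fit(dose, response, deg=5)
y_hat = model(dose)  # evaluation applies the same domain mapping
r2 = 1 - np.sum((response - y_hat)**2) / np.sum((response - response.mean())**2)
print(f"domain = {model.domain}, R^2 = {r2:.4f}")
```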
What's Next:
Orthogonal polynomials solve numerical issues but still produce global fits—each polynomial term affects the entire domain. The next page introduces piecewise polynomials and splines, which provide local control: changes in one region don't affect fits elsewhere. This leads to flexible, robust regression for complex patterns.
You now command orthogonal polynomial bases: their mathematical foundations, computational methods, and practical implementation. This knowledge elevates polynomial regression from a fragile technique to a robust, numerically sound methodology.