The matrix inverse A⁻¹ only exists for square, non-singular matrices. But in machine learning, we routinely encounter:

- Overdetermined systems (m > n): more equations than unknowns, e.g. tall data matrices
- Underdetermined systems (m < n): more unknowns than equations
- Rank-deficient matrices with linearly dependent columns or rows
The Moore-Penrose pseudoinverse A⁺ generalizes the inverse to all matrices. It provides:

- A unique generalized inverse that exists for every matrix, square or not, full rank or not
- The minimum-norm least squares solution x̂ = A⁺b to any linear system Ax = b
For matrices with full column rank, A⁺ = (AᵀA)⁻¹Aᵀ (a left inverse); for full row rank, A⁺ = Aᵀ(AAᵀ)⁻¹ (a right inverse). For general matrices, the SVD provides the definition.
By the end of this page, you will understand the pseudoinverse definition via SVD, its four defining properties (Moore-Penrose conditions), when and why it gives the minimum-norm least squares solution, and its computational implementation.
The Singular Value Decomposition of any m×n matrix A is:
A = UΣVᵀ
where:

- U is an m×m orthogonal matrix whose columns are the left singular vectors
- Σ is an m×n diagonal matrix holding the singular values σ₁ ≥ σ₂ ≥ … ≥ σᵣ > 0
- V is an n×n orthogonal matrix whose columns are the right singular vectors
The pseudoinverse A⁺ is defined as:
A⁺ = VΣ⁺Uᵀ
where Σ⁺ is the n×m matrix with entries:

Σ⁺ᵢᵢ = 1/σᵢ if σᵢ > 0, and Σ⁺ᵢᵢ = 0 otherwise

In other words: transpose Σ's shape and invert each non-zero singular value.
```python
import numpy as np
from numpy.linalg import svd, pinv

def compute_pseudoinverse_svd(A: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """
    Compute the pseudoinverse A⁺ using SVD.
    A = UΣVᵀ  =>  A⁺ = VΣ⁺Uᵀ
    """
    U, sigma, Vt = svd(A, full_matrices=False)

    # Invert non-zero singular values, zero out the rest
    sigma_inv = np.array([1/s if s > tol else 0 for s in sigma])

    # Construct Σ⁺ (diagonal matrix)
    Sigma_pinv = np.diag(sigma_inv)

    # A⁺ = VΣ⁺Uᵀ
    return Vt.T @ Sigma_pinv @ U.T

# Example: Overdetermined full-rank (m > n)
A1 = np.array([[1, 2], [3, 4], [5, 6]])
print("=== Overdetermined System ===")
print(f"A (3×2):\n{A1}")
A1_pinv = compute_pseudoinverse_svd(A1)
print(f"A⁺ (2×3):\n{np.round(A1_pinv, 4)}")
print(f"NumPy pinv:\n{np.round(pinv(A1), 4)}")
print(f"Match: {np.allclose(A1_pinv, pinv(A1))}")

# Verify: A⁺ = (AᵀA)⁻¹Aᵀ for full column rank
A1_pinv_formula = np.linalg.inv(A1.T @ A1) @ A1.T
print(f"(AᵀA)⁻¹Aᵀ:\n{np.round(A1_pinv_formula, 4)}")

# Example: Underdetermined (m < n)
print("\n=== Underdetermined System ===")
A2 = np.array([[1, 2, 3], [4, 5, 6]])
print(f"A (2×3):\n{A2}")
A2_pinv = compute_pseudoinverse_svd(A2)
print(f"A⁺ (3×2):\n{np.round(A2_pinv, 4)}")
```

The pseudoinverse A⁺ is uniquely characterized by four conditions:

1. AA⁺A = A
2. A⁺AA⁺ = A⁺
3. (AA⁺)ᵀ = AA⁺
4. (A⁺A)ᵀ = A⁺A
These conditions are necessary and sufficient. Any matrix X satisfying all four equals A⁺.
Interpretation: conditions 1 and 2 say that A⁺ acts as an inverse wherever A acts non-trivially, while conditions 3 and 4 say that AA⁺ and A⁺A are symmetric, which makes them orthogonal projections onto Col(A) and Row(A) respectively.
```python
import numpy as np
from numpy.linalg import pinv

def verify_moore_penrose(A: np.ndarray, tol: float = 1e-10) -> dict:
    """Verify all four Moore-Penrose conditions for A⁺."""
    A_pinv = pinv(A)
    results = {}

    # Condition 1: AA⁺A = A
    cond1 = A @ A_pinv @ A
    results['cond1'] = np.allclose(cond1, A, atol=tol)

    # Condition 2: A⁺AA⁺ = A⁺
    cond2 = A_pinv @ A @ A_pinv
    results['cond2'] = np.allclose(cond2, A_pinv, atol=tol)

    # Condition 3: (AA⁺)ᵀ = AA⁺
    AA_pinv = A @ A_pinv
    results['cond3'] = np.allclose(AA_pinv.T, AA_pinv, atol=tol)

    # Condition 4: (A⁺A)ᵀ = A⁺A
    A_pinvA = A_pinv @ A
    results['cond4'] = np.allclose(A_pinvA.T, A_pinvA, atol=tol)

    print("=== Moore-Penrose Conditions ===")
    print(f"1. AA⁺A = A: {results['cond1']}")
    print(f"2. A⁺AA⁺ = A⁺: {results['cond2']}")
    print(f"3. (AA⁺)ᵀ = AA⁺: {results['cond3']}")
    print(f"4. (A⁺A)ᵀ = A⁺A: {results['cond4']}")

    return results

# Test with a tall full-rank matrix
A = np.array([[1, 2], [3, 4], [5, 6]])
verify_moore_penrose(A)

print("\n=== Projection Properties ===")
A_pinv = pinv(A)
P_col = A @ A_pinv   # Projects onto Col(A)
P_row = A_pinv @ A   # Projects onto Row(A)
print(f"AA⁺ (proj onto Col(A)):\n{np.round(P_col, 4)}")
print(f"A⁺A (proj onto Row(A)):\n{np.round(P_row, 4)}")
```

For the system Ax = b, the pseudoinverse gives:
x̂ = A⁺b
This solution has a profound property:
A⁺b is the minimum-norm vector that minimizes ||Ax - b||
Two optimality conditions in one:

1. Least squares: x̂ minimizes the residual ||Ax − b|| over all x
2. Minimum norm: among all such minimizers, x̂ has the smallest ||x||
Cases:
| Type | Dimensions | Solution | Property |
|---|---|---|---|
| Full rank, square | m = n = r | A⁺ = A⁻¹ | Exact inverse |
| Full column rank | m > n = r | A⁺ = (AᵀA)⁻¹Aᵀ | Left inverse, least squares |
| Full row rank | m = r < n | A⁺ = Aᵀ(AAᵀ)⁻¹ | Right inverse, min norm |
| Rank deficient | r < min(m,n) | A⁺ via SVD | Min norm least squares |
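Each row of this table can be checked numerically. A minimal sketch (the example matrices are arbitrary choices, not from the original page):

```python
import numpy as np
from numpy.linalg import inv, pinv

# Square, full rank: A⁺ coincides with A⁻¹
A_sq = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.allclose(pinv(A_sq), inv(A_sq)))  # True

# Full column rank (m > n): left inverse (AᵀA)⁻¹Aᵀ
A_tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(np.allclose(pinv(A_tall), inv(A_tall.T @ A_tall) @ A_tall.T))  # True

# Full row rank (m < n): right inverse Aᵀ(AAᵀ)⁻¹
A_wide = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(np.allclose(pinv(A_wide), A_wide.T @ inv(A_wide @ A_wide.T)))  # True

# Rank deficient: only the SVD definition applies
A_def = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # rank 1
print(np.linalg.matrix_rank(A_def), pinv(A_def).shape)  # 1 (2, 3)
```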
Why minimum norm?
When a system is underdetermined or rank-deficient, infinitely many solutions exist. The pseudoinverse selects the solution with smallest Euclidean norm—the one closest to the origin.
Geometric intuition:
The solution set is an affine subspace (plane, line, etc.). The minimum-norm solution is the point on this subspace nearest to the origin—found by orthogonal projection of 0 onto the affine subspace.
Connection to regularization:
Minimum-norm solutions connect to regularization. As λ → 0 in ridge regression:
(AᵀA + λI)⁻¹Aᵀb → A⁺b
The pseudoinverse is the limit of regularized solutions—no explicit regularization, but inherent preference for small coefficients.
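This limit is easy to observe numerically. The sketch below (an illustration with an arbitrarily chosen rank-deficient design matrix) shrinks λ and watches the ridge solution converge to A⁺b:

```python
import numpy as np
from numpy.linalg import pinv, solve, norm

# Rank-deficient design: third column = first + second
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

x_pinv = pinv(A) @ b
n = A.shape[1]

# Ridge solutions approach the pseudoinverse solution as λ → 0⁺
for lam in [1.0, 1e-2, 1e-4, 1e-6, 1e-8]:
    x_ridge = solve(A.T @ A + lam * np.eye(n), A.T @ b)
    print(f"λ = {lam:.0e}: ||x_ridge - x⁺|| = {norm(x_ridge - x_pinv):.2e}")
```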
```python
import numpy as np
from numpy.linalg import pinv, norm
from scipy.linalg import null_space

def demonstrate_minimum_norm():
    """
    Show that A⁺b gives the minimum-norm solution
    for underdetermined systems.
    """
    # Underdetermined: 2 equations, 4 unknowns
    A = np.array([[1, 2, 1, 0],
                  [0, 1, 2, 1]])
    b = np.array([4, 3])

    # Pseudoinverse solution
    x_pinv = pinv(A) @ b

    print("=== Minimum Norm Property ===")
    print(f"A (2×4):\n{A}")
    print(f"b: {b}")
    print(f"\nPseudoinverse solution x⁺ = A⁺b: {np.round(x_pinv, 4)}")
    print(f"||x⁺|| = {norm(x_pinv):.4f}")
    print(f"Ax⁺ = {A @ x_pinv} (should equal b)")

    # Any other solution is x⁺ plus a null-space component
    N = null_space(A)  # Columns span Null(A)
    print(f"\nNull space dimension: {N.shape[1]}")

    # Adding null-space components preserves Ax = b but grows ||x||
    print("\nComparing with other solutions:")
    for alpha in [1, -1, 2]:
        x_other = x_pinv + alpha * N[:, 0]
        print(f"  x = x⁺ + {alpha}*n: {np.round(x_other, 4)}")
        print(f"  ||x|| = {norm(x_other):.4f} >= {norm(x_pinv):.4f}")
        print(f"  Ax = {np.round(A @ x_other, 4)}")

demonstrate_minimum_norm()
```

Computing A⁺ in practice:
- NumPy: np.linalg.pinv(A), which uses SVD with a default tolerance
- SVD-based: compute the SVD yourself and invert the singular values above a threshold
- lstsq: np.linalg.lstsq(A, b), which solves the least squares problem without forming A⁺ explicitly (compared in the sketch below)
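Both routes give the same minimum-norm least squares answer. A quick comparison (the example matrix is an arbitrary choice):

```python
import numpy as np
from numpy.linalg import pinv, lstsq

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 4.0])

# Route 1: form A⁺ explicitly, then multiply
x_via_pinv = pinv(A) @ b

# Route 2: solve the least squares problem directly
x_via_lstsq, residuals, rank, sigma = lstsq(A, b, rcond=None)

print(np.allclose(x_via_pinv, x_via_lstsq))  # True
print(f"rank = {rank}, singular values = {np.round(sigma, 4)}")
```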
Tolerance for small singular values:
Numerically, singular values may be tiny but non-zero due to floating-point error. The pseudoinverse treats σᵢ < ε as zero; a common choice is a relative cutoff ε = rcond · σ_max, which NumPy's pinv exposes as the rcond argument.
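To see why the cutoff matters, the sketch below builds a nearly rank-1 matrix (an illustrative construction, not from the original page) and compares two rcond settings:

```python
import numpy as np
from numpy.linalg import pinv, svd

rng = np.random.default_rng(0)

# Nearly rank-1: second singular value is ~1e-12 instead of exactly 0
A = np.outer([1.0, 2.0, 3.0], [1.0, 1.0]) + 1e-12 * rng.standard_normal((3, 2))

_, sigma, _ = svd(A)
print(f"singular values: {sigma}")

# Keeping the tiny σ inverts it: entries of A⁺ blow up to ~1e12
print(f"max|A⁺| keeping tiny σ:  {np.abs(pinv(A, rcond=1e-15)).max():.2e}")

# Treating it as zero (cutoff 1e-8·σ_max) stays well-behaved
print(f"max|A⁺| dropping tiny σ: {np.abs(pinv(A, rcond=1e-8)).max():.2e}")
```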
SVD is the gold standard for numerical stability in rank-deficient cases.
Never compute A⁺ via (AᵀA)⁻¹Aᵀ for rank-deficient matrices. AᵀA will be singular, and numerical errors can cause catastrophic results. Always use SVD-based computation (pinv) for safety.
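The failure is easy to reproduce. In this sketch, AᵀA is exactly singular, so the normal-equations route raises an error while pinv succeeds:

```python
import numpy as np
from numpy.linalg import inv, pinv, LinAlgError

# Exactly rank-deficient: second column = 2 × first column
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

try:
    A_pinv_bad = inv(A.T @ A) @ A.T  # AᵀA is singular here
except LinAlgError as err:
    print(f"Normal-equations route fails: {err}")

# The SVD route handles rank deficiency gracefully
print(np.round(pinv(A), 4))
```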
Connection to regularization:

The pseudoinverse relates to ridge regression through the limit shown above. As λ → 0⁺:

(AᵀA + λI)⁻¹Aᵀ → A⁺

The pseudoinverse gives the solution that regularization methods converge to as the regularization strength approaches zero.
Congratulations! You've mastered Projections and Least Squares. You understand orthogonal projection geometrically and algebraically, projection matrices and their properties, normal equations, the geometric view of least squares, and the pseudoinverse for general solutions. This foundation is essential for linear regression, PCA, and optimization throughout ML.