If a matrix $A$ represents a transformation that warps space in a particular way, the inverse matrix $A^{-1}$ represents the opposite transformation—the one that undoes everything $A$ did.
This simple idea has profound implications.
But not every transformation can be undone. If a 3D space is flattened to a 2D plane, information is lost—we can't reconstruct the original 3D point from its 2D projection. Invertibility is the property that separates transformations where nothing is lost from those where collapse occurs.
By the end of this page, you will understand what matrix inverses mean geometrically, when they exist, multiple methods for computing them, and why you should often AVOID explicitly computing inverses in practice.
Definition:
For a square matrix $A$, the inverse $A^{-1}$ (if it exists) is the unique matrix satisfying:
$$AA^{-1} = A^{-1}A = I$$
where $I$ is the identity matrix.
Geometric interpretation:
If $A$ represents a transformation (rotation, scaling, shearing, etc.), then $A^{-1}$ represents the exact opposite transformation:
Applying $A$ then $A^{-1}$ (or vice versa) returns you to where you started: $(A^{-1}A)\mathbf{x} = I\mathbf{x} = \mathbf{x}$.
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}$$

This scales x by 2 and y by 4.

$$A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/4 \end{bmatrix}$$

This scales x by 1/2 and y by 1/4, undoing the original scaling. For diagonal matrices, the inverse is simple: invert each diagonal entry. $AA^{-1} = \begin{bmatrix} 2 \cdot 1/2 & 0 \\ 0 & 4 \cdot 1/4 \end{bmatrix} = I$ ✓
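The diagonal example above is easy to check numerically (a quick sketch; the matrix is the one from the text):

```python
import numpy as np

# Diagonal matrices invert entrywise: 2 -> 1/2, 4 -> 1/4
A = np.diag([2.0, 4.0])
A_inv = np.linalg.inv(A)
print(A_inv)                               # diagonal entries 0.5 and 0.25
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```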
A non-square matrix can't have an inverse in the standard sense. A 3×2 matrix maps from $\mathbb{R}^2$ to $\mathbb{R}^3$, but its image is at most a 2D plane inside $\mathbb{R}^3$—most 3D vectors have no preimage at all, so no single matrix can undo the map from both sides. We can define a 'left inverse' or 'right inverse' for non-square matrices, but they're not true inverses.
A square matrix $A$ is invertible (also called nonsingular or regular) if and only if ANY of the following equivalent conditions holds: its determinant is nonzero, it has full rank, its null space is trivial, $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$, its columns span $\mathbb{R}^n$, and all its eigenvalues are nonzero.

If ANY of these fails, ALL of them fail—the matrix is singular and has no inverse.
| Property | Invertible ($|A| \neq 0$) | Singular ($|A| = 0$) |
|---|---|---|
| Rank | Full (= $n$) | Deficient (< $n$) |
| Null space | Trivial $\{\mathbf{0}\}$ | Non-trivial (dimension > 0) |
| $A\mathbf{x} = \mathbf{b}$ | Unique solution | Zero or infinitely many solutions |
| Geometrically | No dimension collapse | Collapses some dimension |
| Column space | All of $\mathbb{R}^n$ | Proper subspace |
| Eigenvalues | All nonzero | At least one is zero |
A matrix can be technically invertible but numerically unstable. If the determinant is $10^{-15}$, the inverse exists but computing it accurately is impossible in floating-point arithmetic. The condition number (ratio of largest to smallest singular value) measures this: $\kappa(A) > 10^{10}$ means trouble.
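The gap between "invertible on paper" and "invertible in floating point" is easy to demonstrate (a small sketch; the matrix and threshold here are illustrative choices):

```python
import numpy as np

# Invertible in exact arithmetic, but severely ill-conditioned
eps = 1e-12
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])

print(np.linalg.det(A))    # ~1e-12: tiny but nonzero, so an inverse exists
print(np.linalg.cond(A))   # ~4e12: roughly 12 digits of accuracy are lost
```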
Understanding how inverses behave under various operations is essential for manipulating matrix expressions.
| Property | Formula | Proof Sketch |
|---|---|---|
| Inverse of inverse | $(A^{-1})^{-1} = A$ | $(A^{-1})^{-1} A^{-1} = I \Rightarrow $ multiply both sides by $A$ |
| Inverse of product | $(AB)^{-1} = B^{-1}A^{-1}$ | $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I$ |
| Inverse of transpose | $(A^T)^{-1} = (A^{-1})^T$ | $(A^T)((A^{-1})^T) = (A^{-1}A)^T = I^T = I$ |
| Inverse of scalar mult | $(cA)^{-1} = \frac{1}{c}A^{-1}$ | $(cA)(\frac{1}{c}A^{-1}) = \frac{c}{c}AA^{-1} = I$ |
| Determinant of inverse | $|A^{-1}| = 1/|A|$ | $|A||A^{-1}| = |AA^{-1}| = |I| = 1$ |
| Eigenvalues of inverse | $\lambda_{A^{-1}} = 1/\lambda_A$ | If $A\mathbf{v} = \lambda\mathbf{v}$, then $A^{-1}\mathbf{v} = (1/\lambda)\mathbf{v}$ |
$(AB)^{-1} = B^{-1}A^{-1}$—the order reverses. Intuitively: to undo 'apply $B$ then $A$', you must first undo $A$, then undo $B$. Like removing socks and shoes: to undo 'socks then shoes', you first remove shoes, then socks.
Extended product rule:
$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$

$$(A_1 A_2 \cdots A_n)^{-1} = A_n^{-1} \cdots A_2^{-1} A_1^{-1}$$
The inverse of a chain of transformations reverses the order completely.
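The order reversal is easy to verify numerically (a quick check with arbitrary random matrices, which are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^{-1} equals B^{-1} A^{-1}, with the order reversed
lhs = np.linalg.inv(A @ B)
print(np.allclose(lhs, np.linalg.inv(B) @ np.linalg.inv(A)))  # True
# The un-reversed order A^{-1} B^{-1} does NOT match in general
print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))
```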
For 2×2 matrices, there's a beautiful closed-form formula worth memorizing:
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \implies A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
where $ad - bc$ is the determinant of $A$.
The formula in words: swap the two diagonal entries, negate the two off-diagonal entries, and divide everything by the determinant.

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}$$

Determinant: $|A| = 3(4) - 1(2) = 12 - 2 = 10$

$$A^{-1} = \frac{1}{10}\begin{bmatrix} 4 & -1 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.1 \\ -0.2 & 0.3 \end{bmatrix}$$

Verification: $AA^{-1} = \begin{bmatrix} 3(0.4)+1(-0.2) & 3(-0.1)+1(0.3) \\ 2(0.4)+4(-0.2) & 2(-0.1)+4(0.3) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ ✓
If $ad - bc = 0$, the formula involves division by zero. This is exactly when the matrix is singular—the columns are linearly dependent ($[a,c]$ and $[b,d]$ are parallel), and no inverse exists.
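The closed-form rule translates directly into code (a minimal sketch; the function name and the near-zero tolerance are our choices):

```python
import numpy as np

def inv2x2(A):
    # Closed-form 2x2 inverse: swap diagonal, negate off-diagonal,
    # divide by the determinant
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular (or numerically close to it)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[3.0, 1.0], [2.0, 4.0]])
print(inv2x2(A))  # matches [[0.4, -0.1], [-0.2, 0.3]] from the worked example
```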
For larger matrices, we need systematic methods. Here are the main approaches, from theoretical to practical.
Gauss-Jordan Elimination
A practical method: augment $A$ with the identity and row-reduce to get $I | A^{-1}$.
$$[A | I] \xrightarrow{\text{row ops}} [I | A^{-1}]$$
Algorithm: form the augmented matrix $[A | I]$, then apply row operations until the left half becomes $I$; the right half is then $A^{-1}$. If the left half cannot be reduced to $I$ (a pivot column contains only zeros), $A$ is singular.

Complexity: $O(n^3)$ operations.
Why it works: row operations are multiplications by elementary matrices. Reducing $A$ to $I$ means $E_k \cdots E_2 E_1 A = I$, so $A^{-1} = E_k \cdots E_2 E_1$. Applying the same row operations to $I$ therefore produces $E_k \cdots E_2 E_1 \cdot I = A^{-1}$—exactly what accumulates in the right half of the augmented matrix.
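Gauss-Jordan elimination can be sketched directly in NumPy (a minimal implementation with partial pivoting; the function name and tolerance are our choices, not a library API):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Partial pivoting: swap the largest entry into the pivot position
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                    # scale pivot row to a leading 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]   # clear the rest of the column
    return M[:, n:]                              # right half now holds A^{-1}

A = np.array([[3.0, 1.0, 2.0], [0.0, 2.0, 1.0], [1.0, 0.0, 3.0]])
print(np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A)))  # True
```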
```python
import numpy as np
from scipy import linalg

# Computing inverses in practice
A = np.array([[3, 1, 2],
              [0, 2, 1],
              [1, 0, 3]], dtype=float)

# Method 1: Direct inverse (generally avoid!)
A_inv = np.linalg.inv(A)
print("Direct inverse:")
print(A_inv)

# Verify
print("A @ A_inv (should be I):")
print(np.round(A @ A_inv, 10))

# Method 2: Solve A x = b (preferred!)
b = np.array([1, 2, 3])
x = np.linalg.solve(A, b)  # Solves Ax = b without computing A^{-1}
print(f"Solution to Ax = b: {x}")

# Method 3: LU decomposition for multiple right-hand sides
lu, piv = linalg.lu_factor(A)
# Now solve for multiple b vectors efficiently
for i, bi in enumerate([[1, 0, 0], [0, 1, 0], [0, 0, 1]]):
    xi = linalg.lu_solve((lu, piv), bi)
    print(f"Column {i+1} of A^{{-1}}: {xi}")
```

This might be surprising: computing the matrix inverse is rarely the best approach, even when it exists and you need $A^{-1}\mathbf{b}$.
The naive approach: compute `A_inv = np.linalg.inv(A)`, then multiply `x = A_inv @ b`.

The better approach: call `x = np.linalg.solve(A, b)`, which factors $A$ (LU with partial pivoting) and back-substitutes, never forming $A^{-1}$.

Why? Solving via a factorization takes roughly a third of the floating-point operations of forming the full inverse, and it is more accurate: the computed $\mathbf{x}$ has a smaller residual $\|A\mathbf{x} - \mathbf{b}\|$ than the inverse-then-multiply route, especially when $A$ is ill-conditioned.
There are legitimate cases:
- Needing $A^{-1}$ for many different $\mathbf{b}$ vectors (pre-compute once)
- The inverse has special structure you want to exploit
- Symbolic manipulation where numerical stability isn't a concern
- $A$ is so small (2×2, 3×3) that it doesn't matter
But if in doubt, use `numpy.linalg.solve()` or its equivalent in your library.
What if a matrix isn't square or isn't full rank? The true inverse doesn't exist, but we can define a pseudoinverse that's the next best thing.
The Moore-Penrose pseudoinverse $A^+$ is defined for ANY matrix (square or not, full rank or not) and satisfies:

1. $AA^+A = A$
2. $A^+AA^+ = A^+$
3. $(AA^+)^T = AA^+$
4. $(A^+A)^T = A^+A$

These four conditions uniquely determine $A^+$.
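The defining conditions can be checked numerically, even for a rank-deficient matrix (a quick sketch; the matrix here is an arbitrary rank-1 example):

```python
import numpy as np

# A rank-1, 3x2 matrix: no true inverse exists, but A+ does
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
P = np.linalg.pinv(A)

print(np.allclose(A @ P @ A, A))        # A A+ A  = A
print(np.allclose(P @ A @ P, P))        # A+ A A+ = A+
print(np.allclose((A @ P).T, A @ P))    # A A+ is symmetric
print(np.allclose((P @ A).T, P @ A))    # A+ A is symmetric
```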
Geometric interpretation for overdetermined systems ($m > n$, more equations than unknowns):
The system $A\mathbf{x} = \mathbf{b}$ typically has no exact solution. The pseudoinverse gives the least squares solution:
$$\mathbf{x}^* = A^+\mathbf{b} = \arg\min_\mathbf{x} ||A\mathbf{x} - \mathbf{b}||^2$$
For full column rank $A$: $$A^+ = (A^T A)^{-1} A^T$$
This is exactly the least squares solution from linear regression!
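As a sanity check, the normal-equations formula agrees with NumPy's pseudoinverse whenever $A$ has full column rank (a small sketch; the matrix is an arbitrary example):

```python
import numpy as np

# Full column rank 3x2 matrix (arbitrary example)
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Normal-equations formula: A+ = (A^T A)^{-1} A^T
normal_eq = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(normal_eq, np.linalg.pinv(A)))  # True
```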
For underdetermined systems ($m < n$, more unknowns than equations):
Many exact solutions exist. The pseudoinverse gives the minimum norm solution:
$$\mathbf{x}^* = A^+\mathbf{b} = \arg\min\{||\mathbf{x}|| : A\mathbf{x} = \mathbf{b}\}$$
For full row rank $A$: $$A^+ = A^T(AA^T)^{-1}$$
The most numerically stable way to compute $A^+$ is via SVD. If $A = U\Sigma V^T$, then $A^+ = V \Sigma^+ U^T$ where $\Sigma^+$ inverts non-zero singular values and leaves zeros as zeros. This works even for rank-deficient matrices.
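The SVD route can be sketched in a few lines (a minimal implementation; the function name and the relative tolerance are our choices—`np.linalg.pinv` uses its own `rcond` cutoff internally):

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    # Pseudoinverse from the SVD: invert the nonzero singular values,
    # leave the (numerically) zero ones at zero
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cutoff = tol * s.max()
    s_inv = np.where(s > cutoff, 1.0 / np.where(s > cutoff, s, 1.0), 0.0)
    return Vt.T @ (s_inv[:, None] * U.T)   # V Sigma+ U^T

# Works even for a rank-deficient matrix, where (A^T A)^{-1} does not exist
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # rank 1
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))  # True
```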
```python
import numpy as np

# The pseudoinverse in action

# Overdetermined system (more equations than unknowns)
A_tall = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2
b = np.array([1, 2, 3])

# No exact solution exists, but pseudoinverse gives least squares
A_pinv = np.linalg.pinv(A_tall)
x_ls = A_pinv @ b
print(f"Least squares solution: {x_ls}")
print(f"Residual norm: {np.linalg.norm(A_tall @ x_ls - b):.6f}")

# Underdetermined system (more unknowns than equations)
A_wide = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3
b2 = np.array([1, 2])

# Many exact solutions exist, pseudoinverse gives minimum norm
A_pinv2 = np.linalg.pinv(A_wide)
x_mn = A_pinv2 @ b2
print(f"Minimum norm solution: {x_mn}")
print(f"Solution norm: {np.linalg.norm(x_mn):.6f}")
print(f"Ax = b check: {A_wide @ x_mn}")
```

The inverse is the transformation that undoes another, but its existence and computation have subtleties every ML practitioner must understand.
Module complete!
You've now mastered matrices and linear transformations. You understand matrices as functions that transform space, can perform and interpret matrix operations, comprehend rank and nullity as measures of dimensional flow, and know when and how to work with inverses.
Next steps: The following modules build on this foundation—eigenvalues and eigenvectors reveal the 'natural directions' of transformations, and matrix decompositions factor complex transformations into simpler components essential for ML algorithms like PCA, SVD-based recommendation, and optimization.
Congratulations! You've completed Module 2: Matrices and Linear Transformations. You now have the geometric intuition and computational skills to work with matrices as ML practitioners do. These concepts will reappear constantly in neural networks, dimensionality reduction, and optimization algorithms.