If a matrix $A$ represents a transformation that warps space in a particular way, the inverse matrix $A^{-1}$ represents the opposite transformation—the one that undoes everything $A$ did.
This simple idea has profound implications.
But not every transformation can be undone. If a 3D space is flattened to a 2D plane, information is lost—we can't reconstruct the original 3D point from its 2D projection. Invertibility is the property that separates transformations where nothing is lost from those where collapse occurs.
By the end of this page, you will understand what matrix inverses mean geometrically, when they exist, multiple methods for computing them, and why you should often AVOID explicitly computing inverses in practice.
Definition:
For a square matrix $A$, the inverse $A^{-1}$ (if it exists) is the unique matrix satisfying:
$$AA^{-1} = A^{-1}A = I$$
where $I$ is the identity matrix.
Geometric interpretation:
If $A$ represents a transformation (rotation, scaling, shearing, etc.), then $A^{-1}$ represents the exact opposite transformation:
Applying $A$ then $A^{-1}$ (or vice versa) returns you to where you started: $(A^{-1}A)\mathbf{x} = I\mathbf{x} = \mathbf{x}$.
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}$$

This scales x by 2 and y by 4.

$$A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/4 \end{bmatrix}$$

This scales x by 1/2 and y by 1/4, undoing the original scaling. For diagonal matrices, the inverse is simple: invert each diagonal entry. $AA^{-1} = \begin{bmatrix} 2 \cdot 1/2 & 0 \\ 0 & 4 \cdot 1/4 \end{bmatrix} = I$ ✓
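The diagonal example above is easy to check numerically (a quick sketch; the matrix is the one from the text):

```python
import numpy as np

# Diagonal matrices invert entrywise: 2 -> 1/2, 4 -> 1/4
A = np.diag([2.0, 4.0])
A_inv = np.linalg.inv(A)
print(A_inv)                               # diagonal entries 0.5 and 0.25
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```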
A non-square matrix can't have an inverse in the standard sense. A 3×2 matrix maps from $\mathbb{R}^2$ to $\mathbb{R}^3$, but its image is at most a 2D plane inside $\mathbb{R}^3$—most 3D vectors have no preimage at all, so no single matrix can undo the map from both sides. We can define a 'left inverse' or 'right inverse' for non-square matrices, but they're not true inverses.
A square matrix $A$ is invertible (also called nonsingular or regular) if and only if ANY of the following equivalent conditions holds: its determinant is nonzero, it has full rank, its null space is trivial, $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$, its columns span $\mathbb{R}^n$, and all its eigenvalues are nonzero.

If ANY of these fails, ALL of them fail—the matrix is singular and has no inverse.
| Property | Invertible ($|A| \neq 0$) | Singular ($|A| = 0$) |
|---|---|---|
| Rank | Full (= $n$) | Deficient (< $n$) |
| Null space | Trivial $\{\mathbf{0}\}$ | Non-trivial (dimension > 0) |
| $A\mathbf{x} = \mathbf{b}$ | Unique solution | Zero or infinitely many solutions |
| Geometrically | No dimension collapse | Collapses some dimension |
| Column space | All of $\mathbb{R}^n$ | Proper subspace |
| Eigenvalues | All nonzero | At least one is zero |
A matrix can be technically invertible but numerically unstable. If the determinant is $10^{-15}$, the inverse exists but computing it accurately is impossible in floating-point arithmetic. The condition number (ratio of largest to smallest singular value) measures this: $\kappa(A) > 10^{10}$ means trouble.
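The gap between "invertible on paper" and "invertible in floating point" is easy to demonstrate (a small sketch; the matrix and threshold here are illustrative choices):

```python
import numpy as np

# Invertible in exact arithmetic, but severely ill-conditioned
eps = 1e-12
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])

print(np.linalg.det(A))    # ~1e-12: tiny but nonzero, so an inverse exists
print(np.linalg.cond(A))   # ~4e12: roughly 12 digits of accuracy are lost
```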
Understanding how inverses behave under various operations is essential for manipulating matrix expressions.
| Property | Formula | Proof Sketch |
|---|---|---|
| Inverse of inverse | $(A^{-1})^{-1} = A$ | $(A^{-1})^{-1} A^{-1} = I \Rightarrow $ multiply both sides by $A$ |
| Inverse of product | $(AB)^{-1} = B^{-1}A^{-1}$ | $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = I$ |
| Inverse of transpose | $(A^T)^{-1} = (A^{-1})^T$ | $(A^T)((A^{-1})^T) = (A^{-1}A)^T = I^T = I$ |
| Inverse of scalar mult | $(cA)^{-1} = \frac{1}{c}A^{-1}$ | $(cA)(\frac{1}{c}A^{-1}) = \frac{c}{c}AA^{-1} = I$ |
| Determinant of inverse | $|A^{-1}| = 1/|A|$ | $|A||A^{-1}| = |AA^{-1}| = |I| = 1$ |
| Eigenvalues of inverse | $\lambda_{A^{-1}} = 1/\lambda_A$ | If $A\mathbf{v} = \lambda\mathbf{v}$, then $A^{-1}\mathbf{v} = (1/\lambda)\mathbf{v}$ |
$(AB)^{-1} = B^{-1}A^{-1}$—the order reverses. Intuitively: to undo 'apply $B$ then $A$', you must first undo $A$, then undo $B$. Like removing socks and shoes: to undo 'socks then shoes', you first remove shoes, then socks.
Extended product rule:
$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$

$$(A_1 A_2 \cdots A_n)^{-1} = A_n^{-1} \cdots A_2^{-1} A_1^{-1}$$
The inverse of a chain of transformations reverses the order completely.
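The order reversal is easy to verify numerically (a quick check with arbitrary random matrices, which are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^{-1} equals B^{-1} A^{-1}, with the order reversed
lhs = np.linalg.inv(A @ B)
print(np.allclose(lhs, np.linalg.inv(B) @ np.linalg.inv(A)))  # True
# The un-reversed order A^{-1} B^{-1} does NOT match in general
print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))
```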
For 2×2 matrices, there's a beautiful closed-form formula worth memorizing:
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \implies A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
where $ad - bc$ is the determinant of $A$.
The formula in words: swap the two diagonal entries, negate the two off-diagonal entries, and divide everything by the determinant.

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}$$

Determinant: $|A| = 3(4) - 1(2) = 12 - 2 = 10$

$$A^{-1} = \frac{1}{10}\begin{bmatrix} 4 & -1 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.1 \\ -0.2 & 0.3 \end{bmatrix}$$

Verification: $AA^{-1} = \begin{bmatrix} 3(0.4)+1(-0.2) & 3(-0.1)+1(0.3) \\ 2(0.4)+4(-0.2) & 2(-0.1)+4(0.3) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ ✓
If $ad - bc = 0$, the formula involves division by zero. This is exactly when the matrix is singular—the columns are linearly dependent ($[a,c]$ and $[b,d]$ are parallel), and no inverse exists.
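The closed-form rule translates directly into code (a minimal sketch; the function name and the near-zero tolerance are our choices):

```python
import numpy as np

def inv2x2(A):
    # Closed-form 2x2 inverse: swap diagonal, negate off-diagonal,
    # divide by the determinant
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular (or numerically close to it)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[3.0, 1.0], [2.0, 4.0]])
print(inv2x2(A))  # matches [[0.4, -0.1], [-0.2, 0.3]] from the worked example
```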
For larger matrices, we need systematic methods. Here are the main approaches, from theoretical to practical.
Gauss-Jordan Elimination
A practical method: augment $A$ with the identity and row-reduce to get $I | A^{-1}$.
$$[A | I] \xrightarrow{\text{row ops}} [I | A^{-1}]$$
Algorithm: form the augmented matrix $[A | I]$, then apply row operations until the left half becomes $I$; the right half is then $A^{-1}$. If the left half cannot be reduced to $I$ (a pivot column contains only zeros), $A$ is singular.

Complexity: $O(n^3)$ operations.
Why it works: row operations are multiplications by elementary matrices. Reducing $A$ to $I$ means $E_k \cdots E_2 E_1 A = I$, so $A^{-1} = E_k \cdots E_2 E_1$. Applying the same row operations to $I$ therefore produces $E_k \cdots E_2 E_1 \cdot I = A^{-1}$—exactly what accumulates in the right half of the augmented matrix.
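Gauss-Jordan elimination can be sketched directly in NumPy (a minimal implementation with partial pivoting; the function name and tolerance are our choices, not a library API):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Partial pivoting: swap the largest entry into the pivot position
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                    # scale pivot row to a leading 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]   # clear the rest of the column
    return M[:, n:]                              # right half now holds A^{-1}

A = np.array([[3.0, 1.0, 2.0], [0.0, 2.0, 1.0], [1.0, 0.0, 3.0]])
print(np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A)))  # True
```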
```python
import numpy as np
from scipy import linalg

# Computing inverses in practice
A = np.array([[3, 1, 2],
              [0, 2, 1],
              [1, 0, 3]], dtype=float)

# Method 1: Direct inverse (generally avoid!)
A_inv = np.linalg.inv(A)
print("Direct inverse:")
print(A_inv)

# Verify
print("A @ A_inv (should be I):")
print(np.round(A @ A_inv, 10))

# Method 2: Solve A x = b (preferred!)
b = np.array([1, 2, 3])
x = np.linalg.solve(A, b)  # Solves Ax = b without computing A^{-1}
print(f"Solution to Ax = b: {x}")

# Method 3: LU decomposition for multiple right-hand sides
lu, piv = linalg.lu_factor(A)
# Now solve for multiple b vectors efficiently
for i, bi in enumerate([[1, 0, 0], [0, 1, 0], [0, 0, 1]]):
    xi = linalg.lu_solve((lu, piv), bi)
    print(f"Column {i+1} of A^{{-1}}: {xi}")
```

This might be surprising: computing the matrix inverse is rarely the best approach, even when it exists and you need $A^{-1}\mathbf{b}$.
The naive approach: compute `A_inv = np.linalg.inv(A)`, then multiply `x = A_inv @ b`.

The better approach: call `x = np.linalg.solve(A, b)`, which factors $A$ (LU with partial pivoting) and back-substitutes, never forming $A^{-1}$.

Why? Solving via a factorization takes roughly a third of the floating-point operations of forming the full inverse, and it is more accurate: the computed $\mathbf{x}$ has a smaller residual $\|A\mathbf{x} - \mathbf{b}\|$ than the inverse-then-multiply route, especially when $A$ is ill-conditioned.
There are legitimate cases:
- Needing $A^{-1}$ for many different $\mathbf{b}$ vectors (pre-compute once)
- The inverse has special structure you want to exploit
- Symbolic manipulation where numerical stability isn't a concern
- $A$ is so small (2×2, 3×3) that it doesn't matter
But if in doubt, use `numpy.linalg.solve()` or its equivalent in your library.
What if a matrix isn't square or isn't full rank? The true inverse doesn't exist, but we can define a pseudoinverse that's the next best thing.
The Moore-Penrose pseudoinverse $A^+$ is defined for ANY matrix (square or not, full rank or not) and satisfies:

1. $AA^+A = A$
2. $A^+AA^+ = A^+$
3. $(AA^+)^T = AA^+$
4. $(A^+A)^T = A^+A$

These four conditions uniquely determine $A^+$.
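The defining conditions can be checked numerically, even for a rank-deficient matrix (a quick sketch; the matrix here is an arbitrary rank-1 example):

```python
import numpy as np

# A rank-1, 3x2 matrix: no true inverse exists, but A+ does
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
P = np.linalg.pinv(A)

print(np.allclose(A @ P @ A, A))        # A A+ A  = A
print(np.allclose(P @ A @ P, P))        # A+ A A+ = A+
print(np.allclose((A @ P).T, A @ P))    # A A+ is symmetric
print(np.allclose((P @ A).T, P @ A))    # A+ A is symmetric
```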
Geometric interpretation for overdetermined systems ($m > n$, more equations than unknowns):
The system $A\mathbf{x} = \mathbf{b}$ typically has no exact solution. The pseudoinverse gives the least squares solution:
$$\mathbf{x}^* = A^+\mathbf{b} = \arg\min_\mathbf{x} ||A\mathbf{x} - \mathbf{b}||^2$$
For full column rank $A$: $$A^+ = (A^T A)^{-1} A^T$$
This is exactly the least squares solution from linear regression!
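As a sanity check, the normal-equations formula agrees with NumPy's pseudoinverse whenever $A$ has full column rank (a small sketch; the matrix is an arbitrary example):

```python
import numpy as np

# Full column rank 3x2 matrix (arbitrary example)
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Normal-equations formula: A+ = (A^T A)^{-1} A^T
normal_eq = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(normal_eq, np.linalg.pinv(A)))  # True
```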
For underdetermined systems ($m < n$, more unknowns than equations):
Many exact solutions exist. The pseudoinverse gives the minimum norm solution:
$$\mathbf{x}^* = A^+\mathbf{b} = \arg\min\{||\mathbf{x}|| : A\mathbf{x} = \mathbf{b}\}$$
For full row rank $A$: $$A^+ = A^T(AA^T)^{-1}$$
The most numerically stable way to compute $A^+$ is via SVD. If $A = U\Sigma V^T$, then $A^+ = V \Sigma^+ U^T$ where $\Sigma^+$ inverts non-zero singular values and leaves zeros as zeros. This works even for rank-deficient matrices.
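The SVD route can be sketched in a few lines (a minimal implementation; the function name and the relative tolerance are our choices—`np.linalg.pinv` uses its own `rcond` cutoff internally):

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    # Pseudoinverse from the SVD: invert the nonzero singular values,
    # leave the (numerically) zero ones at zero
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cutoff = tol * s.max()
    s_inv = np.where(s > cutoff, 1.0 / np.where(s > cutoff, s, 1.0), 0.0)
    return Vt.T @ (s_inv[:, None] * U.T)   # V Sigma+ U^T

# Works even for a rank-deficient matrix, where (A^T A)^{-1} does not exist
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # rank 1
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))  # True
```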
```python
import numpy as np

# The pseudoinverse in action

# Overdetermined system (more equations than unknowns)
A_tall = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2
b = np.array([1, 2, 3])

# No exact solution exists, but pseudoinverse gives least squares
A_pinv = np.linalg.pinv(A_tall)
x_ls = A_pinv @ b
print(f"Least squares solution: {x_ls}")
print(f"Residual norm: {np.linalg.norm(A_tall @ x_ls - b):.6f}")

# Underdetermined system (more unknowns than equations)
A_wide = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3
b2 = np.array([1, 2])

# Many exact solutions exist, pseudoinverse gives minimum norm
A_pinv2 = np.linalg.pinv(A_wide)
x_mn = A_pinv2 @ b2
print(f"Minimum norm solution: {x_mn}")
print(f"Solution norm: {np.linalg.norm(x_mn):.6f}")
print(f"Ax = b check: {A_wide @ x_mn}")
```

The inverse is the transformation that undoes another, but its existence and computation have subtleties every ML practitioner must understand.
Module complete!
You've now mastered matrices and linear transformations. You understand matrices as functions that transform space, can perform and interpret matrix operations, comprehend rank and nullity as measures of dimensional flow, and know when and how to work with inverses.
Next steps: The following modules build on this foundation—eigenvalues and eigenvectors reveal the 'natural directions' of transformations, and matrix decompositions factor complex transformations into simpler components essential for ML algorithms like PCA, SVD-based recommendation, and optimization.
Congratulations! You've completed Module 2: Matrices and Linear Transformations. You now have the geometric intuition and computational skills to work with matrices as ML practitioners do. These concepts will reappear constantly in neural networks, dimensionality reduction, and optimization algorithms.