When a linear transformation acts on a vector space, something interesting happens: some dimensions might collapse. A 3D space might be flattened into a 2D plane, or even crushed to a line. The concepts of rank and nullity precisely measure this dimensional behavior.
Rank tells us how much of the input space's structure survives transformation—the dimension of the output.
Nullity tells us how much is destroyed—the dimension of the 'invisible' subspace that maps to zero.
Together, they reveal the fundamental structure of any linear map and explain why some linear systems have unique solutions, others have infinitely many, and some have none at all.
By the end of this page, you will understand rank and nullity geometrically, apply the rank-nullity theorem, recognize rank-deficient situations in ML (multicollinearity, singular matrices), and understand why these concepts matter for solving linear systems and training models.
The column space (or range or image) of a matrix $A$ is the set of all possible outputs when we apply $A$ to any input:
$$\text{Col}(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}$$
From the column view of matrix-vector multiplication, $A\mathbf{x}$ is a linear combination of the columns of $A$: $$A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \dots + x_n\mathbf{a}_n$$
Therefore, the column space is exactly the span of the columns of $A$.
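To make this concrete, here is a small NumPy check (a sketch with an arbitrary example matrix) that $A\mathbf{x}$ really is the corresponding combination of the columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

# A @ x equals the linear combination x_1 * a_1 + x_2 * a_2 of the columns of A
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(np.allclose(A @ x, combo))  # True
```

Every output of $A$ is built this way, which is why the column space equals the span of the columns.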
The rank of a matrix is the dimension of its column space:
$$\text{rank}(A) = \dim(\text{Col}(A))$$
Equivalently, rank is the number of linearly independent columns of $A$ (or equivalently, the number of linearly independent rows—these two numbers are always equal).
Geometric interpretation:
Rank tells you the dimension of the output space. Consider this $3 \times 3$ matrix:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 1 & 1 \end{bmatrix}$$
Row 2 is $2 \times$ Row 1, so it's redundant; rows 1 and 3 are linearly independent, giving rank $= 2$. (Equivalently, columns 1 and 2 are independent, and column 3 is the combination $-\mathbf{a}_1 + 2\mathbf{a}_2$.) The transformation collapses 3D space to a 2D plane.
The most reliable way to compute rank is to row-reduce the matrix to echelon form. The rank equals the number of non-zero rows (equivalently, the number of pivot positions). This is what numerical libraries do internally.
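Row reduction can be sketched in exact arithmetic with SymPy (assuming it is available), using the $3 \times 3$ example above; `rref()` returns the reduced row echelon form together with the pivot columns:

```python
from sympy import Matrix

# The 3x3 example: row 2 is twice row 1
A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 1, 1]])

rref_form, pivot_cols = A.rref()  # reduced row echelon form + pivot column indices
print(rref_form)                  # one zero row; pivots in columns 0 and 1
print(len(pivot_cols))            # rank = number of pivots = 2
print(A.rank())                   # SymPy's built-in rank agrees
```

Exact rational arithmetic avoids the floating-point tolerance questions discussed later; numerical libraries instead rely on the SVD.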
The null space (or kernel) of a matrix $A$ is the set of all vectors that map to zero:
$$\text{Null}(A) = \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$$
The nullity is the dimension of this null space:
$$\text{nullity}(A) = \dim(\text{Null}(A))$$
Geometric interpretation:
The null space is the set of input directions that the transformation "ignores" or "annihilates." If you're standing in the null space and the transformation acts on you, you don't move (you map to the origin).
Examples:
- Projection onto the $xy$-plane in $\mathbb{R}^3$: the null space is the $z$-axis (nullity 1).
- Any invertible matrix: the null space is just $\{\mathbf{0}\}$ (nullity 0).
- The zero matrix on $\mathbb{R}^n$: the null space is all of $\mathbb{R}^n$ (nullity $n$).
Key insight: The null space is always a subspace—it contains the origin, and it's closed under addition and scalar multiplication. If $A\mathbf{u} = \mathbf{0}$ and $A\mathbf{v} = \mathbf{0}$, then $A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0}$. ✓
If $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x}_0$, then $\mathbf{x}_0 + \mathbf{n}$ is also a solution for any $\mathbf{n}$ in the null space (since $A(\mathbf{x}_0 + \mathbf{n}) = A\mathbf{x}_0 + A\mathbf{n} = \mathbf{b} + \mathbf{0} = \mathbf{b}$). A non-trivial null space means infinitely many solutions!
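This structure of the solution set is easy to verify numerically. The sketch below reuses the rank-2 example matrix, picks a $\mathbf{b}$ that is guaranteed to lie in the column space, and checks that adding any multiple of a null-space vector to a particular solution still solves the system:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])   # rank 2, nullity 1

# Pick a b that is certainly in the column space
b = A @ np.array([1.0, 1.0, 1.0])

# One particular solution (exact here, since b is in Col(A))
x0, *_ = np.linalg.lstsq(A, b, rcond=None)

# A null-space direction: the right singular vector for the zero singular value
_, s, Vt = np.linalg.svd(A)
n_vec = Vt[-1]
print(np.allclose(A @ n_vec, 0))            # n_vec maps to zero

# x0 + t * n_vec solves A x = b for every scalar t
for t in (0.0, 1.0, -3.5):
    print(np.allclose(A @ (x0 + t * n_vec), b))
```

Every printed check is `True`: the solution set is the line $\mathbf{x}_0 + t\,\mathbf{n}$, one copy of the null space shifted to pass through a particular solution.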
| Nullity | Null Space | Solutions to $A\mathbf{x} = \mathbf{b}$ |
|---|---|---|
| $0$ | Just $\{\mathbf{0}\}$ | At most one solution (unique if it exists) |
| $> 0$ | A subspace of dimension > 0 | Either no solution or infinitely many |
| $= n$ (full dimension) | All of $\mathbb{R}^n$ | $A$ is the zero matrix; solvable only when $\mathbf{b} = \mathbf{0}$, and then every $\mathbf{x}$ is a solution |
The Rank-Nullity Theorem (also called the Fundamental Theorem of Linear Maps) is one of the most important results in linear algebra:
$$\boxed{\text{rank}(A) + \text{nullity}(A) = n}$$
where $n$ is the number of columns of $A$ (the dimension of the domain).
In words: The dimension of what's preserved (rank) plus the dimension of what's destroyed (nullity) equals the dimension of the input space.
Conservation law interpretation:
Dimension can't be created or destroyed, only transferred. Of the $n$ dimensions going in:
- $\text{rank}(A)$ of them survive into the output (the column space),
- $\text{nullity}(A)$ of them collapse to zero (the null space).

Nothing is lost in the accounting—it's a dimensional conservation law.
Worked example: $A$ is $5 \times 4$, so the domain is $\mathbb{R}^4$ ($n = 4$).

Given: $\text{rank}(A) = 3$. By rank-nullity: $\text{nullity}(A) = 4 - 3 = 1$.

The null space is 1-dimensional (a line through the origin in $\mathbb{R}^4$). The column space is 3-dimensional (a 3D subspace of $\mathbb{R}^5$). Four input dimensions → three output dimensions + one collapsed dimension. The transformation maps 4D space onto a 3D subspace, with one direction getting flattened to zero.
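The worked example can be checked numerically. This sketch builds a $5 \times 4$ matrix of rank 3 by construction (a product of random $5 \times 3$ and $3 \times 4$ factors has rank 3 with probability 1) and confirms the rank-nullity accounting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 5x4 matrix of rank 3 by multiplying 5x3 and 3x4 factors
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))

n = A.shape[1]
rank = np.linalg.matrix_rank(A)

# Nullity = number of negligible singular values
s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * s.max() * np.finfo(A.dtype).eps
nullity = n - int(np.sum(s > tol))

print(rank, nullity, rank + nullity == n)   # 3 1 True
```

The theorem holds for any matrix you construct this way: the rank and the nullity always sum to the number of columns.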
A matrix has 'full rank' when rank equals the smaller of its dimensions. For an $m \times n$ matrix: full rank means rank = min($m$, $n$). For square $n \times n$ matrices, full rank (= $n$) implies nullity = 0, which means the matrix is invertible. Rank and invertibility are intimately connected.
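The rank-invertibility connection is directly observable in NumPy (a sketch with two arbitrary $2 \times 2$ examples): a full-rank matrix inverts cleanly, while a rank-deficient one raises `LinAlgError`:

```python
import numpy as np

full = np.array([[2.0, 1.0],
                 [1.0, 3.0]])       # rank 2: full rank, invertible
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])   # rank 1: row 2 = 2 x row 1

print(np.linalg.matrix_rank(full))      # 2
print(np.linalg.matrix_rank(singular))  # 1

inv = np.linalg.inv(full)
print(np.allclose(full @ inv, np.eye(2)))  # True

try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as e:
    print("Not invertible:", e)
```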
Every matrix $A_{m \times n}$ defines four fundamental subspaces that completely characterize its behavior. Understanding these is essential for grasping linear algebra's deepest structure.
| Subspace | Definition | Dimension | Lives In |
|---|---|---|---|
| Column Space $C(A)$ | All $A\mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$ | $r$ (rank) | $\mathbb{R}^m$ |
| Null Space $N(A)$ | All $\mathbf{x}$ with $A\mathbf{x} = \mathbf{0}$ | $n - r$ | $\mathbb{R}^n$ |
| Row Space $C(A^T)$ | All $A^T\mathbf{y}$ for $\mathbf{y} \in \mathbb{R}^m$ | $r$ (rank) | $\mathbb{R}^n$ |
| Left Null Space $N(A^T)$ | All $\mathbf{y}$ with $A^T\mathbf{y} = \mathbf{0}$ | $m - r$ | $\mathbb{R}^m$ |
The orthogonality relationships:
These subspaces come in orthogonal pairs:
- In $\mathbb{R}^n$: the row space $C(A^T)$ and the null space $N(A)$ are orthogonal complements.
- In $\mathbb{R}^m$: the column space $C(A)$ and the left null space $N(A^T)$ are orthogonal complements.
Why the orthogonality?
If $\mathbf{x}$ is in the null space, then $A\mathbf{x} = \mathbf{0}$. This means $\mathbf{x}$ is orthogonal to every row of $A$ (since row$_i \cdot \mathbf{x} = 0$). The rows of $A$ span the row space, so $\mathbf{x}$ is orthogonal to the entire row space.
In $\mathbb{R}^n$: dim(Row Space) + dim(Null Space) = $r + (n-r) = n$ ✓ In $\mathbb{R}^m$: dim(Column Space) + dim(Left Null Space) = $r + (m-r) = m$ ✓
Orthogonal complements always sum to the ambient dimension.
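Both orthogonal pairs can be verified numerically. In this sketch (reusing the rank-2 example matrix), a null-space vector is orthogonal to every row, and a left-null-space vector is orthogonal to every column:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])   # rank 2

# Null-space basis vector: right singular vector for the zero singular value
_, _, Vt = np.linalg.svd(A)
n_vec = Vt[-1]
# A @ n_vec computes row_i . n_vec for every row, so all rows are orthogonal to n_vec
print(np.allclose(A @ n_vec, 0))        # True: null space is perpendicular to the row space

# Left null space = null space of A^T, orthogonal to the columns of A
_, _, Vt_T = np.linalg.svd(A.T)
left_null = Vt_T[-1]
print(np.allclose(A.T @ left_null, 0))  # True: left null space is perpendicular to the column space
```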
ML significance: least squares works by projecting $\mathbf{b}$ onto the column space of the design matrix, while null-space directions correspond to parameter combinations the data cannot identify. The four subspaces are also the backbone of the SVD, which underlies PCA and low-rank approximation.
Rank is the key to understanding when the system $A\mathbf{x} = \mathbf{b}$ has solutions and how many.
The Existence Condition:
The system $A\mathbf{x} = \mathbf{b}$ has a solution iff $\mathbf{b}$ is in the column space of $A$.
This is equivalent to: $\text{rank}([A | \mathbf{b}]) = \text{rank}(A)$
If augmenting $A$ with $\mathbf{b}$ increases the rank, then $\mathbf{b}$ introduces a new independent direction—one the columns of $A$ can't reach.
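The existence condition translates directly into a rank check (a sketch; `solvable` is a hypothetical helper, and the rank-1 example matrix is chosen so one $\mathbf{b}$ lies in the column space and one does not):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])          # rank 1: both columns lie along [1, 2, 3]

b_in  = np.array([2.0, 4.0, 6.0])   # = 2 x column 1, inside Col(A)
b_out = np.array([1.0, 0.0, 0.0])   # not a multiple of [1, 2, 3]

def solvable(A, b):
    """A x = b has a solution iff augmenting with b does not raise the rank."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(solvable(A, b_in))   # True
print(solvable(A, b_out))  # False
```

When `b_out` is appended, the augmented matrix gains a second independent direction and the rank jumps from 1 to 2, which is exactly the unsolvable case.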
The Uniqueness Condition:
If a solution exists, it's unique iff the null space is trivial (nullity = 0).
This happens when $A$ has full column rank: $\text{rank}(A) = n$ (number of columns).
| Condition | Solutions | Interpretation |
|---|---|---|
| $\mathbf{b} \notin C(A)$ | None | $\mathbf{b}$ unreachable by any linear combination |
| $\mathbf{b} \in C(A)$, nullity = 0 | Exactly one | Unique representation |
| $\mathbf{b} \in C(A)$, nullity > 0 | Infinitely many | Add any null space vector to a particular solution |
For square $n \times n$ matrices: full rank ($r = n$) implies both existence and uniqueness for ANY right-hand side $\mathbf{b}$. This is equivalent to being invertible. For non-square matrices, we must check both conditions separately.
Rank deficiency—when a matrix doesn't have full rank—causes problems throughout machine learning. Understanding when and why it happens helps you diagnose and fix issues.
Multicollinearity (Feature Correlation)
When features are linearly dependent, the design matrix $X$ becomes rank-deficient (or nearly so).
What happens:
- $X^T X$ becomes singular (or nearly so), so the normal equations have no unique solution.
- Coefficient estimates become unstable: tiny changes in the data produce huge swings in the learned weights.

Example: Feature 1 is height in cm; Feature 2 is height in inches $= 0.394 \times$ Feature 1.

These are perfectly linearly dependent. The design matrix has a column that's a scalar multiple of another, so the rank drops by 1.
Solutions: Remove redundant features, use regularization (ridge regression), apply PCA to decorrelate.
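The height example can be simulated directly (a sketch with synthetic data; the sample size and ridge penalty $\lambda$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 100

height_cm = rng.normal(170, 10, n_samples)
height_in = height_cm * 0.394          # exact linear copy of feature 1

X = np.column_stack([height_cm, height_in])
print(np.linalg.matrix_rank(X))        # 1, not 2: rank-deficient design matrix

# X^T X is singular, so the normal equations have no unique solution
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))      # 1

# Ridge regularization restores full rank: X^T X + lambda * I is invertible
lam = 1e-3
print(np.linalg.matrix_rank(XtX + lam * np.eye(2)))  # 2
```

Adding $\lambda I$ shifts every eigenvalue of $X^T X$ up by $\lambda$, lifting the zero eigenvalue away from zero; this is why ridge regression remains solvable even with perfectly collinear features.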
In theory, rank is a clean integer. In practice with floating-point arithmetic, it becomes murky. Let's understand the challenges and solutions.
The numerical challenge:
Consider a matrix with singular values [3.0, 2.0, 0.0000001]. Is the rank 2 or 3?
There's no perfect answer. We must choose a numerical rank based on a tolerance threshold.
```python
import numpy as np

# Computing numerical rank using SVD
def numerical_rank(A, tol=None):
    """
    Compute numerical rank using SVD.

    Parameters
    ----------
    A : array-like
        Input matrix
    tol : float, optional
        Threshold for singular values.
        Default: max(m, n) * max(singular_values) * machine_epsilon

    Returns
    -------
    rank : int
        Numerical rank of A
    """
    A = np.asarray(A, dtype=float)
    s = np.linalg.svd(A, compute_uv=False)
    if tol is None:
        tol = max(A.shape) * np.max(s) * np.finfo(A.dtype).eps
    return int(np.sum(s > tol))

# Example
A = np.array([[1, 2, 3],
              [2, 4, 6],   # Linearly dependent (2 × row 1)
              [1, 1, 1]])

print(f"Rank of A: {numerical_rank(A)}")
print(f"Singular values: {np.linalg.svd(A, compute_uv=False)}")

# Nearly rank-deficient matrix
B = np.array([[1, 2],
              [2, 4.0001]])  # Almost singular
print(f"Rank of B: {np.linalg.matrix_rank(B)}")
print(f"Singular values of B: {np.linalg.svd(B, compute_uv=False)}")
```

The ratio of the largest to smallest singular value (the condition number) indicates numerical sensitivity. If the condition number is $10^{16}$ or larger, the matrix is effectively singular in double precision. Even if it's mathematically full rank, computations with it will be unreliable.
Rank and nullity provide deep insight into what a linear transformation preserves and destroys. Here are the essential takeaways:
What's next:
We've seen that rank deficiency causes matrices to be non-invertible. But what if a matrix IS invertible? The next page explores inverse matrices in depth—what they mean geometrically, when they exist, how to compute them, and their critical role in solving linear systems.
You now understand rank and nullity as measures of dimensional preservation and collapse. These aren't abstract concepts—they determine whether your linear regression is stable, whether your covariance matrix is invertible, and whether your optimization problem has a unique solution. Next: the theory and practice of matrix inverses.