So far, we've worked with vectors as concrete objects—ordered lists of numbers in $\mathbb{R}^n$. But the true power of linear algebra emerges from abstraction: recognizing that many different mathematical objects behave like vectors.
Polynomials. Functions. Matrices. Signals. Probability distributions. All of these can form vector spaces—systems where addition and scalar multiplication make sense and follow the same fundamental rules.
This abstraction is not merely academic elegance. Machine learning algorithms are designed to work in vector spaces, and understanding the abstract structure explains why these algorithms apply equally well to image pixels, word embeddings, graph features, or any other vectorizable data.
This page formalizes the concept of vector spaces and subspaces, explores basis and dimension, and introduces the fundamental subspaces that arise from matrices—structures that directly determine the behavior of linear transformations in ML.
By the end of this page, you will:
- understand the formal axioms defining vector spaces,
- recognize vector spaces beyond ℝⁿ (functions, polynomials, matrices),
- master the concepts of subspace, basis, and dimension,
- understand the four fundamental subspaces of a matrix, and
- connect these abstractions to concrete ML applications.
Formal definition:
A vector space (over the real numbers $\mathbb{R}$) is a set $V$ together with two operations:
- vector addition, which assigns to any $\mathbf{u}, \mathbf{v} \in V$ a vector $\mathbf{u} + \mathbf{v}$, and
- scalar multiplication, which assigns to any $c \in \mathbb{R}$ and $\mathbf{v} \in V$ a vector $c\mathbf{v}$,

satisfying the following eight axioms for all vectors $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $c, d \in \mathbb{R}$:
| # | Axiom Name | Formula | Meaning |
|---|---|---|---|
| 1 | Additive closure | $\mathbf{u} + \mathbf{v} \in V$ | Sum of vectors is a vector |
| 2 | Additive commutativity | $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ | Order of addition doesn't matter |
| 3 | Additive associativity | $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ | Grouping doesn't matter |
| 4 | Additive identity | $\exists \mathbf{0}: \mathbf{v} + \mathbf{0} = \mathbf{v}$ | Zero vector exists |
| 5 | Additive inverse | $\exists (-\mathbf{v}): \mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ | Negation exists |
| 6 | Scalar multiplicative closure | $c\mathbf{v} \in V$ | Scaling produces a vector |
| 7 | Distributivity (scalars) | $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$ | Scalar addition distributes |
| 8 | Distributivity (vectors) | $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ | Vector addition distributes |
Additional axioms (sometimes listed as 9 and 10):
- Compatibility of scalar multiplication: $c(d\mathbf{v}) = (cd)\mathbf{v}$
- Scalar multiplicative identity: $1\mathbf{v} = \mathbf{v}$
These axioms capture exactly what's needed for linear algebra to work. They ensure that linear combinations, span, linear independence, and all the tools we've developed make sense. Any set satisfying these axioms inherits the entire theory of linear algebra—that's the power of abstraction.
Immediate consequences of the axioms:
- $0\mathbf{v} = \mathbf{0}$ for every $\mathbf{v} \in V$
- $c\mathbf{0} = \mathbf{0}$ for every scalar $c$
- $(-1)\mathbf{v} = -\mathbf{v}$ (scaling by $-1$ gives the additive inverse)
- The zero vector and additive inverses are unique

These aren't axioms—they're theorems proven from the axioms.
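As a quick sanity check, here is a minimal NumPy sketch (using $\mathbb{R}^3$ and arbitrary values of my own choosing) of these consequences:

```python
import numpy as np

v = np.array([2.0, -1.0, 4.0])
c = 3.0

# 0 * v gives the zero vector
print(0 * v)                # [0. 0. 0.]

# c * 0 gives the zero vector
print(c * np.zeros(3))      # [0. 0. 0.]

# (-1) * v is the additive inverse of v
print(v + (-1) * v)         # [0. 0. 0.]
```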
The axioms apply far beyond lists of numbers. Here are important examples of vector spaces:
1. $\mathbb{R}^n$ — n-dimensional real coordinate space
The canonical example. Vectors are n-tuples $(x_1, \ldots, x_n)$, addition and scaling are component-wise. All axioms are satisfied.
2. $\mathbb{P}_n$ — Polynomials of degree at most n
Vectors are polynomials $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$. Addition and scaling are the natural operations on polynomials. The zero vector is the zero polynomial.
3. $\mathcal{C}[a,b]$ — Continuous functions on $[a,b]$
Vectors are continuous functions. Addition: $(f+g)(x) = f(x) + g(x)$. Scaling: $(cf)(x) = c \cdot f(x)$. The zero vector is the function $f(x) = 0$.
4. $M_{m \times n}$ — Matrices of size $m \times n$
Vectors are matrices. Addition and scaling are entry-wise (same as viewing the matrix as a vector of $mn$ entries).
Neural networks learn functions. Kernel methods work in function spaces. Attention operates on sequences (which can be viewed as functions from position to embedding). Recognizing these as vector spaces means linear algebra tools—projection, orthogonality, basis expansion—apply directly.
5. Solution space of homogeneous linear system
The set of all solutions to $A\mathbf{x} = \mathbf{0}$ forms a vector space (the null space of $A$).
6. Sequences with finite support
Sequences $(a_0, a_1, a_2, \ldots)$ where only finitely many terms are non-zero. This is the foundation of some signal processing applications.
Non-examples (things that are NOT vector spaces):
| Set | Why It Fails |
|---|---|
| $\{(x, y) : x \geq 0,\ y \geq 0\}$ (first quadrant) | Not closed under scalar multiplication: $(-1)(1, 0) = (-1, 0)$ is not in the set |
| $\{(x, y) : x^2 + y^2 = 1\}$ (unit circle) | Not closed under addition: $(1,0) + (0,1) = (1,1)$ is not on the circle |
| $\{(x, y, z) : x + y + z = 1\}$ (plane not through origin) | Zero vector $(0,0,0)$ not in set (would need $0+0+0=1$) |
| Polynomials of degree exactly $n$ | Not closed under addition: $x^n + (-x^n) = 0$, and the zero polynomial does not have degree $n$ |
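To make these failures concrete, here is a small illustrative check (the specific points are my own choices) of the closure violations listed above:

```python
import numpy as np

# First quadrant {(x, y) : x >= 0, y >= 0}: not closed under scalar multiplication
p = np.array([1.0, 0.0])
print((-1) * p)                      # [-1.  0.] -> leaves the set

# Unit circle {(x, y) : x^2 + y^2 = 1}: not closed under addition
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
s = a + b
print(s, np.linalg.norm(s))          # [1. 1.] has norm sqrt(2) != 1

# Plane x + y + z = 1: does not contain the zero vector
z = np.zeros(3)
print(z.sum() == 1)                  # False -> (0, 0, 0) is not in the set
```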
```python
import numpy as np
from numpy.polynomial import polynomial as P

# Example 1: R^n as vector space
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print("R^n example:")
print(f"v1 + v2 = {v1 + v2}")
print(f"2 * v1 = {2 * v1}")

# Example 2: Polynomials as vector space
# NumPy represents polynomials by coefficients [a0, a1, a2, ...]
p1 = np.array([1, 2, 3])   # 1 + 2x + 3x^2
p2 = np.array([0, 1, -1])  # 0 + x - x^2

print("\nPolynomial vector space:")
print(f"p1 = 1 + 2x + 3x^2")
print(f"p2 = x - x^2")
print(f"p1 + p2 = {p1 + p2} -> 1 + 3x + 2x^2")
print(f"2 * p1 = {2 * p1} -> 2 + 4x + 6x^2")

# Example 3: Matrices as vector space
M1 = np.array([[1, 2], [3, 4]])
M2 = np.array([[0, 1], [1, 0]])

print("\nMatrix vector space:")
print(f"M1 + M2:\n{M1 + M2}")
print(f"3 * M1:\n{3 * M1}")

# Example 4: Null space as vector space
A = np.array([[1, 2, 3], [4, 5, 6]])
# Null space: solutions to Ax = 0
from scipy.linalg import null_space
ns = null_space(A)
print(f"\nNull space of A (dimension {ns.shape[1]}):")
if ns.shape[1] > 0:
    print(f"Basis vector:\n{ns[:, 0]}")
    print(f"Verification A @ null_vector = {A @ ns[:, 0]}")
```

Definition of subspace:
A subspace $W$ of a vector space $V$ is a non-empty subset of $V$ that is itself a vector space under the same operations.
Subspace test (simpler than checking all axioms):
To verify that $W$ is a subspace of $V$, check:
1. $W$ contains the zero vector (in particular, $W$ is non-empty).
2. Closed under addition: if $\mathbf{u}, \mathbf{v} \in W$, then $\mathbf{u} + \mathbf{v} \in W$.
3. Closed under scalar multiplication: if $\mathbf{v} \in W$ and $c \in \mathbb{R}$, then $c\mathbf{v} \in W$.

Or equivalently (combining 2 and 3): $W$ is closed under linear combinations, i.e., $c\mathbf{u} + d\mathbf{v} \in W$ for all $\mathbf{u}, \mathbf{v} \in W$ and all $c, d \in \mathbb{R}$.
Every subspace must contain the zero vector (set c=0 in the scalar multiplication closure). This is why planes through the origin are subspaces, but planes that don't pass through the origin are NOT subspaces—they lack the zero vector.
Examples of subspaces:
In $\mathbb{R}^3$:
- the zero subspace $\{\mathbf{0}\}$ and all of $\mathbb{R}^3$ itself,
- any line through the origin,
- any plane through the origin.

In polynomial space $\mathbb{P}_n$:
- $\mathbb{P}_k$ for any $k \leq n$ (polynomials of degree at most $k$),
- the set of polynomials $p$ with $p(0) = 0$.
The span of any set of vectors is always a subspace. This is fundamental: given any vectors, their span is the smallest subspace containing them.
| Set | Subspace? | Reason |
|---|---|---|
| $\{(x, y, z) : x + y + z = 0\}$ | Yes | Plane through origin; closed under addition and scalar multiplication |
| $\{(x, y, z) : x + y + z = 1\}$ | No | Doesn't contain $(0,0,0)$ |
| $\{(x, y, 0) : x, y \in \mathbb{R}\}$ (xy-plane) | Yes | Plane through origin |
| $\{(x, x, x) : x \in \mathbb{R}\}$ (line $x=y=z$) | Yes | Line through origin |
| $\{(x, y, z) : xyz = 0\}$ | No | Not closed under addition: $(1,1,0) + (0,0,1) = (1,1,1)$ has $xyz = 1 \neq 0$ |
```python
import numpy as np

def is_likely_subspace(vectors, test_vectors=100, test_scalars=10):
    """
    Heuristic test if vectors seem to span a subspace.
    Tests random linear combinations stay in approximate span.
    (Not a proof, just a check!)
    """
    A = np.column_stack(vectors)
    dim = A.shape[0]
    # Generate random linear combinations
    for _ in range(test_vectors):
        coeffs = np.random.randn(len(vectors))
        lc = A @ coeffs
        # Check if lc can be expressed as linear combination
        # (should always be true for span)
        residual = np.linalg.lstsq(A, lc, rcond=None)[1]
        if len(residual) > 0 and residual[0] > 1e-10:
            return False
    return True

# The span of these vectors is a subspace
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
print(f"Span of v1, v2 likely subspace: {is_likely_subspace([v1, v2])}")

# Example: checking if a vector is in a subspace
def in_subspace(vector, basis_vectors, tol=1e-10):
    """Check if vector is in the subspace spanned by basis_vectors."""
    A = np.column_stack(basis_vectors)
    coeffs, residuals, rank, s = np.linalg.lstsq(A, vector, rcond=None)
    reconstruction = A @ coeffs
    error = np.linalg.norm(vector - reconstruction)
    return error < tol

# xy-plane is a subspace of R^3
xy_basis = [np.array([1, 0, 0]), np.array([0, 1, 0])]

test1 = np.array([3, 4, 0])  # In xy-plane
test2 = np.array([3, 4, 5])  # Not in xy-plane

print(f"\n{test1} in xy-plane: {in_subspace(test1, xy_basis)}")
print(f"{test2} in xy-plane: {in_subspace(test2, xy_basis)}")
```

Definition of basis:
A basis for a vector space $V$ is a set of vectors $\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n\}$ that is:
1. Linearly independent (no vector in the set is a linear combination of the others), and
2. Spanning (every vector in $V$ is a linear combination of the basis vectors).
In other words, a basis is a minimal spanning set or equivalently a maximal independent set.
Definition of dimension:
The dimension of a vector space $V$, denoted $\dim(V)$, is the number of vectors in any basis of $V$.
Remarkable fact: Every basis of the same vector space has the same number of vectors. This is why dimension is well-defined.
A basis provides a coordinate system. Every vector has unique coordinates (coefficients) in a given basis. Changing basis changes coordinates but not the underlying vector. This is foundational for understanding transformations in ML.
Key theorem (Unique Representation):
If $\{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ is a basis for $V$, then every vector $\mathbf{v} \in V$ can be written uniquely as:
$$\mathbf{v} = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2 + \cdots + c_n \mathbf{b}_n$$
The scalars $(c_1, c_2, \ldots, c_n)$ are called the coordinates of $\mathbf{v}$ with respect to this basis.
Standard bases:
| Vector Space | Standard Basis | Dimension |
|---|---|---|
| $\mathbb{R}^n$ | $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ | $n$ |
| $\mathbb{P}_n$ (polynomials of degree ≤ n) | $\{1, x, x^2, \ldots, x^n\}$ | $n+1$ |
| $M_{m \times n}$ (matrices) | Matrices $E_{ij}$ with a single 1 entry, zeros elsewhere | $mn$ |
| $\mathcal{C}[a,b]$ (continuous functions) | None finite! | $\infty$ (infinite-dimensional) |
Properties of bases:
- Every basis of $V$ has exactly $\dim(V)$ vectors.
- Any linearly independent set can be extended to a basis; any spanning set can be reduced to a basis.
- In an $n$-dimensional space, any $n$ linearly independent vectors automatically form a basis (as do any $n$ spanning vectors).
Infinite-dimensional spaces:
Some vector spaces (like continuous functions) have no finite basis—they're infinite-dimensional. This is important for understanding function spaces in ML (e.g., Gaussian processes, kernel methods).
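Even though these spaces have no finite basis, ML methods typically work inside finite-dimensional subspaces of them. As an illustrative sketch (a toy example of my own, sampling $\sin(x)$ on a grid), projecting a function onto the subspace spanned by $\{1, x, x^2, x^3\}$ is just a least-squares problem:

```python
import numpy as np

# Sample sin(x) on [0, pi]: a "vector" in a function space, discretized on a grid
x = np.linspace(0, np.pi, 200)
f = np.sin(x)

# Basis for the 4-dimensional subspace spanned by {1, x, x^2, x^3}
Phi = np.column_stack([x**0, x**1, x**2, x**3])

# Least-squares projection of f onto this subspace
coeffs, *_ = np.linalg.lstsq(Phi, f, rcond=None)
f_hat = Phi @ coeffs

print("Coefficients in the polynomial basis:", np.round(coeffs, 4))
print("Max approximation error:", np.abs(f - f_hat).max())
```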
```python
import numpy as np

# Standard basis for R^3
e1, e2, e3 = np.eye(3)
print("Standard basis for R^3:")
print(f"e1 = {e1}, e2 = {e2}, e3 = {e3}")

# Express a vector in standard basis coordinates
v = np.array([3, -2, 5])
# Trivially: v = 3*e1 + (-2)*e2 + 5*e3
# The coordinates ARE the components in standard basis

# Non-standard basis
b1 = np.array([1, 1, 0])
b2 = np.array([1, 0, 1])
b3 = np.array([0, 1, 1])

# Is this a basis? Check independence
B = np.column_stack([b1, b2, b3])
det_B = np.linalg.det(B)
print(f"\nNon-standard basis:")
print(f"det(B) = {det_B:.4f} (non-zero => independent => basis)")

# Find coordinates of v in this basis
# Solve: B @ coords = v
coords = np.linalg.solve(B, v)
print(f"\nVector v = {v}")
print(f"Coordinates in B-basis: {coords}")
print(f"Verification: {coords[0]}*b1 + {coords[1]}*b2 + {coords[2]}*b3 = {coords @ B.T}")

# Dimension of a subspace (e.g., column space)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])
rank = np.linalg.matrix_rank(A)
print(f"\nMatrix A is {A.shape[0]}x{A.shape[1]}")
print(f"Dimension of column space (rank): {rank}")
print(f"Dimension of null space: {A.shape[1] - rank}")
```

Every matrix $A \in \mathbb{R}^{m \times n}$ has four fundamental subspaces that completely characterize its behavior. Understanding these is essential for understanding linear transformations, least squares, SVD, and many ML algorithms.
The four subspaces:
1. Column Space (Range), $\mathcal{C}(A)$ or $\text{col}(A)$: the span of the columns of $A$, i.e., all vectors of the form $A\mathbf{x}$. A subspace of $\mathbb{R}^m$ of dimension $r$ (the rank).

2. Null Space (Kernel), $\mathcal{N}(A)$ or $\ker(A)$: all solutions of $A\mathbf{x} = \mathbf{0}$. A subspace of $\mathbb{R}^n$ of dimension $n - r$.

3. Row Space, $\mathcal{C}(A^\top)$: the span of the rows of $A$, i.e., the column space of $A^\top$. A subspace of $\mathbb{R}^n$ of dimension $r$.

4. Left Null Space, $\mathcal{N}(A^\top)$: all solutions of $A^\top\mathbf{y} = \mathbf{0}$. A subspace of $\mathbb{R}^m$ of dimension $m - r$.
The row space and null space are orthogonal complements in ℝⁿ (together they decompose the input space). The column space and left null space are orthogonal complements in ℝᵐ (together they decompose the output space). This orthogonality is the foundation of least squares and pseudoinverses.
Dimension relationships:
$$\dim(\mathcal{C}(A)) + \dim(\mathcal{N}(A)) = n \quad \text{(number of columns)}$$

$$\dim(\mathcal{C}(A^\top)) + \dim(\mathcal{N}(A^\top)) = m \quad \text{(number of rows)}$$

$$\dim(\mathcal{C}(A)) = \dim(\mathcal{C}(A^\top)) = r \quad \text{(rank)}$$
ML interpretation:
For a data matrix $X$ or weight matrix $W$:
- The column space is the set of outputs the matrix can produce: every prediction $X\mathbf{w}$ lies in it, so targets outside the column space can only be approximated by projection (least squares), as the sketch below shows.
- The null space contains directions the matrix cannot "see": adding a null space vector to the weights leaves the predictions unchanged, which is exactly the freedom that regularization resolves.
- The row space is the part of the input space the matrix actually responds to; its dimension is the rank $r$, the number of independent directions the data or weights use.
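A minimal sketch of the first point (toy data of my own; `np.linalg.lstsq` projects the target onto the column space):

```python
import numpy as np

# Hypothetical tall data matrix: 5 samples, 2 features
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.5, 4.0, 5.5])   # target, generally NOT in col(X)

# Least squares projects y onto the column space of X
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w                              # the closest achievable prediction

# The residual is orthogonal to every column of X (hence to col(X))
residual = y - y_hat
print("X^T residual:", np.round(X.T @ residual, 10))
```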
```python
import numpy as np
from scipy.linalg import null_space, orth

# Matrix for demonstration
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

m, n = A.shape
rank = np.linalg.matrix_rank(A)

print(f"Matrix A ({m}x{n}), rank = {rank}")
print(f"A:\n{A}")

# 1. Column space
col_space = orth(A)  # Orthonormal basis for column space
print(f"\n1. Column space basis (dimension {col_space.shape[1]}):")
print(f"{col_space}")

# 2. Null space
null_sp = null_space(A)
print(f"\n2. Null space basis (dimension {null_sp.shape[1]}):")
if null_sp.shape[1] > 0:
    print(f"{null_sp}")
    # Verify: A @ null_vector should be zero
    print(f"Verification A @ null = {A @ null_sp[:, 0]}")

# 3. Row space (= column space of A^T)
row_space = orth(A.T)
print(f"\n3. Row space basis (dimension {row_space.shape[1]}):")
print(f"{row_space}")

# 4. Left null space (= null space of A^T)
left_null = null_space(A.T)
print(f"\n4. Left null space basis (dimension {left_null.shape[1]}):")
if left_null.shape[1] > 0:
    print(f"{left_null}")

# Verify orthogonality: row space ⊥ null space
if null_sp.shape[1] > 0:
    dot = row_space.T @ null_sp
    print(f"\nRow space ⊥ Null space? Max dot product: {np.abs(dot).max():.2e}")

# Dimension check
print(f"\nDimension check:")
print(f"col_space + null_space = {col_space.shape[1]} + {null_sp.shape[1]} = {col_space.shape[1] + null_sp.shape[1]} = n = {n}")
print(f"row_space + left_null = {row_space.shape[1]} + {left_null.shape[1]} = {row_space.shape[1] + left_null.shape[1]} = m = {m}")
```

The same vector has different coordinates in different bases. Change of basis describes how coordinates transform when we switch from one basis to another.
Setup:
Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ be two bases for a vector space $V$.
A vector $\mathbf{v}$ has:
- coordinates $[\mathbf{v}]_\mathcal{B}$ with respect to basis $\mathcal{B}$, and
- coordinates $[\mathbf{v}]_\mathcal{C}$ with respect to basis $\mathcal{C}$.

Both coordinate vectors describe the same underlying vector $\mathbf{v}$.
The change of basis matrix:
$$[\mathbf{v}]_\mathcal{C} = P_{\mathcal{B} \to \mathcal{C}} \, [\mathbf{v}]_\mathcal{B}$$
where $P_{\mathcal{B} \to \mathcal{C}}$ is the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$.
To construct $P_{\mathcal{B} \to \mathcal{C}}$:
1. Express each basis vector $\mathbf{b}_j$ in terms of the $\mathcal{C}$ basis to obtain its coordinate vector $[\mathbf{b}_j]_\mathcal{C}$.
2. Use these coordinate vectors as the columns: $P_{\mathcal{B} \to \mathcal{C}} = \big[\,[\mathbf{b}_1]_\mathcal{C} \;\; \cdots \;\; [\mathbf{b}_n]_\mathcal{C}\,\big]$ (see the sketch below).
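In $\mathbb{R}^n$, if the basis vectors are stored as the columns of matrices $B$ and $C$, this construction amounts to $P_{\mathcal{B}\to\mathcal{C}} = C^{-1}B$. A minimal sketch with two hypothetical bases:

```python
import numpy as np

# Two bases for R^2, stored as columns of B and C (hypothetical example)
B = np.column_stack([[1, 1], [1, -1]])   # basis B = {(1,1), (1,-1)}
C = np.column_stack([[2, 0], [1, 1]])    # basis C = {(2,0), (1,1)}

# Change of basis matrix from B-coordinates to C-coordinates: P = C^{-1} B
P = np.linalg.solve(C, B)

# Check on a vector: same underlying vector, two coordinate descriptions
v = np.array([3.0, 1.0])                 # standard coordinates
v_B = np.linalg.solve(B, v)              # coordinates w.r.t. B
v_C = np.linalg.solve(C, v)              # coordinates w.r.t. C

print(np.allclose(P @ v_B, v_C))         # True: P converts B-coords to C-coords
```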
PCA is a change of basis! It transforms data from the standard basis to the basis of principal components. The transform matrix contains the principal components as columns. This new basis aligns with the directions of maximum variance.
Special case: Standard basis to custom basis
If $\mathcal{B}$ is custom and $\mathcal{C}$ is the standard basis:
- The matrix $B$ whose columns are the $\mathcal{B}$ vectors (written in standard coordinates) converts $\mathcal{B}$-coordinates to standard coordinates: $\mathbf{v} = B\,[\mathbf{v}]_\mathcal{B}$.
- Going the other way, from standard coordinates to $\mathcal{B}$-coordinates, uses the inverse: $[\mathbf{v}]_\mathcal{B} = B^{-1}\mathbf{v}$ (computed in practice by solving $B\,[\mathbf{v}]_\mathcal{B} = \mathbf{v}$).
Transformation in different bases:
A linear transformation $T$ has different matrix representations in different bases: $$[T]_\mathcal{C} = P^{-1} [T]_\mathcal{B} P$$
This is a similarity transformation—the transformation itself is the same, but its matrix representation changes with the basis.
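A short numerical check (toy matrices of my own) that $A$ and $P^{-1}AP$ describe the same transformation in different coordinates:

```python
import numpy as np

# A linear transformation in standard coordinates (hypothetical example)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# A custom basis, stored as the columns of P
P = np.column_stack([[1.0, 1.0], [1.0, -1.0]])

# Matrix of the same transformation in the custom basis: the similar matrix P^{-1} A P
A_custom = np.linalg.inv(P) @ A @ P

# Apply the transformation in both coordinate systems and compare
v = np.array([3.0, 1.0])                     # standard coordinates
v_custom = np.linalg.solve(P, v)             # custom-basis coordinates

out_standard = A @ v                         # apply T in standard coordinates
out_via_custom = P @ (A_custom @ v_custom)   # apply T in custom coords, convert back

print(np.allclose(out_standard, out_via_custom))   # True
```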
```python
import numpy as np

# Custom basis for R^2
b1 = np.array([1, 1])
b2 = np.array([1, -1])
B = np.column_stack([b1, b2])

print("Custom basis B:")
print(f"b1 = {b1}")
print(f"b2 = {b2}")

# A vector in standard coordinates
v_standard = np.array([3, 1])

# Find coordinates in B basis
# v = B @ coords => coords = B^(-1) @ v
v_B = np.linalg.solve(B, v_standard)

print(f"\nVector v in standard coords: {v_standard}")
print(f"Vector v in B coords: {v_B}")
print(f"Verification: {v_B[0]}*b1 + {v_B[1]}*b2 = {v_B[0]*b1 + v_B[1]*b2}")

# Change of basis matrix from standard to B
P_std_to_B = np.linalg.inv(B)
print(f"\nChange of basis matrix (std -> B):\n{P_std_to_B}")
print(f"P @ v_standard = {P_std_to_B @ v_standard}")

# PCA as change of basis
from sklearn.decomposition import PCA

# Generate some 2D data
np.random.seed(42)
data = np.random.randn(100, 2) @ np.array([[2, 1], [1, 2]])

pca = PCA()
pca.fit(data)

print(f"\nPCA as change of basis:")
print(f"Principal components (new basis vectors):")
print(pca.components_)

# Transform data to PC coordinates
data_pca = pca.transform(data)
print(f"Original data shape: {data.shape}")
print(f"PCA-transformed shape: {data_pca.shape}")

# The components form change-of-basis matrix
# data_pca = data @ components.T (approximately, after centering)
```

The abstract framework of vector spaces illuminates many ML concepts.
1. Feature spaces:
Each data point lives in a feature space $\mathbb{R}^d$. The dimension $d$ is the number of features. Understanding this as a vector space explains:
- why linear combinations of features (weighted sums, averages) are meaningful operations,
- why projections, distances, and dimensionality reduction apply to the data, and
- why the same algorithms work across very different data types once the data is vectorized.
2. Weight spaces:
Neural network weights form vectors in high-dimensional spaces. The loss surface is defined on this weight space. Optimization navigates this space.
| Concept | Vector Space Term | ML Application |
|---|---|---|
| Data matrix columns | Span | What predictions are possible |
| Redundant features | Linear dependence | Multicollinearity; redundant features can be compressed away |
| Number of independent features | Dimension | Intrinsic complexity of the data |
| PCA components | Orthonormal basis | New coordinates maximizing variance |
| Regularization | Null space constraint | Prefer solutions with zero components in null space |
| Embeddings | Subspace representation | Map high-dim to low-dim while preserving structure |
Gaussian processes and kernel methods work in infinite-dimensional function spaces. The kernel defines an inner product, making it a Hilbert space. Understanding these as vector spaces (with infinite-dimensional bases) explains why linear algebra tools like projection and orthogonality still apply.
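As a small sketch of the kernel-as-inner-product idea (toy data of my own, using scikit-learn's `rbf_kernel`), the Gram matrix collects pairwise inner products in the implicit feature space, which is why it is symmetric and positive semidefinite:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Toy data: 5 points in R^2
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))

# Gram matrix K[i, j] = <phi(x_i), phi(x_j)> in the implicit feature space
K = rbf_kernel(X, X, gamma=1.0)

print("Symmetric:", np.allclose(K, K.T))
print("Eigenvalues (all >= 0 up to rounding):", np.round(np.linalg.eigvalsh(K), 6))
```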
3. Null space and solutions:
For $A\mathbf{x} = \mathbf{b}$:
- A solution exists if and only if $\mathbf{b}$ lies in the column space $\mathcal{C}(A)$.
- If $\mathbf{x}_p$ is any particular solution, the complete solution set is $\mathbf{x}_p + \mathcal{N}(A)$: a particular solution plus anything in the null space (demonstrated in the sketch below).
- The solution is unique exactly when the null space is trivial, $\mathcal{N}(A) = \{\mathbf{0}\}$.
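A minimal numerical check of the "particular solution plus null space" structure (the matrix and right-hand side are a toy example of my own):

```python
import numpy as np
from scipy.linalg import null_space

# Wide system: 2 equations, 3 unknowns, so the null space is non-trivial
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([6.0, 15.0])

# One particular solution (minimum-norm, via the pseudoinverse)
x_p = np.linalg.pinv(A) @ b
print("A @ x_p =", A @ x_p)                            # reproduces b

# Adding any null space vector gives another solution
n = null_space(A)[:, 0]
print("A @ (x_p + 2.5 * n) =", A @ (x_p + 2.5 * n))    # still b
```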
4. Rank and expressivity:
- The rank of a data or weight matrix is the dimension of its column space, so it bounds how rich the set of achievable outputs can be.
- Low-rank structure is what makes compression work: a rank-$k$ approximation keeps only the $k$ most important directions, as the code below illustrates.
```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from scipy.linalg import null_space

# Underdetermined system: infinitely many solutions
# The null space determines the freedom we have
np.random.seed(42)

# More features than samples
n_samples, n_features = 50, 100
X = np.random.randn(n_samples, n_features)
y = np.random.randn(n_samples)

print(f"X shape: {X.shape} (underdetermined)")
print(f"Rank of X: {np.linalg.matrix_rank(X)}")
print(f"Null space dimension: {n_features - np.linalg.matrix_rank(X)}")

# Any solution + null space vector is also a solution
# Regularization picks one by minimizing ||w||

# Unregularized (picks minimum norm solution)
lr = LinearRegression(fit_intercept=False)
lr.fit(X, y)
print(f"\nLinear regression ||w||: {np.linalg.norm(lr.coef_):.4f}")

# Ridge regression (stronger regularization = smaller weights)
for alpha in [0.01, 0.1, 1.0, 10.0]:
    ridge = Ridge(alpha=alpha, fit_intercept=False)
    ridge.fit(X, y)
    print(f"Ridge (alpha={alpha}) ||w||: {np.linalg.norm(ridge.coef_):.4f}")

# Low-rank approximation: compress data
from sklearn.decomposition import TruncatedSVD

data = np.random.randn(1000, 100)
original_rank = np.linalg.matrix_rank(data)

for k in [10, 20, 50]:
    svd = TruncatedSVD(n_components=k)
    data_reduced = svd.fit_transform(data)
    data_reconstructed = svd.inverse_transform(data_reduced)
    reconstruction_error = np.linalg.norm(data - data_reconstructed, 'fro')
    print(f"\nRank-{k} approximation error: {reconstruction_error:.2f}")
```

We've completed our journey through vectors and vector spaces—from concrete ordered lists to abstract mathematical structures. This framework underlies all of linear algebra and machine learning.
Module complete!
You've now mastered the foundational concepts of vectors and vector spaces. This knowledge prepares you for the next modules on matrices, linear transformations, eigenvalues, and singular value decomposition—the tools that bring linear algebra to life in machine learning.
Coming next: Matrices and Linear Transformations—how matrices encode functions, transform spaces, and enable the computations that power machine learning.
Congratulations! You've completed Module 1: Vectors and Vector Spaces. You now have the conceptual foundation for understanding how data is represented and transformed in machine learning—from simple feature vectors to the abstract structures that make algorithms work.