So far, we've worked with vectors as concrete objects—ordered lists of numbers in $\mathbb{R}^n$. But the true power of linear algebra emerges from abstraction: recognizing that many different mathematical objects behave like vectors.
Polynomials. Functions. Matrices. Signals. Probability distributions. All of these can form vector spaces—systems where addition and scalar multiplication make sense and follow the same fundamental rules.
This abstraction is not merely academic elegance. Machine learning algorithms are designed to work in vector spaces, and understanding the abstract structure explains why these algorithms apply equally well to image pixels, word embeddings, graph features, or any other vectorizable data.
This page formalizes the concept of vector spaces and subspaces, explores basis and dimension, and introduces the fundamental subspaces that arise from matrices—structures that directly determine the behavior of linear transformations in ML.
By the end of this page, you will:
- understand the formal axioms defining vector spaces,
- recognize vector spaces beyond ℝⁿ (functions, polynomials, matrices),
- master the concepts of subspace, basis, and dimension,
- understand the four fundamental subspaces of a matrix, and
- connect these abstractions to concrete ML applications.
Formal definition:
A vector space (over the real numbers $\mathbb{R}$) is a set $V$ together with two operations:
- vector addition, which assigns to any $\mathbf{u}, \mathbf{v} \in V$ a vector $\mathbf{u} + \mathbf{v}$, and
- scalar multiplication, which assigns to any $c \in \mathbb{R}$ and $\mathbf{v} \in V$ a vector $c\mathbf{v}$,

satisfying the following eight axioms for all vectors $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and all scalars $c, d \in \mathbb{R}$:
| # | Axiom Name | Formula | Meaning |
|---|---|---|---|
| 1 | Additive closure | $\mathbf{u} + \mathbf{v} \in V$ | Sum of vectors is a vector |
| 2 | Additive commutativity | $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ | Order of addition doesn't matter |
| 3 | Additive associativity | $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ | Grouping doesn't matter |
| 4 | Additive identity | $\exists \mathbf{0}: \mathbf{v} + \mathbf{0} = \mathbf{v}$ | Zero vector exists |
| 5 | Additive inverse | $\exists (-\mathbf{v}): \mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ | Negation exists |
| 6 | Scalar multiplicative closure | $c\mathbf{v} \in V$ | Scaling produces a vector |
| 7 | Distributivity (scalars) | $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$ | Scalar addition distributes |
| 8 | Distributivity (vectors) | $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ | Vector addition distributes |
Additional axioms (sometimes listed as 9 and 10):
- Compatibility of scalar multiplication: $c(d\mathbf{v}) = (cd)\mathbf{v}$
- Scalar multiplicative identity: $1\mathbf{v} = \mathbf{v}$
These axioms capture exactly what's needed for linear algebra to work. They ensure that linear combinations, span, linear independence, and all the tools we've developed make sense. Any set satisfying these axioms inherits the entire theory of linear algebra—that's the power of abstraction.
Immediate consequences of the axioms:
- $0\mathbf{v} = \mathbf{0}$ for every $\mathbf{v} \in V$
- $c\mathbf{0} = \mathbf{0}$ for every scalar $c$
- $(-1)\mathbf{v} = -\mathbf{v}$ (scaling by $-1$ gives the additive inverse)
- The zero vector and additive inverses are unique

These aren't axioms—they're theorems proven from the axioms.
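As a quick sanity check, here is a minimal NumPy sketch (using $\mathbb{R}^3$ and arbitrary values of my own choosing) of these consequences:

```python
import numpy as np

v = np.array([2.0, -1.0, 4.0])
c = 3.0

# 0 * v gives the zero vector
print(0 * v)                # [0. 0. 0.]

# c * 0 gives the zero vector
print(c * np.zeros(3))      # [0. 0. 0.]

# (-1) * v is the additive inverse of v
print(v + (-1) * v)         # [0. 0. 0.]
```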
The axioms apply far beyond lists of numbers. Here are important examples of vector spaces:
1. $\mathbb{R}^n$ — n-dimensional real coordinate space
The canonical example. Vectors are n-tuples $(x_1, \ldots, x_n)$, addition and scaling are component-wise. All axioms are satisfied.
2. $\mathbb{P}_n$ — Polynomials of degree at most n
Vectors are polynomials $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$. Addition and scaling are the natural operations on polynomials. The zero vector is the zero polynomial.
3. $\mathcal{C}[a,b]$ — Continuous functions on $[a,b]$
Vectors are continuous functions. Addition: $(f+g)(x) = f(x) + g(x)$. Scaling: $(cf)(x) = c \cdot f(x)$. The zero vector is the function $f(x) = 0$.
4. $M_{m \times n}$ — Matrices of size $m \times n$
Vectors are matrices. Addition and scaling are entry-wise (same as viewing the matrix as a vector of $mn$ entries).
Neural networks learn functions. Kernel methods work in function spaces. Attention operates on sequences (which can be viewed as functions from position to embedding). Recognizing these as vector spaces means linear algebra tools—projection, orthogonality, basis expansion—apply directly.
5. Solution space of homogeneous linear system
The set of all solutions to $A\mathbf{x} = \mathbf{0}$ forms a vector space (the null space of $A$).
6. Sequences with finite support
Sequences $(a_0, a_1, a_2, \ldots)$ where only finitely many terms are non-zero. This is the foundation of some signal processing applications.
Non-examples (things that are NOT vector spaces):
| Set | Why It Fails |
|---|---|
| $\{(x, y) : x \geq 0,\ y \geq 0\}$ (first quadrant) | Not closed under scalar multiplication: $(-1)(1, 0) = (-1, 0)$ is not in the set |
| $\{(x, y) : x^2 + y^2 = 1\}$ (unit circle) | Not closed under addition: $(1,0) + (0,1) = (1,1)$ is not on the circle |
| $\{(x, y, z) : x + y + z = 1\}$ (plane not through origin) | Zero vector $(0,0,0)$ not in set (would need $0+0+0=1$) |
| Polynomials of degree exactly $n$ | Not closed under addition: $x^n + (-x^n) = 0$, and the zero polynomial does not have degree $n$ |
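To make these failures concrete, here is a small illustrative check (the specific points are my own choices) of the closure violations listed above:

```python
import numpy as np

# First quadrant {(x, y) : x >= 0, y >= 0}: not closed under scalar multiplication
p = np.array([1.0, 0.0])
print((-1) * p)                      # [-1.  0.] -> leaves the set

# Unit circle {(x, y) : x^2 + y^2 = 1}: not closed under addition
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
s = a + b
print(s, np.linalg.norm(s))          # [1. 1.] has norm sqrt(2) != 1

# Plane x + y + z = 1: does not contain the zero vector
z = np.zeros(3)
print(z.sum() == 1)                  # False -> (0, 0, 0) is not in the set
```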
```python
import numpy as np
from numpy.polynomial import polynomial as P

# Example 1: R^n as vector space
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print("R^n example:")
print(f"v1 + v2 = {v1 + v2}")
print(f"2 * v1 = {2 * v1}")

# Example 2: Polynomials as vector space
# NumPy represents polynomials by coefficients [a0, a1, a2, ...]
p1 = np.array([1, 2, 3])   # 1 + 2x + 3x^2
p2 = np.array([0, 1, -1])  # 0 + x - x^2

print("\nPolynomial vector space:")
print(f"p1 = 1 + 2x + 3x^2")
print(f"p2 = x - x^2")
print(f"p1 + p2 = {p1 + p2} -> 1 + 3x + 2x^2")
print(f"2 * p1 = {2 * p1} -> 2 + 4x + 6x^2")

# Example 3: Matrices as vector space
M1 = np.array([[1, 2], [3, 4]])
M2 = np.array([[0, 1], [1, 0]])

print("\nMatrix vector space:")
print(f"M1 + M2:\n{M1 + M2}")
print(f"3 * M1:\n{3 * M1}")

# Example 4: Null space as vector space
A = np.array([[1, 2, 3], [4, 5, 6]])
# Null space: solutions to Ax = 0
from scipy.linalg import null_space
ns = null_space(A)
print(f"\nNull space of A (dimension {ns.shape[1]}):")
if ns.shape[1] > 0:
    print(f"Basis vector:\n{ns[:, 0]}")
    print(f"Verification A @ null_vector = {A @ ns[:, 0]}")
```

Definition of subspace:
A subspace $W$ of a vector space $V$ is a non-empty subset of $V$ that is itself a vector space under the same operations.
Subspace test (simpler than checking all axioms):
To verify that $W$ is a subspace of $V$, check:
1. $W$ contains the zero vector (in particular, $W$ is non-empty).
2. Closed under addition: if $\mathbf{u}, \mathbf{v} \in W$, then $\mathbf{u} + \mathbf{v} \in W$.
3. Closed under scalar multiplication: if $\mathbf{v} \in W$ and $c \in \mathbb{R}$, then $c\mathbf{v} \in W$.

Or equivalently (combining 2 and 3): $W$ is closed under linear combinations, i.e., $c\mathbf{u} + d\mathbf{v} \in W$ for all $\mathbf{u}, \mathbf{v} \in W$ and all $c, d \in \mathbb{R}$.
Every subspace must contain the zero vector (set c=0 in the scalar multiplication closure). This is why planes through the origin are subspaces, but planes that don't pass through the origin are NOT subspaces—they lack the zero vector.
Examples of subspaces:
In $\mathbb{R}^3$:
- the zero subspace $\{\mathbf{0}\}$ and all of $\mathbb{R}^3$ itself,
- any line through the origin,
- any plane through the origin.

In polynomial space $\mathbb{P}_n$:
- $\mathbb{P}_k$ for any $k \leq n$ (polynomials of degree at most $k$),
- the set of polynomials $p$ with $p(0) = 0$.
The span of any set of vectors is always a subspace. This is fundamental: given any vectors, their span is the smallest subspace containing them.
| Set | Subspace? | Reason |
|---|---|---|
| $\{(x, y, z) : x + y + z = 0\}$ | Yes | Plane through origin; closed under addition and scalar multiplication |
| $\{(x, y, z) : x + y + z = 1\}$ | No | Doesn't contain $(0,0,0)$ |
| $\{(x, y, 0) : x, y \in \mathbb{R}\}$ (xy-plane) | Yes | Plane through origin |
| $\{(x, x, x) : x \in \mathbb{R}\}$ (line $x=y=z$) | Yes | Line through origin |
| $\{(x, y, z) : xyz = 0\}$ | No | Not closed under addition: $(1,1,0) + (0,0,1) = (1,1,1)$ has $xyz = 1 \neq 0$ |
```python
import numpy as np

def is_likely_subspace(vectors, test_vectors=100, test_scalars=10):
    """
    Heuristic test if vectors seem to span a subspace.
    Tests random linear combinations stay in approximate span.
    (Not a proof, just a check!)
    """
    A = np.column_stack(vectors)
    dim = A.shape[0]
    # Generate random linear combinations
    for _ in range(test_vectors):
        coeffs = np.random.randn(len(vectors))
        lc = A @ coeffs
        # Check if lc can be expressed as linear combination
        # (should always be true for span)
        residual = np.linalg.lstsq(A, lc, rcond=None)[1]
        if len(residual) > 0 and residual[0] > 1e-10:
            return False
    return True

# The span of these vectors is a subspace
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
print(f"Span of v1, v2 likely subspace: {is_likely_subspace([v1, v2])}")

# Example: checking if a vector is in a subspace
def in_subspace(vector, basis_vectors, tol=1e-10):
    """Check if vector is in the subspace spanned by basis_vectors."""
    A = np.column_stack(basis_vectors)
    coeffs, residuals, rank, s = np.linalg.lstsq(A, vector, rcond=None)
    reconstruction = A @ coeffs
    error = np.linalg.norm(vector - reconstruction)
    return error < tol

# xy-plane is a subspace of R^3
xy_basis = [np.array([1, 0, 0]), np.array([0, 1, 0])]

test1 = np.array([3, 4, 0])  # In xy-plane
test2 = np.array([3, 4, 5])  # Not in xy-plane

print(f"\n{test1} in xy-plane: {in_subspace(test1, xy_basis)}")
print(f"{test2} in xy-plane: {in_subspace(test2, xy_basis)}")
```

Definition of basis:
A basis for a vector space $V$ is a set of vectors $\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n\}$ that is:
1. Linearly independent (no vector in the set is a linear combination of the others), and
2. Spanning (every vector in $V$ is a linear combination of the basis vectors).
In other words, a basis is a minimal spanning set or equivalently a maximal independent set.
Definition of dimension:
The dimension of a vector space $V$, denoted $\dim(V)$, is the number of vectors in any basis of $V$.
Remarkable fact: Every basis of the same vector space has the same number of vectors. This is why dimension is well-defined.
A basis provides a coordinate system. Every vector has unique coordinates (coefficients) in a given basis. Changing basis changes coordinates but not the underlying vector. This is foundational for understanding transformations in ML.
Key theorem (Unique Representation):
If $\{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ is a basis for $V$, then every vector $\mathbf{v} \in V$ can be written uniquely as:
$$\mathbf{v} = c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2 + \cdots + c_n \mathbf{b}_n$$
The scalars $(c_1, c_2, \ldots, c_n)$ are called the coordinates of $\mathbf{v}$ with respect to this basis.
Standard bases:
| Vector Space | Standard Basis | Dimension |
|---|---|---|
| $\mathbb{R}^n$ | $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ | $n$ |
| $\mathbb{P}_n$ (polynomials of degree ≤ n) | $\{1, x, x^2, \ldots, x^n\}$ | $n+1$ |
| $M_{m \times n}$ (matrices) | Matrices $E_{ij}$ with a single 1 entry, zeros elsewhere | $mn$ |
| $\mathcal{C}[a,b]$ (continuous functions) | None finite! | $\infty$ (infinite-dimensional) |
Properties of bases:
- Every basis of $V$ has exactly $\dim(V)$ vectors.
- Any linearly independent set can be extended to a basis; any spanning set can be reduced to a basis.
- In an $n$-dimensional space, any $n$ linearly independent vectors automatically form a basis (as do any $n$ spanning vectors).
Infinite-dimensional spaces:
Some vector spaces (like continuous functions) have no finite basis—they're infinite-dimensional. This is important for understanding function spaces in ML (e.g., Gaussian processes, kernel methods).
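Even though these spaces have no finite basis, ML methods typically work inside finite-dimensional subspaces of them. As an illustrative sketch (a toy example of my own, sampling $\sin(x)$ on a grid), projecting a function onto the subspace spanned by $\{1, x, x^2, x^3\}$ is just a least-squares problem:

```python
import numpy as np

# Sample sin(x) on [0, pi]: a "vector" in a function space, discretized on a grid
x = np.linspace(0, np.pi, 200)
f = np.sin(x)

# Basis for the 4-dimensional subspace spanned by {1, x, x^2, x^3}
Phi = np.column_stack([x**0, x**1, x**2, x**3])

# Least-squares projection of f onto this subspace
coeffs, *_ = np.linalg.lstsq(Phi, f, rcond=None)
f_hat = Phi @ coeffs

print("Coefficients in the polynomial basis:", np.round(coeffs, 4))
print("Max approximation error:", np.abs(f - f_hat).max())
```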
```python
import numpy as np

# Standard basis for R^3
e1, e2, e3 = np.eye(3)
print("Standard basis for R^3:")
print(f"e1 = {e1}, e2 = {e2}, e3 = {e3}")

# Express a vector in standard basis coordinates
v = np.array([3, -2, 5])
# Trivially: v = 3*e1 + (-2)*e2 + 5*e3
# The coordinates ARE the components in standard basis

# Non-standard basis
b1 = np.array([1, 1, 0])
b2 = np.array([1, 0, 1])
b3 = np.array([0, 1, 1])

# Is this a basis? Check independence
B = np.column_stack([b1, b2, b3])
det_B = np.linalg.det(B)
print(f"\nNon-standard basis:")
print(f"det(B) = {det_B:.4f} (non-zero => independent => basis)")

# Find coordinates of v in this basis
# Solve: B @ coords = v
coords = np.linalg.solve(B, v)
print(f"\nVector v = {v}")
print(f"Coordinates in B-basis: {coords}")
print(f"Verification: {coords[0]}*b1 + {coords[1]}*b2 + {coords[2]}*b3 = {coords @ B.T}")

# Dimension of a subspace (e.g., column space)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])
rank = np.linalg.matrix_rank(A)
print(f"\nMatrix A is {A.shape[0]}x{A.shape[1]}")
print(f"Dimension of column space (rank): {rank}")
print(f"Dimension of null space: {A.shape[1] - rank}")
```

Every matrix $A \in \mathbb{R}^{m \times n}$ has four fundamental subspaces that completely characterize its behavior. Understanding these is essential for understanding linear transformations, least squares, SVD, and many ML algorithms.
The four subspaces:
1. Column Space (Range), $\mathcal{C}(A)$ or $\text{col}(A)$: the span of the columns of $A$, i.e., all vectors of the form $A\mathbf{x}$. A subspace of $\mathbb{R}^m$ of dimension $r$ (the rank).

2. Null Space (Kernel), $\mathcal{N}(A)$ or $\ker(A)$: all solutions of $A\mathbf{x} = \mathbf{0}$. A subspace of $\mathbb{R}^n$ of dimension $n - r$.

3. Row Space, $\mathcal{C}(A^\top)$: the span of the rows of $A$, i.e., the column space of $A^\top$. A subspace of $\mathbb{R}^n$ of dimension $r$.

4. Left Null Space, $\mathcal{N}(A^\top)$: all solutions of $A^\top\mathbf{y} = \mathbf{0}$. A subspace of $\mathbb{R}^m$ of dimension $m - r$.
The row space and null space are orthogonal complements in ℝⁿ (together they decompose the input space). The column space and left null space are orthogonal complements in ℝᵐ (together they decompose the output space). This orthogonality is the foundation of least squares and pseudoinverses.
Dimension relationships:
$$\dim(\mathcal{C}(A)) + \dim(\mathcal{N}(A)) = n \quad \text{(number of columns)}$$

$$\dim(\mathcal{C}(A^\top)) + \dim(\mathcal{N}(A^\top)) = m \quad \text{(number of rows)}$$

$$\dim(\mathcal{C}(A)) = \dim(\mathcal{C}(A^\top)) = r \quad \text{(rank)}$$
ML interpretation:
For a data matrix $X$ or weight matrix $W$:
- The column space is the set of outputs the matrix can produce: every prediction $X\mathbf{w}$ lies in it, so targets outside the column space can only be approximated by projection (least squares), as the sketch below shows.
- The null space contains directions the matrix cannot "see": adding a null space vector to the weights leaves the predictions unchanged, which is exactly the freedom that regularization resolves.
- The row space is the part of the input space the matrix actually responds to; its dimension is the rank $r$, the number of independent directions the data or weights use.
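A minimal sketch of the first point (toy data of my own; `np.linalg.lstsq` projects the target onto the column space):

```python
import numpy as np

# Hypothetical tall data matrix: 5 samples, 2 features
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.5, 4.0, 5.5])   # target, generally NOT in col(X)

# Least squares projects y onto the column space of X
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w                              # the closest achievable prediction

# The residual is orthogonal to every column of X (hence to col(X))
residual = y - y_hat
print("X^T residual:", np.round(X.T @ residual, 10))
```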
```python
import numpy as np
from scipy.linalg import null_space, orth

# Matrix for demonstration
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

m, n = A.shape
rank = np.linalg.matrix_rank(A)

print(f"Matrix A ({m}x{n}), rank = {rank}")
print(f"A:\n{A}")

# 1. Column space
col_space = orth(A)  # Orthonormal basis for column space
print(f"\n1. Column space basis (dimension {col_space.shape[1]}):")
print(f"{col_space}")

# 2. Null space
null_sp = null_space(A)
print(f"\n2. Null space basis (dimension {null_sp.shape[1]}):")
if null_sp.shape[1] > 0:
    print(f"{null_sp}")
    # Verify: A @ null_vector should be zero
    print(f"Verification A @ null = {A @ null_sp[:, 0]}")

# 3. Row space (= column space of A^T)
row_space = orth(A.T)
print(f"\n3. Row space basis (dimension {row_space.shape[1]}):")
print(f"{row_space}")

# 4. Left null space (= null space of A^T)
left_null = null_space(A.T)
print(f"\n4. Left null space basis (dimension {left_null.shape[1]}):")
if left_null.shape[1] > 0:
    print(f"{left_null}")

# Verify orthogonality: row space ⊥ null space
if null_sp.shape[1] > 0:
    dot = row_space.T @ null_sp
    print(f"\nRow space ⊥ Null space? Max dot product: {np.abs(dot).max():.2e}")

# Dimension check
print(f"\nDimension check:")
print(f"col_space + null_space = {col_space.shape[1]} + {null_sp.shape[1]} = {col_space.shape[1] + null_sp.shape[1]} = n = {n}")
print(f"row_space + left_null = {row_space.shape[1]} + {left_null.shape[1]} = {row_space.shape[1] + left_null.shape[1]} = m = {m}")
```

The same vector has different coordinates in different bases. Change of basis describes how coordinates transform when we switch from one basis to another.
Setup:
Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ be two bases for a vector space $V$.
A vector $\mathbf{v}$ has:
- coordinates $[\mathbf{v}]_\mathcal{B}$ with respect to basis $\mathcal{B}$, and
- coordinates $[\mathbf{v}]_\mathcal{C}$ with respect to basis $\mathcal{C}$.

Both coordinate vectors describe the same underlying vector $\mathbf{v}$.
The change of basis matrix:
$$[\mathbf{v}]_\mathcal{C} = P_{\mathcal{B} \to \mathcal{C}} \, [\mathbf{v}]_\mathcal{B}$$
where $P_{\mathcal{B} \to \mathcal{C}}$ is the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$.
To construct $P_{\mathcal{B} \to \mathcal{C}}$:
1. Express each basis vector $\mathbf{b}_j$ in terms of the $\mathcal{C}$ basis to obtain its coordinate vector $[\mathbf{b}_j]_\mathcal{C}$.
2. Use these coordinate vectors as the columns: $P_{\mathcal{B} \to \mathcal{C}} = \big[\,[\mathbf{b}_1]_\mathcal{C} \;\; \cdots \;\; [\mathbf{b}_n]_\mathcal{C}\,\big]$ (see the sketch below).
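In $\mathbb{R}^n$, if the basis vectors are stored as the columns of matrices $B$ and $C$, this construction amounts to $P_{\mathcal{B}\to\mathcal{C}} = C^{-1}B$. A minimal sketch with two hypothetical bases:

```python
import numpy as np

# Two bases for R^2, stored as columns of B and C (hypothetical example)
B = np.column_stack([[1, 1], [1, -1]])   # basis B = {(1,1), (1,-1)}
C = np.column_stack([[2, 0], [1, 1]])    # basis C = {(2,0), (1,1)}

# Change of basis matrix from B-coordinates to C-coordinates: P = C^{-1} B
P = np.linalg.solve(C, B)

# Check on a vector: same underlying vector, two coordinate descriptions
v = np.array([3.0, 1.0])                 # standard coordinates
v_B = np.linalg.solve(B, v)              # coordinates w.r.t. B
v_C = np.linalg.solve(C, v)              # coordinates w.r.t. C

print(np.allclose(P @ v_B, v_C))         # True: P converts B-coords to C-coords
```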
PCA is a change of basis! It transforms data from the standard basis to the basis of principal components. The transform matrix contains the principal components as columns. This new basis aligns with the directions of maximum variance.
Special case: Standard basis to custom basis
If $\mathcal{B}$ is custom and $\mathcal{C}$ is the standard basis:
- The matrix $B$ whose columns are the $\mathcal{B}$ vectors (written in standard coordinates) converts $\mathcal{B}$-coordinates to standard coordinates: $\mathbf{v} = B\,[\mathbf{v}]_\mathcal{B}$.
- Going the other way, from standard coordinates to $\mathcal{B}$-coordinates, uses the inverse: $[\mathbf{v}]_\mathcal{B} = B^{-1}\mathbf{v}$ (computed in practice by solving $B\,[\mathbf{v}]_\mathcal{B} = \mathbf{v}$).
Transformation in different bases:
A linear transformation $T$ has different matrix representations in different bases: $$[T]_\mathcal{C} = P^{-1} [T]_\mathcal{B} P$$
This is a similarity transformation—the transformation itself is the same, but its matrix representation changes with the basis.
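A short numerical check (toy matrices of my own) that $A$ and $P^{-1}AP$ describe the same transformation in different coordinates:

```python
import numpy as np

# A linear transformation in standard coordinates (hypothetical example)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# A custom basis, stored as the columns of P
P = np.column_stack([[1.0, 1.0], [1.0, -1.0]])

# Matrix of the same transformation in the custom basis: the similar matrix P^{-1} A P
A_custom = np.linalg.inv(P) @ A @ P

# Apply the transformation in both coordinate systems and compare
v = np.array([3.0, 1.0])                     # standard coordinates
v_custom = np.linalg.solve(P, v)             # custom-basis coordinates

out_standard = A @ v                         # apply T in standard coordinates
out_via_custom = P @ (A_custom @ v_custom)   # apply T in custom coords, convert back

print(np.allclose(out_standard, out_via_custom))   # True
```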
```python
import numpy as np

# Custom basis for R^2
b1 = np.array([1, 1])
b2 = np.array([1, -1])
B = np.column_stack([b1, b2])

print("Custom basis B:")
print(f"b1 = {b1}")
print(f"b2 = {b2}")

# A vector in standard coordinates
v_standard = np.array([3, 1])

# Find coordinates in B basis
# v = B @ coords => coords = B^(-1) @ v
v_B = np.linalg.solve(B, v_standard)

print(f"\nVector v in standard coords: {v_standard}")
print(f"Vector v in B coords: {v_B}")
print(f"Verification: {v_B[0]}*b1 + {v_B[1]}*b2 = {v_B[0]*b1 + v_B[1]*b2}")

# Change of basis matrix from standard to B
P_std_to_B = np.linalg.inv(B)
print(f"\nChange of basis matrix (std -> B):\n{P_std_to_B}")
print(f"P @ v_standard = {P_std_to_B @ v_standard}")

# PCA as change of basis
from sklearn.decomposition import PCA

# Generate some 2D data
np.random.seed(42)
data = np.random.randn(100, 2) @ np.array([[2, 1], [1, 2]])

pca = PCA()
pca.fit(data)

print(f"\nPCA as change of basis:")
print(f"Principal components (new basis vectors):")
print(pca.components_)

# Transform data to PC coordinates
data_pca = pca.transform(data)
print(f"Original data shape: {data.shape}")
print(f"PCA-transformed shape: {data_pca.shape}")

# The components form change-of-basis matrix
# data_pca = data @ components.T (approximately, after centering)
```

The abstract framework of vector spaces illuminates many ML concepts.
1. Feature spaces:
Each data point lives in a feature space $\mathbb{R}^d$. The dimension $d$ is the number of features. Understanding this as a vector space explains:
- why linear combinations of features (weighted sums, averages) are meaningful operations,
- why projections, distances, and dimensionality reduction apply to the data, and
- why the same algorithms work across very different data types once the data is vectorized.
2. Weight spaces:
Neural network weights form vectors in high-dimensional spaces. The loss surface is defined on this weight space. Optimization navigates this space.
| Concept | Vector Space Term | ML Application |
|---|---|---|
| Data matrix columns | Span | What predictions are possible |
| Redundant features | Linear dependence | Multicollinearity; redundant features can be compressed away |
| Number of independent features | Dimension | Intrinsic complexity of the data |
| PCA components | Orthonormal basis | New coordinates maximizing variance |
| Regularization | Null space constraint | Prefer solutions with zero components in null space |
| Embeddings | Subspace representation | Map high-dim to low-dim while preserving structure |
Gaussian processes and kernel methods work in infinite-dimensional function spaces. The kernel defines an inner product, making it a Hilbert space. Understanding these as vector spaces (with infinite-dimensional bases) explains why linear algebra tools like projection and orthogonality still apply.
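As a small sketch of the kernel-as-inner-product idea (toy data of my own, using scikit-learn's `rbf_kernel`), the Gram matrix collects pairwise inner products in the implicit feature space, which is why it is symmetric and positive semidefinite:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Toy data: 5 points in R^2
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))

# Gram matrix K[i, j] = <phi(x_i), phi(x_j)> in the implicit feature space
K = rbf_kernel(X, X, gamma=1.0)

print("Symmetric:", np.allclose(K, K.T))
print("Eigenvalues (all >= 0 up to rounding):", np.round(np.linalg.eigvalsh(K), 6))
```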
3. Null space and solutions:
For $A\mathbf{x} = \mathbf{b}$:
- A solution exists if and only if $\mathbf{b}$ lies in the column space $\mathcal{C}(A)$.
- If $\mathbf{x}_p$ is any particular solution, the complete solution set is $\mathbf{x}_p + \mathcal{N}(A)$: a particular solution plus anything in the null space (demonstrated in the sketch below).
- The solution is unique exactly when the null space is trivial, $\mathcal{N}(A) = \{\mathbf{0}\}$.
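A minimal numerical check of the "particular solution plus null space" structure (the matrix and right-hand side are a toy example of my own):

```python
import numpy as np
from scipy.linalg import null_space

# Wide system: 2 equations, 3 unknowns, so the null space is non-trivial
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([6.0, 15.0])

# One particular solution (minimum-norm, via the pseudoinverse)
x_p = np.linalg.pinv(A) @ b
print("A @ x_p =", A @ x_p)                            # reproduces b

# Adding any null space vector gives another solution
n = null_space(A)[:, 0]
print("A @ (x_p + 2.5 * n) =", A @ (x_p + 2.5 * n))    # still b
```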
4. Rank and expressivity:
- The rank of a data or weight matrix is the dimension of its column space, so it bounds how rich the set of achievable outputs can be.
- Low-rank structure is what makes compression work: a rank-$k$ approximation keeps only the $k$ most important directions, as the code below illustrates.
```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from scipy.linalg import null_space

# Underdetermined system: infinitely many solutions
# The null space determines the freedom we have
np.random.seed(42)

# More features than samples
n_samples, n_features = 50, 100
X = np.random.randn(n_samples, n_features)
y = np.random.randn(n_samples)

print(f"X shape: {X.shape} (underdetermined)")
print(f"Rank of X: {np.linalg.matrix_rank(X)}")
print(f"Null space dimension: {n_features - np.linalg.matrix_rank(X)}")

# Any solution + null space vector is also a solution
# Regularization picks one by minimizing ||w||

# Unregularized (picks minimum norm solution)
lr = LinearRegression(fit_intercept=False)
lr.fit(X, y)
print(f"\nLinear regression ||w||: {np.linalg.norm(lr.coef_):.4f}")

# Ridge regression (stronger regularization = smaller weights)
for alpha in [0.01, 0.1, 1.0, 10.0]:
    ridge = Ridge(alpha=alpha, fit_intercept=False)
    ridge.fit(X, y)
    print(f"Ridge (alpha={alpha}) ||w||: {np.linalg.norm(ridge.coef_):.4f}")

# Low-rank approximation: compress data
from sklearn.decomposition import TruncatedSVD

data = np.random.randn(1000, 100)
original_rank = np.linalg.matrix_rank(data)

for k in [10, 20, 50]:
    svd = TruncatedSVD(n_components=k)
    data_reduced = svd.fit_transform(data)
    data_reconstructed = svd.inverse_transform(data_reduced)
    reconstruction_error = np.linalg.norm(data - data_reconstructed, 'fro')
    print(f"\nRank-{k} approximation error: {reconstruction_error:.2f}")
```

We've completed our journey through vectors and vector spaces—from concrete ordered lists to abstract mathematical structures. This framework underlies all of linear algebra and machine learning.
Module complete!
You've now mastered the foundational concepts of vectors and vector spaces. This knowledge prepares you for the next modules on matrices, linear transformations, eigenvalues, and singular value decomposition—the tools that bring linear algebra to life in machine learning.
Coming next: Matrices and Linear Transformations—how matrices encode functions, transform spaces, and enable the computations that power machine learning.
Congratulations! You've completed Module 1: Vectors and Vector Spaces. You now have the conceptual foundation for understanding how data is represented and transformed in machine learning—from simple feature vectors to the abstract structures that make algorithms work.