Given a set of vectors, two fundamental questions arise: What other vectors can we reach by combining them? And does every vector in the set actually contribute something new?
The answers to these questions are span and linear independence—two of the most important concepts in all of linear algebra.
Span tells us the reach of our vectors: the complete set of destinations accessible via linear combinations. Linear independence tells us about efficiency: whether we're using the minimum number of vectors to achieve that reach, or whether we have unnecessary redundancy.
These concepts directly determine whether systems of equations have solutions, whether transformations are invertible, whether datasets have redundant features, and how neural networks can (or cannot) represent data.
By the end of this page, you will:

- Understand span as the set of all reachable vectors
- Master linear independence and dependence with geometric intuition
- Learn to test for linear independence computationally
- Connect these concepts to machine learning (feature redundancy, rank, representability)
- Understand why these concepts are essential for basis and dimension
Definition of span:
The span of a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is the set of all possible linear combinations of those vectors:
$$\text{span}(\mathbf{v}_1, \ldots, \mathbf{v}_k) = \left\{ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \;\middle|\; c_1, c_2, \ldots, c_k \in \mathbb{R} \right\}$$
In other words, span is every vector you can "reach" by combining the given vectors with any coefficients.
Key insight: The span is always a subspace—it includes the zero vector and is closed under addition and scalar multiplication.
If someone gives you vectors v₁ and v₂, the span is the answer to: 'What destinations can I reach by traveling any amount in the v₁ direction and any amount in the v₂ direction?' The span is your complete travel map.
Examples of span:
Single non-zero vector: The span of $\mathbf{v} = (1, 2)$ is all vectors of the form $c(1, 2) = (c, 2c)$—a line through the origin in the direction of $\mathbf{v}$.
Two non-parallel vectors in $\mathbb{R}^2$: The span of $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$ is all of $\mathbb{R}^2$—any 2D vector can be written as $c_1\mathbf{e}_1 + c_2\mathbf{e}_2$.
Two parallel vectors: The span of $(1, 2)$ and $(2, 4)$ is just the line through $(1, 2)$—the second vector adds no new reach because it's a scalar multiple of the first.
Three vectors in $\mathbb{R}^3$ that aren't coplanar: The span is all of $\mathbb{R}^3$.
Three vectors in $\mathbb{R}^3$ that ARE coplanar: The span is just the plane containing them—less than all of $\mathbb{R}^3$.
| Vectors | Space | Span | Dimension of Span |
|---|---|---|---|
| $(1, 2)$ | $\mathbb{R}^2$ | Line through origin | 1 |
| $(1, 0), (0, 1)$ | $\mathbb{R}^2$ | All of $\mathbb{R}^2$ | 2 |
| $(1, 2), (2, 4)$ | $\mathbb{R}^2$ | Line (parallel vectors) | 1 |
| $(1, 0, 0), (0, 1, 0)$ | $\mathbb{R}^3$ | xy-plane | 2 |
| $(1, 0, 0), (0, 1, 0), (0, 0, 1)$ | $\mathbb{R}^3$ | All of $\mathbb{R}^3$ | 3 |
```python
import numpy as np

def is_in_span(target, vectors, tol=1e-10):
    """
    Check if target vector is in the span of given vectors.
    Returns (is_in_span, coefficients).
    """
    # Stack vectors as columns of matrix A
    A = np.column_stack(vectors)
    # Solve A @ c = target using least squares
    coeffs, residuals, rank, s = np.linalg.lstsq(A, target, rcond=None)
    # Check if residual is essentially zero
    reconstruction = A @ coeffs
    error = np.linalg.norm(target - reconstruction)
    return error < tol, coeffs

# Example 1: v is in span of e1, e2
e1 = np.array([1, 0])
e2 = np.array([0, 1])
v = np.array([3, 4])

in_span, coeffs = is_in_span(v, [e1, e2])
print(f"Is {v} in span of e1, e2? {in_span}")
print(f"Coefficients: {coeffs}")  # [3, 4]
print(f"Verification: {coeffs[0]}*e1 + {coeffs[1]}*e2 = {coeffs[0]*e1 + coeffs[1]*e2}")

# Example 2: Check if [1, 1, 1] is in span of [1,0,0] and [0,1,0]
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
target = np.array([1, 1, 1])

in_span, coeffs = is_in_span(target, [v1, v2])
print(f"\nIs {target} in span of xy-plane? {in_span}")
# It's not! The z-component can't be reached.

target2 = np.array([1, 1, 0])  # In the xy-plane
in_span2, coeffs2 = is_in_span(target2, [v1, v2])
print(f"Is {target2} in span of xy-plane? {in_span2}")
print(f"Coefficients: {coeffs2}")
```

Definition of linear independence:
A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent if the only solution to:
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$$
is the trivial solution $c_1 = c_2 = \cdots = c_k = 0$.
If there exists a non-trivial solution (at least one $c_i \neq 0$), the vectors are linearly dependent.
Intuitive meaning:
Linear dependence means there's redundancy: some vector is 'saying' what could already be said with the others. Linear independence means every vector contributes something unique—the set is minimal for its span.
Equivalent characterizations of linear independence:
These are all equivalent ways to say vectors are linearly independent:

- No vector in the set can be written as a linear combination of the others.
- The only solution to $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$ is the trivial one.
- Removing any vector from the set strictly shrinks the span.
- The matrix with these vectors as columns has full column rank $k$.
Geometric interpretation: two vectors are independent exactly when they are not parallel; three vectors are independent exactly when they do not all lie in one plane through the origin.
| Vectors | Independent? | Reason |
|---|---|---|
| $(1, 0), (0, 1)$ | Yes | Standard basis, not parallel |
| $(1, 2), (2, 4)$ | No | $(2, 4) = 2(1, 2)$—parallel |
| $(1, 0), (0, 1), (1, 1)$ | No | $(1, 1) = (1, 0) + (0, 1)$—third is sum of first two |
| $(1, 0, 0), (0, 1, 0), (0, 0, 1)$ | Yes | Standard basis of $\mathbb{R}^3$ |
| $(1, 2, 3), (4, 5, 6), (7, 8, 9)$ | No | Third is linear combination of first two |
```python
import numpy as np

def check_linear_independence(vectors, tol=1e-10):
    """
    Check if vectors are linearly independent using matrix rank.
    Returns (is_independent, rank, num_vectors).
    """
    # Stack vectors as columns
    A = np.column_stack(vectors)
    rank = np.linalg.matrix_rank(A, tol=tol)
    num_vectors = len(vectors)
    is_independent = (rank == num_vectors)
    return is_independent, rank, num_vectors

# Example 1: Standard basis (independent)
e1, e2 = np.array([1, 0]), np.array([0, 1])
ind, rank, n = check_linear_independence([e1, e2])
print(f"Standard basis: independent={ind}, rank={rank}, n={n}")

# Example 2: Parallel vectors (dependent)
v1, v2 = np.array([1, 2]), np.array([2, 4])
ind, rank, n = check_linear_independence([v1, v2])
print(f"Parallel vectors: independent={ind}, rank={rank}, n={n}")

# Example 3: Three vectors in R^2 (must be dependent)
a, b, c = np.array([1, 0]), np.array([0, 1]), np.array([1, 1])
ind, rank, n = check_linear_independence([a, b, c])
print(f"3 vectors in R^2: independent={ind}, rank={rank}, n={n}")

# Example 4: Check if arithmetic sequence is independent
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.array([7, 8, 9])
ind, rank, n = check_linear_independence([v1, v2, v3])
print(f"Arithmetic sequence: independent={ind}, rank={rank}, n={n}")

# Find the dependence relation
A = np.column_stack([v1, v2, v3])
# v3 = -1*v1 + 2*v2? Let's check
print(f"\nv3 = {v3}")
print(f"-1*v1 + 2*v2 = {-1*v1 + 2*v2}")
# So v1 - 2*v2 + v3 = 0 (non-trivial solution!)
```

Span and linear independence are deeply connected—they're two sides of the same coin.
Key relationship: a set is linearly independent exactly when no vector in it lies in the span of the others; equivalently, every vector genuinely extends the span.

Together, they determine the concept of a basis: a set of vectors that is both:

- spanning: its span is the entire space, and
- linearly independent: it contains no redundant vectors.
The dimension connection:
For a vector space of dimension $n$:

- Any set of more than $n$ vectors is linearly dependent.
- Any set of fewer than $n$ vectors cannot span the space.
- A set of exactly $n$ vectors spans the space if and only if it is linearly independent; such a set is a basis.
Adding more vectors to a set can fail in two ways: (1) The new vector is already in the span of existing ones—no new reach, creates dependence. (2) You're trying to fit more than n vectors in an n-dimensional space—impossible to keep independence. Both failures indicate redundancy.
Practical implications:
For solving equations $A\mathbf{x} = \mathbf{b}$: a solution exists precisely when $\mathbf{b}$ lies in the span of the columns of $A$, and the solution is unique precisely when those columns are linearly independent.
For machine learning:
| Situation | Span Implications | Independence Implications |
|---|---|---|
| Feature matrix | What predictions can be made | Are all features necessary? |
| Weight matrix | What outputs are reachable | Are weights uniquely determined? |
| Embedding space | What concepts can be represented | Is dimensionality appropriate? |
| PCA components | How much variance is captured | Each component is orthogonal (independent) |
Computational detection:
The rank of a matrix equals the number of linearly independent columns (which equals the number of linearly independent rows), i.e., the dimension of the span of its columns.
If a matrix has fewer independent columns than total columns, the columns are dependent.
```python
import numpy as np

def analyze_vectors(vectors, space_dim):
    """Analyze span and independence of a set of vectors."""
    A = np.column_stack(vectors)
    rank = np.linalg.matrix_rank(A)
    num_vectors = len(vectors)

    is_independent = (rank == num_vectors)
    spans_space = (rank == space_dim)
    is_basis = is_independent and spans_space

    print(f"Number of vectors: {num_vectors}")
    print(f"Ambient space dimension: {space_dim}")
    print(f"Rank (dimension of span): {rank}")
    print(f"Linearly independent: {is_independent}")
    print(f"Spans the space: {spans_space}")
    print(f"Forms a basis: {is_basis}")
    return rank, is_independent, spans_space, is_basis

# Case 1: Fewer vectors than dimension (can't span)
print("Case 1: Two vectors in R^3")
v1, v2 = np.array([1, 0, 0]), np.array([0, 1, 0])
analyze_vectors([v1, v2], 3)

# Case 2: Right number, but dependent (can't do either)
print("\nCase 2: Three dependent vectors in R^3")
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
v3 = np.array([1, 1, 0])  # v3 = v1 + v2
analyze_vectors([v1, v2, v3], 3)

# Case 3: Right number, independent = basis
print("\nCase 3: Three independent vectors in R^3 (basis)")
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])
v3 = np.array([0, 0, 1])
analyze_vectors([v1, v2, v3], 3)

# Case 4: More vectors than dimension (must be dependent)
print("\nCase 4: Four vectors in R^3")
v1, v2, v3, v4 = (np.array([1, 0, 0]), np.array([0, 1, 0]),
                  np.array([0, 0, 1]), np.array([1, 1, 1]))
analyze_vectors([v1, v2, v3, v4], 3)
```

Several methods exist to test whether vectors are linearly independent.
Method 1: Determinant (square case)
For $n$ vectors in $\mathbb{R}^n$, form the matrix with the vectors as columns: the vectors are linearly independent if and only if its determinant is non-zero.
This only works when the number of vectors equals the space dimension.
Method 2: Rank comparison
For any number of vectors: stack them as columns of a matrix and compute its rank; the vectors are independent if and only if the rank equals the number of vectors.
Method 3: Row reduction (Gaussian elimination)
Reduce the matrix to row echelon form: the vectors are independent if and only if every column contains a pivot (no free variables).
Method 4: Solve the homogeneous system
Find all solutions to $A\mathbf{c} = \mathbf{0}$: if the only solution is $\mathbf{c} = \mathbf{0}$, the vectors are independent; any non-trivial solution is an explicit dependence relation among them.
In practice, determinants and ranks are affected by floating-point errors. A determinant might be 1e-15 instead of exactly 0. Always use tolerance thresholds (e.g., np.linalg.matrix_rank uses SVD with tolerance). Near-dependent vectors cause numerical instability.
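To make this concrete, here is a minimal sketch (the perturbation size 1e-15 and the strict tolerance 1e-18 are arbitrary illustrative choices, not prescribed values) of how near-dependent vectors interact with the tolerance used by `np.linalg.matrix_rank`:

```python
import numpy as np

# Two nearly parallel vectors in R^2: v2 is almost exactly 2 * v1
v1 = np.array([1.0, 2.0])
v2 = np.array([2.0, 4.0 + 1e-15])

A = np.column_stack([v1, v2])

# The determinant is tiny but not exactly zero
print(f"Determinant: {np.linalg.det(A):.3e}")

# With its default SVD-based tolerance, matrix_rank treats the columns as dependent
print(f"Rank (default tolerance): {np.linalg.matrix_rank(A)}")

# An unrealistically strict tolerance counts the negligible singular value
# and reports full rank instead
print(f"Rank (tol=1e-18): {np.linalg.matrix_rank(A, tol=1e-18)}")
```

The verdict depends entirely on where the threshold is drawn, which is why rank tests should always be read together with a sensible tolerance.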
```python
import numpy as np
from scipy.linalg import null_space

# Test vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.array([7, 8, 9])

A = np.column_stack([v1, v2, v3])

# Method 1: Determinant (only for square matrices)
det = np.linalg.det(A)
print(f"Method 1 - Determinant: {det:.6f}")
print(f"Independent (det != 0): {not np.isclose(det, 0)}")

# Method 2: Rank comparison
rank = np.linalg.matrix_rank(A)
n_vectors = A.shape[1]
print(f"\nMethod 2 - Rank: {rank}, Number of vectors: {n_vectors}")
print(f"Independent (rank == n): {rank == n_vectors}")

# Method 3: Row reduction (via QR or similar)
# The rank reveals this automatically

# Method 4: Null space
# If null space has dimension > 0, vectors are dependent
null = null_space(A)
print(f"\nMethod 4 - Null space:")
print(f"Null space dimension: {null.shape[1]}")
if null.shape[1] > 0:
    # A null-space vector c satisfies c[0]*v1 + c[1]*v2 + c[2]*v3 = 0
    c = null[:, 0]
    print(f"Non-trivial null vector: {c}")
    verification = c[0]*v1 + c[1]*v2 + c[2]*v3
    print(f"Verification (should be ~0): {verification}")

# Compare with truly independent vectors
print("\n--- Independent vectors ---")
u1 = np.array([1, 0, 0])
u2 = np.array([0, 1, 0])
u3 = np.array([0, 0, 1])

B = np.column_stack([u1, u2, u3])
print(f"Determinant: {np.linalg.det(B)}")
print(f"Rank: {np.linalg.matrix_rank(B)}")
print(f"Null space dimension: {null_space(B).shape[1]}")  # 0
```

In machine learning, we work with vectors in hundreds or thousands of dimensions. Geometric intuition from 2D/3D still applies, but we must think more abstractly.
High-dimensional span:
In $\mathbb{R}^{1000}$, a set of 10 independent vectors spans a 10-dimensional subspace: a thin slice of the full space, and almost every vector in $\mathbb{R}^{1000}$ lies outside it.
High-dimensional independence:
In $\mathbb{R}^n$, you can have at most $n$ independent vectors. Random vectors are almost always independent (probability 1 in continuous distributions) until you exceed the space dimension.
In 3D: 2 independent vectors span a plane (2D subspace). In 1000D: 2 independent vectors still span a 2D subspace—it's just a 2D 'plane' embedded in a much larger space. The concepts are identical; only our ability to visualize changes.
ML implications:
Feature spaces: If you have 100 features but only 20 are independent (80 can be expressed from the 20), your effective dimensionality is 20. This is what PCA exploits—finding the intrinsic dimension of data.
Overparameterized models: Neural networks often have more parameters than training samples. The weight vectors live in a space where many solutions exist (underdetermined system). Regularization helps pick among equivalent solutions.
Embedding spaces: Word embeddings in $\mathbb{R}^{300}$ represent a vocabulary of 100,000 words. The 100,000 word vectors must be dependent (can't have 100,000 independent vectors in $\mathbb{R}^{300}$). This isn't a bug—it's how embeddings capture relationships (similar words have similar vectors).
```python
import numpy as np

# In high dimensions, random vectors are almost always independent
np.random.seed(42)
dim = 100
n_vectors = 50

# Generate random vectors
random_vectors = [np.random.randn(dim) for _ in range(n_vectors)]
A = np.column_stack(random_vectors)

rank = np.linalg.matrix_rank(A)
print(f"Dimension: {dim}")
print(f"Number of random vectors: {n_vectors}")
print(f"Rank: {rank}")
print(f"All independent: {rank == n_vectors}")  # Almost certainly True

# What happens at the boundary?
n_vectors = 100  # Same as dimension
random_vectors = [np.random.randn(dim) for _ in range(n_vectors)]
A = np.column_stack(random_vectors)
rank = np.linalg.matrix_rank(A)
print(f"\n{n_vectors} vectors in {dim} dimensions: rank = {rank}")

# Exceeding dimension forces dependence
n_vectors = 101  # More than dimension
random_vectors = [np.random.randn(dim) for _ in range(n_vectors)]
A = np.column_stack(random_vectors)
rank = np.linalg.matrix_rank(A)
print(f"{n_vectors} vectors in {dim} dimensions: rank = {rank}")
print(f"Must be dependent: {rank < n_vectors}")

# Simulating redundant features
print("\n--- Feature Redundancy Example ---")
# Create 50 'real' features
real_features = np.random.randn(1000, 50)  # 1000 samples, 50 features

# Create 30 'redundant' features as linear combinations
redundant_weights = np.random.randn(50, 30)
redundant_features = real_features @ redundant_weights

# Combined feature matrix
all_features = np.hstack([real_features, redundant_features])
print(f"Total features: {all_features.shape[1]}")
print(f"Rank of feature matrix: {np.linalg.matrix_rank(all_features)}")
print(f"Effective dimensionality: 50 (the rest are redundant)")
```

Span and linear independence have direct, practical applications throughout machine learning.
1. Feature Selection and Multicollinearity
When features are linearly dependent (or nearly so), regression coefficients become unstable. This is called multicollinearity.
Detection: Check if feature matrix has full column rank, or examine condition number.
2. Dimensionality Reduction (PCA)
PCA finds a new set of independent directions (principal components) that:

- are mutually orthogonal, and therefore linearly independent,
- are ordered by how much of the data's variance they capture, and
- let you keep only the leading components while discarding redundant directions.
The effective dimension is how many components explain most variance.
A 1000×100 feature matrix might have rank 50. This means only 50 features are truly independent—the other 50 are linear combinations. Understanding this prevents overfitting and guides feature engineering.
3. Neural Network Expressivity
A linear layer with weight matrix $W \in \mathbb{R}^{m \times n}$: every output $W\mathbf{x}$ lies in the column space of $W$, so if $\text{rank}(W) < m$ the layer can only reach a proper subspace of its $m$-dimensional output space.
4. Embedding Quality
Good embeddings balance: enough dimensions to keep the distinctions you care about separable, against few enough dimensions that vectors are forced to share structure, which is the dependence that encodes relationships between similar items.
5. Model Identifiability
A model is identifiable if different parameter settings produce different outputs. Linear dependence in the feature/design matrix can make parameters non-identifiable (infinitely many equally good solutions).
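A minimal sketch of this failure mode (the design matrix and weight vectors below are invented purely for illustration): when the design matrix has a linearly dependent column, two different parameter vectors can produce identical predictions, so the data cannot tell them apart.

```python
import numpy as np

# Design matrix with a dependent column: the third column is col1 + col2
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

# Two different parameter vectors...
w_a = np.array([1.0, 2.0, 0.0])
w_b = np.array([0.0, 1.0, 1.0])   # shifts weight onto the redundant column

# ...give exactly the same predictions, so they are not identifiable from the data
print(np.allclose(X @ w_a, X @ w_b))                              # True
print(f"Rank of X: {np.linalg.matrix_rank(X)} (columns: {X.shape[1]})")
```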
```python
import numpy as np

# Example 1: Detecting multicollinearity
np.random.seed(42)

# Create feature matrix with multicollinearity
n_samples = 100
x1 = np.random.randn(n_samples)
x2 = np.random.randn(n_samples)
x3 = 2*x1 + 3*x2 + 0.01*np.random.randn(n_samples)  # Nearly dependent

X = np.column_stack([x1, x2, x3])
print("Multicollinearity detection:")
print(f"Feature matrix rank: {np.linalg.matrix_rank(X)}")
print(f"Number of features: {X.shape[1]}")

# Condition number (large = ill-conditioned = multicollinearity)
cond_num = np.linalg.cond(X)
print(f"Condition number: {cond_num:.2f} (high indicates near-dependence)")

# Example 2: Understanding linear layer expressivity
input_dim = 100
hidden_dim = 50

# Weight matrix with potentially reduced rank
W_full_rank = np.random.randn(hidden_dim, input_dim)
W_low_rank = np.random.randn(hidden_dim, 10) @ np.random.randn(10, input_dim)  # Rank <= 10

print(f"\nLinear layer expressivity:")
print(f"Full rank W: rank = {np.linalg.matrix_rank(W_full_rank)}")
print(f"Low rank W: rank = {np.linalg.matrix_rank(W_low_rank)}")

# The low-rank layer can only produce outputs in a 10-dimensional subspace
# even though the output space is 50-dimensional

# Example 3: PCA and effective dimensionality
from sklearn.decomposition import PCA

# Generate data with intrinsic dimension 5 in 20D space
true_dim = 5
ambient_dim = 20
n_samples = 500

# Data lies in a 5D subspace
low_dim_data = np.random.randn(n_samples, true_dim)
projection = np.random.randn(true_dim, ambient_dim)
data = low_dim_data @ projection + 0.1 * np.random.randn(n_samples, ambient_dim)

pca = PCA()
pca.fit(data)

print(f"\nPCA variance explained:")
cumulative_var = np.cumsum(pca.explained_variance_ratio_)
for i, var in enumerate(cumulative_var[:10]):
    print(f"  {i+1} components: {var:.4f} variance")
print(f"Effective dimension (>99% variance): {np.searchsorted(cumulative_var, 0.99) + 1}")
```

Several fundamental properties and theorems govern span and linear independence.
Key properties: the span of any set of vectors is always a subspace; any set containing the zero vector is linearly dependent; a single vector is independent if and only if it is non-zero; and every subset of a linearly independent set is itself independent.
Key theorems:
Theorem (Dimension Bound): In $\mathbb{R}^n$, any set of more than $n$ vectors must be linearly dependent.
Theorem (Unique Representation): If $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is linearly independent and $\mathbf{w}$ is in their span, then the coefficients expressing $\mathbf{w}$ as a linear combination are unique.
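A quick numerical check of this theorem (the vectors and coefficients below are arbitrary choices): with a square matrix whose columns are independent, `np.linalg.solve` recovers the one and only coefficient vector.

```python
import numpy as np

# An independent set in R^3 and a target vector built from known coefficients
v1, v2, v3 = np.array([1., 0., 1.]), np.array([0., 1., 1.]), np.array([1., 1., 0.])
A = np.column_stack([v1, v2, v3])
w = 2*v1 - 1*v2 + 3*v3

# Because the columns are independent (A is invertible), the coefficients are unique
c = np.linalg.solve(A, w)
print(c)                      # [ 2. -1.  3.]
print(np.allclose(A @ c, w))  # True
```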
Theorem (Span Absorption): If $\mathbf{w}$ is in the span of $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$, then: $$\text{span}(\mathbf{v}_1, \ldots, \mathbf{v}_k, \mathbf{w}) = \text{span}(\mathbf{v}_1, \ldots, \mathbf{v}_k)$$
Adding vectors already in the span doesn't expand it.
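This is easy to verify numerically; in the sketch below (arbitrarily chosen vectors), appending a vector built from the others leaves the rank of the stacked matrix, and hence the dimension of the span, unchanged.

```python
import numpy as np

v1 = np.array([1., 0., 2.])
v2 = np.array([0., 1., 1.])
w = 3*v1 - 2*v2            # w is in span(v1, v2) by construction

rank_before = np.linalg.matrix_rank(np.column_stack([v1, v2]))
rank_after = np.linalg.matrix_rank(np.column_stack([v1, v2, w]))
print(rank_before, rank_after)   # 2 2 -- the span did not grow
```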
Theorem (Extension): Any independent set in a finite-dimensional vector space can be extended to a basis by adding vectors.
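One way to see the extension theorem in action is the greedy sketch below (not a canonical algorithm, just an illustration): append standard basis vectors, keeping only those that raise the rank, until the independent set becomes a basis.

```python
import numpy as np

def extend_to_basis(independent_vectors, dim):
    """Greedily extend an independent set to a basis of R^dim using standard basis vectors."""
    basis = list(independent_vectors)
    for i in range(dim):
        candidate = np.eye(dim)[i]
        trial = np.column_stack(basis + [candidate])
        # Keep the candidate only if it adds a new direction (increases the rank)
        if np.linalg.matrix_rank(trial) > np.linalg.matrix_rank(np.column_stack(basis)):
            basis.append(candidate)
    return basis

start = [np.array([1., 1., 0.]), np.array([0., 1., 1.])]   # independent, but not spanning R^3
basis = extend_to_basis(start, 3)
print(f"Extended set: {len(basis)} vectors, rank = {np.linalg.matrix_rank(np.column_stack(basis))}")
```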
If S is an independent set and B is a spanning set, then |S| ≤ |B|. This implies all bases have the same size (defining the dimension), and that you can't have more independent vectors than a spanning set has.
Span and linear independence are the two pillars supporting all of linear algebra. Together, they define what vectors can represent and how efficiently.
What's next:
Span and independence come together in the concept of vector spaces and subspaces. A basis is a set that is both spanning and independent—the minimal complete description of a space. The number of vectors in any basis defines the dimension. These ideas give vector spaces their fundamental structure.
You now understand span (what vectors can reach) and linear independence (when vectors are non-redundant). These concepts are essential for understanding bases, dimension, rank, and the solvability of linear systems—all critical for machine learning.