The true power of vectors emerges when we combine them. A linear combination takes multiple vectors, scales each by a coefficient, and adds them together to produce a new vector. This seemingly simple operation is the engine driving nearly all of linear algebra and machine learning.
When a neural network processes input, it computes linear combinations of features. When PCA reduces dimensionality, it finds optimal linear combinations. When we solve linear equations or fit regression models, we're searching for the right linear combination. Understanding linear combinations deeply is essential for understanding how data transforms and flows through machine learning systems.
This page explores linear combinations rigorously—their definition, computation, geometric meaning, and role as the foundation for span and linear independence.
By the end of this page, you will be able to:

- state the formal definition of a linear combination and compute one
- build geometric intuition for how vectors combine
- connect linear combinations to machine learning computations (weighted sums, neural layers)
- recognize linear combinations inside matrix-vector multiplication
- explain why linear combinations are 'linear' and what that constraint means
Formal definition:
Given vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ (all in the same vector space $\mathbb{R}^n$) and scalars $c_1, c_2, \ldots, c_k$ (real numbers), a linear combination of these vectors is:
$$\mathbf{w} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \sum_{i=1}^{k} c_i \mathbf{v}_i$$
The scalars $c_1, c_2, \ldots, c_k$ are called coefficients, weights, or coordinates (depending on context).
Example:
Given $\mathbf{v}_1 = (1, 0, 2)$ and $\mathbf{v}_2 = (0, 1, -1)$ with coefficients $c_1 = 3$ and $c_2 = -2$:
$$\mathbf{w} = 3\mathbf{v}_1 + (-2)\mathbf{v}_2 = 3(1, 0, 2) + (-2)(0, 1, -1)$$ $$= (3, 0, 6) + (0, -2, 2) = (3, -2, 8)$$
The combination is called 'linear' because it involves only two operations: scaling (multiplication by scalars) and addition. No squaring, no products of components, no nonlinear functions. This restriction to linear operations is what makes linear algebra tractable and is why linearity is so important throughout mathematics.
Key observations:

- The coefficients $c_i$ can be any real numbers: positive, negative, or zero.
- The result $\mathbf{w}$ lives in the same space $\mathbb{R}^n$ as the vectors being combined.
- Only two operations are involved: scalar multiplication and vector addition.
```python
import numpy as np

# Simple linear combination example
v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, -1])
c1, c2 = 3, -2

# Compute linear combination
w = c1 * v1 + c2 * v2
print(f"v1 = {v1}")
print(f"v2 = {v2}")
print(f"3*v1 + (-2)*v2 = {w}")  # [3, -2, 8]

# Step by step
print(f"\nStep by step:")
print(f"3 * v1 = {3 * v1}")
print(f"-2 * v2 = {-2 * v2}")
print(f"Sum = {3 * v1 + (-2) * v2}")

# More vectors
v3 = np.array([2, 2, 0])
c3 = 0.5
w2 = c1 * v1 + c2 * v2 + c3 * v3
print(f"\n3*v1 - 2*v2 + 0.5*v3 = {w2}")

# General function for linear combinations
def linear_combination(vectors, coefficients):
    """Compute linear combination of vectors with given coefficients."""
    assert len(vectors) == len(coefficients), "Must have same number of vectors and coefficients"
    result = np.zeros_like(vectors[0], dtype=float)
    for v, c in zip(vectors, coefficients):
        result += c * v
    return result

# Test the function
vectors = [v1, v2, v3]
coeffs = [1, 1, 1]
print(f"\nv1 + v2 + v3 = {linear_combination(vectors, coeffs)}")

# Using matrix form (more efficient)
V = np.column_stack(vectors)  # Vectors as columns
c = np.array(coeffs)
print(f"Matrix form result: {V @ c}")
```

Linear combinations have a beautiful geometric interpretation that provides intuition even in high dimensions.
Two vectors in 2D:
Consider two non-parallel vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $\mathbb{R}^2$. Their linear combination:
$$\mathbf{w} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2$$
can reach any point in the plane by choosing appropriate $c_1$ and $c_2$.
Visualization process:
By varying $c_1$ and $c_2$ over all real numbers, we sweep out the entire 2D plane.
For any target point, we can find coefficients by completing a parallelogram with sides parallel to v₁ and v₂. The coefficients tell us how many 'units' of each vector direction we need. This is why non-parallel vectors are essential—parallel vectors can only reach a line, not the full plane.
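To make the parallelogram picture concrete, here is one small worked example (the same vectors and target reappear in the code further below). Finding the coefficients amounts to solving a small linear system:

$$c_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}
\quad\Longleftrightarrow\quad
\begin{cases} 2c_1 + c_2 = 5 \\ c_1 + 3c_2 = 11 \end{cases}
\quad\Longrightarrow\quad c_1 = 0.8,\; c_2 = 3.4$$

Check: $0.8\,(2, 1) + 3.4\,(1, 3) = (1.6 + 3.4,\; 0.8 + 10.2) = (5, 11)$.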
What linear combinations can reach:
The set of all vectors reachable by linear combinations of given vectors defines what we call the span (covered in detail next page). For now, key observations:
With 1 non-zero vector in $\mathbb{R}^n$: the combinations $c_1 \mathbf{v}_1$ trace out a line through the origin.
With 2 non-parallel vectors in $\mathbb{R}^n$: the combinations fill a plane through the origin.
With 3 non-coplanar vectors in $\mathbb{R}^n$: the combinations fill a three-dimensional subspace through the origin.
Constraint: Through the origin
Linear combinations always pass through the origin (choosing all coefficients = 0). This is a defining property—we can't reach points "offset" from the origin using only linear combinations of vectors based at the origin.
| Vectors | Condition | Reach (in ℝⁿ) |
|---|---|---|
| 1 vector | Non-zero | Line through origin |
| 2 vectors | Not parallel (linearly independent) | Plane through origin |
| 3 vectors | Not coplanar (linearly independent) | 3D subspace through origin |
| k vectors | Linearly independent | k-dimensional subspace through origin |
| n vectors in ℝⁿ | Linearly independent (basis) | Entire ℝⁿ |
```python
import numpy as np

# Demonstration: expressing any 2D point as linear combination
v1 = np.array([1, 0])  # x-axis direction
v2 = np.array([0, 1])  # y-axis direction

# Any point (a, b) is exactly a*v1 + b*v2
target = np.array([3, 2])
c1, c2 = 3, 2  # Coefficients equal to coordinates!
result = c1 * v1 + c2 * v2
print(f"Target: {target}")
print(f"Reconstructed: {result}")
print(f"Match: {np.allclose(target, result)}")

# With non-standard basis vectors
v1 = np.array([2, 1])
v2 = np.array([1, 3])

# Find coefficients to reach (5, 11)
target = np.array([5, 11])

# We need to solve: c1*v1 + c2*v2 = target
# This is a system of linear equations
V = np.column_stack([v1, v2])
coeffs = np.linalg.solve(V, target)
print(f"\nTo reach {target} using v1={v1}, v2={v2}:")
print(f"Coefficients: c1={coeffs[0]:.4f}, c2={coeffs[1]:.4f}")
print(f"Verification: c1*v1 + c2*v2 = {coeffs[0]*v1 + coeffs[1]*v2}")

# What happens with parallel vectors?
v1 = np.array([1, 2])
v2 = np.array([2, 4])  # v2 = 2 * v1 (parallel!)

# Can only reach points on the line through v1
target_on_line = np.array([3, 6])   # = 3 * v1, reachable
target_off_line = np.array([1, 1])  # Not on line v1, unreachable!

print(f"\nWith parallel vectors v1={v1}, v2={v2}:")
print(f"{target_on_line} = 3*v1 (reachable)")
print(f"{target_off_line} is OFF the line v1 (unreachable by linear combo)")
```

Linear combinations are everywhere in machine learning. Recognizing them helps you understand what models are actually computing.
Neural Network Layers:
Each neuron computes a linear combination of its inputs (followed by a nonlinear activation):
$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \mathbf{w}^\top \mathbf{x} + b$$
The weights $\mathbf{w}$ are coefficients, and the inputs $\mathbf{x}$ are the vectors being combined (or vice versa, depending on perspective).
Fully Connected Layer:
A layer with multiple neurons computes multiple linear combinations simultaneously:
$$\mathbf{z} = W \mathbf{x} + \mathbf{b}$$
Each row of $W$ defines one linear combination—the weights for one output neuron.
A linear combination of linear combinations is still just a linear combination! Without activation functions, stacking layers would collapse to a single linear transformation. Nonlinear activations break this collapse, enabling neural networks to approximate complex functions.
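To see this collapse concretely, here is a minimal sketch with two hypothetical bias-free weight matrices `W1` and `W2`: composing the two layers gives exactly the same output as the single matrix `W2 @ W1`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)          # input with 3 features
W1 = rng.standard_normal((4, 3))    # layer 1: 3 -> 4 (no bias, no activation)
W2 = rng.standard_normal((2, 4))    # layer 2: 4 -> 2

# Two stacked linear layers...
two_layers = W2 @ (W1 @ x)

# ...collapse to a single linear layer with weights W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: stacking adds no expressive power
```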
More ML contexts:
| ML Context | What's Combined | Coefficients | Result |
|---|---|---|---|
| Linear Regression | Feature values | Model weights | Prediction |
| PCA | Original features | Principal component loadings | Reduced features |
| Word Embeddings | One-hot vectors | Embedding matrix rows | Dense word vector |
| Attention | Value vectors | Attention weights | Context vector |
| Ensemble Methods | Base model predictions | Ensemble weights | Final prediction |
| Kernel Methods | Training examples | Dual coefficients (α) | Decision boundary |
Linear Regression as Linear Combination:
$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$
This is a linear combination of features $[1, x_1, x_2, \ldots, x_n]$ (including the bias term as $x_0 = 1$) with coefficients $[w_0, w_1, \ldots, w_n]$.
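As a small illustrative sketch (reusing the hypothetical house-price weights that appear in the code further below), prepending a constant feature $x_0 = 1$ lets the bias be absorbed as an ordinary coefficient:

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])   # features: [sqft, bedrooms, age]
w = np.array([0.1, 10.0, -0.5])     # weights
w0 = 50.0                           # bias

# Standard form: w0 + w . x
y_hat = w0 + w @ x

# Augmented form: prepend x0 = 1 and fold the bias into the weight vector
x_aug = np.concatenate(([1.0], x))  # [1, x1, ..., xn]
w_aug = np.concatenate(([w0], w))   # [w0, w1, ..., wn]
y_hat_aug = w_aug @ x_aug           # a single linear combination

print(np.allclose(y_hat, y_hat_aug))  # True
```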
Attention Mechanism:
Transformer attention computes a weighted average of value vectors:
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$$
The softmax output provides coefficients (attention weights) for linearly combining value vectors. Each output is a linear combination of all value vectors, weighted by relevance.
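A minimal NumPy sketch of this formula (with tiny, hypothetical $Q$, $K$, $V$ matrices), showing that each output row is a weighted sum of the rows of $V$ with non-negative weights that sum to 1:

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax (numerically stabilized)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_k = 4
Q = rng.standard_normal((2, d_k))   # 2 queries
K = rng.standard_normal((3, d_k))   # 3 keys
V = rng.standard_normal((3, 5))     # 3 value vectors of dimension 5

weights = softmax(Q @ K.T / np.sqrt(d_k))   # shape (2, 3): one weight row per query
output = weights @ V                        # each output row = weighted sum of V's rows

print(weights.sum(axis=1))   # each row sums to 1, all entries non-negative
print(output.shape)          # (2, 5)
```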
```python
import numpy as np

# Linear regression prediction as linear combination
def linear_regression_predict(X, weights, bias):
    """Prediction is linear combination of features."""
    return X @ weights + bias

# Example: house price prediction
# Features: [sqft, bedrooms, age]
X = np.array([
    [1500, 3, 10],
    [2000, 4, 5],
    [1200, 2, 20],
])
weights = np.array([0.1, 10.0, -0.5])  # Learned weights
bias = 50                              # Learned bias

predictions = linear_regression_predict(X, weights, bias)
print("Linear Regression as Linear Combination:")
print(f"Features shape: {X.shape}")
print(f"Weights: {weights}")
print(f"Predictions: {predictions}")

# Single prediction decomposed
x = X[0]
print(f"\nFor house with features {x}:")
print(f"  0.1 * {x[0]} (sqft term) = {0.1 * x[0]}")
print(f"  + 10 * {x[1]} (bedroom term) = {10 * x[1]}")
print(f"  + -0.5 * {x[2]} (age term) = {-0.5 * x[2]}")
print(f"  + {bias} (bias) = {bias}")
print(f"  = {predictions[0]}")

# Neural network layer as linear combinations
def dense_layer(X, W, b):
    """Each output neuron is a linear combination of inputs."""
    return X @ W + b  # Plus nonlinearity in practice

# Input: 3 features, Output: 4 neurons
X = np.array([[1, 2, 3]])  # 1 sample, 3 features
W = np.random.randn(3, 4)  # 3 input, 4 output
b = np.zeros(4)

output = dense_layer(X, W, b)
print(f"\nNeural Network Layer:")
print(f"Input shape: {X.shape}")
print(f"Weight shape: {W.shape}")
print(f"Output shape: {output.shape}")
print(f"Each output = linear combination of 3 inputs")

# Simplified attention: weighted combination of values
values = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
])  # 3 value vectors
attention_weights = np.array([0.5, 0.3, 0.2])  # Softmax output

context = attention_weights @ values
print(f"\nAttention as Linear Combination:")
print(f"Values:\n{values}")
print(f"Attention weights: {attention_weights}")
print(f"Context vector (weighted sum): {context}")
```

There's a profound connection between linear combinations and matrix-vector multiplication. Understanding this connection illuminates both concepts.
Column view of matrix-vector multiplication:
When we multiply a matrix $A$ by a vector $\mathbf{x}$:
$$A\mathbf{x} = \begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \cdots + x_n \mathbf{a}_n$$
The result is a linear combination of the columns of $A$, with coefficients from $\mathbf{x}$.
This is called the column picture of matrix-vector multiplication.
The insight that Ax is a linear combination of A's columns is one of the most important ideas in linear algebra. It means the columns of A determine what outputs are possible—the range of the transformation. This perspective is essential for understanding why matrices have ranks and why some systems have no solution.
Example:
$$\begin{bmatrix} 1 & 3 \\ 2 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + 4 \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 0 \end{bmatrix} + \begin{bmatrix} 12 \\ 4 \\ 8 \end{bmatrix} = \begin{bmatrix} 14 \\ 8 \\ 8 \end{bmatrix}$$
Row view (for comparison):
The more familiar "row view" computes each output as a dot product:
$$(A\mathbf{x})_i = (\text{row } i \text{ of } A) \cdot \mathbf{x}$$
Both views give the same answer, but the column view reveals the linear combination structure.
Why this matters: the system $A\mathbf{x} = \mathbf{b}$ has a solution exactly when $\mathbf{b}$ is a linear combination of the columns of $A$, i.e., when $\mathbf{b}$ lies in the column space. The code below checks reachability for a $3 \times 2$ matrix whose columns span only a plane in $\mathbb{R}^3$.
```python
import numpy as np

# Matrix-vector multiplication as linear combination
A = np.array([
    [1, 3],
    [2, 1],
    [0, 2]
])
x = np.array([2, 4])

# Standard computation
result = A @ x
print(f"A @ x = {result}")

# Column view: linear combination of columns
col1 = A[:, 0]  # First column
col2 = A[:, 1]  # Second column
column_view = x[0] * col1 + x[1] * col2
print(f"\nColumn view:")
print(f"{x[0]} * {col1} + {x[1]} * {col2}")
print(f"= {x[0] * col1} + {x[1] * col2}")
print(f"= {column_view}")

# Row view: dot products
row_view = np.array([
    np.dot(A[0], x),
    np.dot(A[1], x),
    np.dot(A[2], x)
])
print(f"\nRow view:")
for i in range(3):
    print(f"Row {i} · x = {A[i]} · {x} = {row_view[i]}")

# All three methods give same answer
print(f"\nAll equal: {np.allclose(result, column_view) and np.allclose(result, row_view)}")

# The question: can we reach b with some linear combination?
b_reachable = result                 # Same as A @ [2, 4]
b_unreachable = np.array([0, 0, 1])  # Is this in span of columns?

# For a 3x2 matrix, columns span at most a 2D plane in R^3
# Not all R^3 vectors are reachable!
print(f"\nColumn space of A spans a 2D plane in R^3")
print(f"{b_reachable} is reachable (linear combination with x={x})")

# Check if b is in column space (approximate)
# Full solution requires least squares or checking residual
x_approx, residuals, rank, s = np.linalg.lstsq(A, b_unreachable, rcond=None)
if len(residuals) > 0 and residuals[0] > 1e-10:
    print(f"{b_unreachable} is NOT exactly reachable (residual = {residuals[0]:.4f})")
```

Certain restricted types of linear combinations have special names and properties that appear frequently in ML.
Affine combination:
An affine combination requires coefficients to sum to 1:
$$\mathbf{w} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k \quad \text{where } \sum_{i=1}^{k} c_i = 1$$
Geometrically, affine combinations of two points give all points on the line through them (not just through the origin). Affine combinations of three non-collinear points give all points on the plane through them.
Example: for two points $\mathbf{p}$ and $\mathbf{q}$, the affine combination $c\,\mathbf{p} + (1 - c)\,\mathbf{q} = \mathbf{q} + c(\mathbf{p} - \mathbf{q})$ traces out the whole line through them as $c$ ranges over the reals: $c = 0.5$ gives the midpoint, while $c = 1.5$ lands beyond $\mathbf{p}$.
Linear combinations must pass through the origin. Affine combinations can pass through any point. When we add a bias term to a linear model (y = Wx + b), we're moving from linear to affine. The term 'linear regression' is technically a misnomer—it's affine regression!
Convex combination:
A convex combination requires coefficients to sum to 1 and all be non-negative:
$$\mathbf{w} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k \quad \text{where } \sum_{i=1}^{k} c_i = 1 \text{ and } c_i \geq 0 \text{ for all } i$$
Geometrically, convex combinations give points between the original vectors—inside the convex hull.
Example: restricting the same combination $c\,\mathbf{p} + (1 - c)\,\mathbf{q}$ to $c \in [0, 1]$ yields only the segment between $\mathbf{p}$ and $\mathbf{q}$; with three points, convex combinations fill the triangle (convex hull) they define.
| Type | Coefficient Constraint | Geometric Meaning | ML Example |
|---|---|---|---|
| Linear combination | None | All reachable points through origin | Neural network layer (without bias) |
| Affine combination | $\sum c_i = 1$ | All reachable points (any location) | Linear regression with bias |
| Convex combination | $\sum c_i = 1$, $c_i \geq 0$ | Interior of convex hull | Mixture models, attention weights |
| Conic combination | $c_i \geq 0$ | Cone with apex at origin | Non-negative matrix factorization |
In machine learning: attention weights and mixture-model weights are convex combinations, adding a bias term turns a purely linear model into an affine one, and non-negativity constraints (as in non-negative matrix factorization) lead to conic combinations.
```python
import numpy as np

v1 = np.array([1, 0])
v2 = np.array([0, 1])
v3 = np.array([1, 1])

# Linear combination (no constraints)
linear = 2 * v1 + (-3) * v2  # Any coefficients
print(f"Linear combination: 2*v1 - 3*v2 = {linear}")

# Affine combination (sum to 1)
affine = 0.3 * v1 + 0.7 * v2  # 0.3 + 0.7 = 1
print(f"Affine combination: 0.3*v1 + 0.7*v2 = {affine}")

# But affine allows negative (still sums to 1)
affine_neg = 1.5 * v1 + (-0.5) * v2  # 1.5 - 0.5 = 1
print(f"Affine with negative: 1.5*v1 - 0.5*v2 = {affine_neg}")

# Convex combination (sum to 1, all non-negative)
convex = 0.4 * v1 + 0.35 * v2 + 0.25 * v3  # All >= 0, sum = 1
print(f"Convex combination: 0.4*v1 + 0.35*v2 + 0.25*v3 = {convex}")

# Attention as convex combination
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

# Value vectors (e.g., from transformer)
values = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Raw attention scores
raw_scores = np.array([2.0, 1.0, 0.5])
attention_weights = softmax(raw_scores)

print(f"\nAttention mechanism:")
print(f"Attention weights: {attention_weights}")
print(f"Sum of weights: {attention_weights.sum():.4f}")
print(f"All non-negative: {np.all(attention_weights >= 0)}")
print(f"=> This is a convex combination!")

context = attention_weights @ values
print(f"Context vector: {context}")

# Mixup data augmentation (convex combination of samples)
x1 = np.array([100, 200, 300])  # One training sample
x2 = np.array([10, 20, 30])     # Another sample
lambda_mix = 0.7                # Mixup coefficient

x_mixed = lambda_mix * x1 + (1 - lambda_mix) * x2
print(f"\nMixup augmentation:")
print(f"Mixed sample: {x_mixed}")
```

Linear combinations preserve certain structures—a property called closure that's fundamental to linear algebra.
Closure property:
The set of all linear combinations of a set of vectors is closed under vector addition and scalar multiplication: adding two such combinations, or scaling one by any real number, produces another linear combination of the same vectors.
In other words, the set of all linear combinations of given vectors is itself a vector space (or subspace of the ambient space).
Why this matters:
Closure means working within the span is "safe"—we can add and scale without leaving the set. This is why subspaces are natural objects in linear algebra.
Non-examples (illustrating closure failure): the set of unit vectors is not closed under addition, since the sum of two unit vectors generally does not have norm 1; a line that does not pass through the origin is not closed under scalar multiplication, since scaling by 0 gives the zero vector, which is not on that line.
A subspace must be closed under addition and scalar multiplication, and must contain the zero vector. This is more restrictive than an arbitrary set of vectors. Understanding when a set is a subspace (and when it isn't) is crucial for linear algebra.
Linearity of the linear combination operation:
The map that takes coefficients to their linear combination is linear:
If we define $T: \mathbb{R}^k \to \mathbb{R}^n$ by: $$T(\mathbf{c}) = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k$$
Then $T$ is a linear transformation: $T(\mathbf{c} + \mathbf{d}) = T(\mathbf{c}) + T(\mathbf{d})$ and $T(\alpha \mathbf{c}) = \alpha\, T(\mathbf{c})$ for all coefficient vectors $\mathbf{c}, \mathbf{d} \in \mathbb{R}^k$ and scalars $\alpha$.
This linearity is precisely what makes linear combinations behave so predictably.
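A quick numerical check of both properties, using hypothetical vectors stacked as the columns of a matrix `V`:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((3, 4))   # columns are v1..v4 in R^3

def T(c):
    """Map coefficients c to the linear combination c1*v1 + ... + c4*v4."""
    return V @ c

c = rng.standard_normal(4)
d = rng.standard_normal(4)
alpha = 2.5

print(np.allclose(T(c + d), T(c) + T(d)))       # additivity
print(np.allclose(T(alpha * c), alpha * T(c)))  # homogeneity
```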
```python
import numpy as np

# Demonstrate closure of span
v1 = np.array([1, 0, 0])
v2 = np.array([0, 1, 0])

# Any linear combination of v1, v2 is in the xy-plane (z=0)
lc1 = 3 * v1 + 2 * v2   # [3, 2, 0]
lc2 = -1 * v1 + 4 * v2  # [-1, 4, 0]

print("Closure demonstration:")
print(f"lc1 = {lc1}")
print(f"lc2 = {lc2}")

# Sum of two linear combinations is still in span
lc_sum = lc1 + lc2
print(f"lc1 + lc2 = {lc_sum}")
print(f"z-component is still 0: {lc_sum[2] == 0}")

# Scalar multiple of linear combination is still in span
lc_scaled = 5 * lc1
print(f"5 * lc1 = {lc_scaled}")
print(f"z-component is still 0: {lc_scaled[2] == 0}")

# Non-example: unit vectors are NOT closed
u1 = np.array([1, 0])
u2 = np.array([0, 1])
u_sum = u1 + u2
print(f"\nUnit vector closure failure:")
print(f"||u1|| = {np.linalg.norm(u1)}, ||u2|| = {np.linalg.norm(u2)}")
print(f"u1 + u2 = {u_sum}")
print(f"||u1 + u2|| = {np.linalg.norm(u_sum):.4f} ≠ 1")

# The zero vector is always in the span (all coefficients = 0)
zero = 0 * v1 + 0 * v2
print(f"\nZero vector in span: {zero}")
```

Implementing linear combinations efficiently is crucial for ML performance.
Vectorized computation:
Never compute linear combinations with explicit Python loops. NumPy's vectorized operations are orders of magnitude faster:
```python
# SLOW: explicit loop
result = np.zeros(n)
for v, c in zip(vectors, coeffs):
    result += c * v

# FAST: matrix multiplication
V = np.column_stack(vectors)  # Vectors as columns
result = V @ coeffs
```
The matrix form leverages optimized BLAS routines and parallelism.
When coefficients vary widely in magnitude (e.g., 1e-10 and 1e10), floating-point errors can accumulate. In such cases, consider: (1) scaling/normalizing vectors, (2) using higher precision (float64), (3) sorting terms by magnitude before summing, or (4) using compensated summation algorithms like Kahan summation.
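A tiny, contrived illustration of this failure mode: a small term is swallowed by a huge one during naive left-to-right accumulation, while Python's exactly rounded `math.fsum` recovers it.

```python
import math
import numpy as np

# The true sum of these three terms is 1.0
terms = [1e16, 1.0, -1e16]

naive = 0.0
for t in terms:
    naive += t            # 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost

print(naive)              # 0.0 -> the small term vanished
print(math.fsum(terms))   # 1.0 -> exactly rounded (compensated) summation
print(np.sum(terms))      # also 0.0 for this ordering of terms
```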
Memory considerations: stacking $k$ vectors of dimension $n$ into a single matrix costs $O(kn)$ memory up front, and broadcasting approaches that materialize every scaled vector before summing temporarily hold another matrix of the same size, whereas `V @ coeffs` accumulates the result without storing the scaled copies.
Broadcasting in NumPy:
NumPy's broadcasting rules allow elegant linear combination expressions:
```python
import numpy as np
import time

n_vectors = 100
dim = 1000

# Generate random vectors and coefficients
vectors = [np.random.randn(dim) for _ in range(n_vectors)]
coeffs = np.random.randn(n_vectors)

# Method 1: Explicit loop (SLOW)
start = time.perf_counter()
result_loop = np.zeros(dim)
for v, c in zip(vectors, coeffs):
    result_loop += c * v
loop_time = time.perf_counter() - start

# Method 2: Matrix multiplication (FAST)
V = np.column_stack(vectors)  # Shape: (dim, n_vectors)
start = time.perf_counter()
result_matrix = V @ coeffs
matrix_time = time.perf_counter() - start

print(f"Loop time: {loop_time*1000:.4f} ms")
print(f"Matrix time: {matrix_time*1000:.4f} ms")
print(f"Speedup: {loop_time/matrix_time:.1f}x")
print(f"Results match: {np.allclose(result_loop, result_matrix)}")

# Broadcasting example: weight each vector by its coefficient
# coeffs[np.newaxis, :] broadcasts across columns
V_shaped = np.array(vectors).T               # (dim, n_vectors)
coeffs_shaped = coeffs[np.newaxis, :]        # (1, n_vectors) for broadcasting
weighted_vectors = V_shaped * coeffs_shaped  # Each column scaled
result_broadcast = weighted_vectors.sum(axis=1)
print(f"Broadcast result matches: {np.allclose(result_loop, result_broadcast)}")

# einsum for the same operation
result_einsum = np.einsum('ij,j->i', V, coeffs)
print(f"Einsum result matches: {np.allclose(result_loop, result_einsum)}")
```

Linear combinations are the fundamental building blocks of linear algebra and machine learning. We've covered their definition, geometric meaning, and computational aspects.
What's next:
Now we ask the crucial question: given a set of vectors, what can we reach with linear combinations (the span)? And when do we have enough vectors to reach everything, without redundancy (the concept of linear independence)? These ideas determine when systems of equations have solutions and when transformations are invertible.
You now understand linear combinations—the fundamental operation for building vectors from vectors. This concept is the bridge to span and linear independence, which tell us what vectors can represent and when vectors are 'redundant.'