Vectors become powerful not just through their representation of data, but through the operations we can perform on them. These operations—addition, scaling, and the dot product—are the computational primitives underlying nearly every machine learning algorithm.
When a neural network combines inputs, it uses vector addition. When gradient descent updates weights, it uses scalar multiplication. When we measure similarity between embeddings, we use dot products. Understanding these operations deeply—both algebraically and geometrically—is essential for understanding how machine learning actually works.
This page establishes the fundamental vector operations, their properties, and their geometric interpretations.
By the end of this page, you will:

- master vector addition and subtraction with geometric intuition
- understand scalar multiplication and its effect on vectors
- deeply understand the dot product—the most important operation in ML
- connect these operations to machine learning computations
- implement all operations efficiently in NumPy
Vector addition combines two vectors of the same dimension into a single vector by adding corresponding components.
Algebraic definition:
Given two vectors $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, their sum is:
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$$
For example: $$(3, -1, 4) + (2, 5, -1) = (3+2, -1+5, 4+(-1)) = (5, 4, 3)$$
Critical constraint: Vectors must have the same dimension to be added. Adding a 3D vector to a 2D vector is undefined.
Geometric interpretation:
Geometrically, vector addition follows the parallelogram rule (or equivalently, the tip-to-tail method): place $\mathbf{u}$ and $\mathbf{v}$ tail-to-tail; their sum is the diagonal of the parallelogram they span.

Alternatively (tip-to-tail): place the tail of $\mathbf{v}$ at the tip of $\mathbf{u}$; the sum runs from the tail of $\mathbf{u}$ to the tip of $\mathbf{v}$.

This corresponds to sequential displacement: first move by $\mathbf{u}$, then move by $\mathbf{v}$.
In neural networks, addition combines information from different sources. When we add a bias vector to a weighted sum, or when residual connections add skip connections, we're using vector addition to merge representations. Understanding addition geometrically helps visualize how neural networks combine information.
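As a rough sketch of this idea (the layer output, bias, and skip values below are invented for illustration), both a bias term and a residual connection come down to plain vector addition in NumPy:

```python
import numpy as np

# Hypothetical pre-activation output of a layer and its bias vector
weighted_sum = np.array([0.4, -1.2, 2.0])   # stands in for W @ x
bias = np.array([0.1, 0.5, -0.3])

pre_activation = weighted_sum + bias         # bias added component-wise
print(pre_activation)                        # [ 0.5 -0.7  1.7]

# Residual (skip) connection: the block's output is added back to its input
layer_input = np.array([1.0, 2.0, 3.0])
block_output = np.array([0.2, -0.1, 0.4])    # stands in for F(layer_input)
residual_output = layer_input + block_output
print(residual_output)                       # [1.2 1.9 3.4]
```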
Properties of vector addition:
| Property | Formula | Meaning |
|---|---|---|
| Commutativity | $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ | Order doesn't matter |
| Associativity | $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ | Grouping doesn't matter |
| Identity | $\mathbf{v} + \mathbf{0} = \mathbf{v}$ | Adding zero changes nothing |
| Inverse | $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ | Every vector has an opposite |
Vector subtraction:
Subtraction is defined in terms of addition: $$\mathbf{u} - \mathbf{v} = \mathbf{u} + (-\mathbf{v})$$
where $-\mathbf{v} = (-v_1, -v_2, \ldots, -v_n)$ is the negation of $\mathbf{v}$.
Geometrically, $\mathbf{u} - \mathbf{v}$ is the vector from $\mathbf{v}$ to $\mathbf{u}$—the displacement needed to go from the point represented by $\mathbf{v}$ to the point represented by $\mathbf{u}$.
```python
import numpy as np

# Vector addition
u = np.array([3, -1, 4])
v = np.array([2, 5, -1])

# Addition is component-wise
w = u + v
print(f"u = {u}")
print(f"v = {v}")
print(f"u + v = {w}")  # [5, 4, 3]

# Properties demonstration
zero = np.zeros(3)
print(f"\nIdentity: u + 0 = {u + zero}")
print(f"Inverse: u + (-u) = {u + (-u)}")
print(f"Commutativity: u + v = v + u? {np.array_equal(u + v, v + u)}")

# Subtraction
diff = u - v
print(f"\nu - v = {diff}")  # [1, -6, 5]

# Geometric: distance between points
point_a = np.array([1, 2])
point_b = np.array([4, 6])
displacement = point_b - point_a
print(f"\nDisplacement from A to B: {displacement}")  # [3, 4]
print(f"Distance: {np.linalg.norm(displacement)}")  # 5.0

# Dimension mismatch (will raise error - commented for safety)
# u_2d = np.array([1, 2])
# u_3d = np.array([1, 2, 3])
# invalid = u_2d + u_3d  # ValueError: operands could not be broadcast
```

Scalar multiplication multiplies a vector by a scalar (a single number), scaling each component uniformly.
Algebraic definition:
Given a scalar $c \in \mathbb{R}$ and a vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, their product is:
$$c\mathbf{v} = (cv_1, cv_2, \ldots, cv_n)$$
For example: $$3 \cdot (2, -1, 5) = (6, -3, 15)$$
Geometric interpretation:
Scalar multiplication scales the vector: $|c| > 1$ stretches it, while $0 < |c| < 1$ shrinks it.
The direction is preserved for positive $c$ and reversed for negative $c$. The magnitude is multiplied by $|c|$: $$|c\mathbf{v}| = |c| \cdot |\mathbf{v}|$$
Scalar multiplication is ubiquitous in ML. Learning rates scale gradients. Regularization coefficients scale penalty terms. Attention scores scale value vectors. Batch normalization scales activations. Every weight in a neural network performs scalar multiplication on individual components.
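For instance, a single gradient descent step is just scalar multiplication followed by vector subtraction. A minimal sketch, with made-up weights, gradient, and learning rate:

```python
import numpy as np

weights = np.array([0.5, -0.3, 0.8])      # current parameters (invented)
gradient = np.array([0.2, -0.1, 0.4])     # gradient of the loss (invented)
learning_rate = 0.1                       # scalar step size

# Scale the gradient by the learning rate, then subtract from the weights
weights_updated = weights - learning_rate * gradient
print(weights_updated)                    # [ 0.48 -0.29  0.76]
```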
Properties of scalar multiplication:
| Property | Formula | Meaning |
|---|---|---|
| Distributive (vectors) | $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ | Scale distributes over addition |
| Distributive (scalars) | $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$ | Scalars add, then multiply |
| Associative | $c(d\mathbf{v}) = (cd)\mathbf{v}$ | Multiply scalars, then scale once |
| Identity | $1 \cdot \mathbf{v} = \mathbf{v}$ | Scaling by 1 changes nothing |
| Zero | $0 \cdot \mathbf{v} = \mathbf{0}$ | Scaling by 0 gives zero vector |
Combining addition and scalar multiplication:
With both operations, we can form linear combinations: $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
This is the most fundamental compound operation in linear algebra and the basis for nearly all computations in machine learning. We'll explore linear combinations deeply in the next section.
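As a quick preview (vectors and coefficients chosen arbitrarily), a linear combination is nothing more than repeated scaling and adding:

```python
import numpy as np

v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, -1])
v3 = np.array([3, 3, 3])
c1, c2, c3 = 2.0, -1.0, 0.5

# Linear combination: c1*v1 + c2*v2 + c3*v3
combo = c1 * v1 + c2 * v2 + c3 * v3
print(combo)  # [3.5 0.5 6.5]

# Equivalent view: stack the vectors as columns and multiply by the coefficients
V = np.stack([v1, v2, v3], axis=1)
print(V @ np.array([c1, c2, c3]))  # same result
```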
```python
import numpy as np

# Scalar multiplication
v = np.array([2, -1, 5])

print(f"v = {v}")
print(f"3 * v = {3 * v}")      # [6, -3, 15]
print(f"0.5 * v = {0.5 * v}")  # [1, -0.5, 2.5]
print(f"-1 * v = {-1 * v}")    # [-2, 1, -5]
print(f"0 * v = {0 * v}")      # [0, 0, 0]

# Effect on magnitude
print(f"\n||v|| = {np.linalg.norm(v):.4f}")
print(f"||3v|| = {np.linalg.norm(3 * v):.4f}")      # 3 times larger
print(f"||0.5v|| = {np.linalg.norm(0.5 * v):.4f}")  # half
print(f"||-2v|| = {np.linalg.norm(-2 * v):.4f}")    # 2 times larger (not negative!)

# Normalization: scaling to unit length
v_normalized = v / np.linalg.norm(v)
print(f"\nNormalized v: {v_normalized}")
print(f"||normalized v|| = {np.linalg.norm(v_normalized):.10f}")

# Properties
u = np.array([1, 2, 3])
c, d = 2, 3

print(f"\nDistributive (vectors):")
print(f"c(u + v) = {c * (u + v)}")
print(f"cu + cv = {c * u + c * v}")
print(f"Equal? {np.allclose(c * (u + v), c * u + c * v)}")

print(f"\nDistributive (scalars):")
print(f"(c + d)v = {(c + d) * v}")
print(f"cv + dv = {c * v + d * v}")
print(f"Equal? {np.allclose((c + d) * v, c * v + d * v)}")
```

The dot product (also called the inner product or scalar product) is arguably the most important single operation in machine learning. It appears in matrix multiplication, neural network layers, attention mechanisms, similarity measures, and projections.
Algebraic definition:
Given two vectors $\mathbf{u} = (u_1, \ldots, u_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$ of the same dimension, their dot product is:
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$
Crucially, the result is a scalar, not a vector.
For example: $$(3, 2, -1) \cdot (1, -2, 4) = 3(1) + 2(-2) + (-1)(4) = 3 - 4 - 4 = -5$$
Alternative notation:

The dot product is also written as $\langle \mathbf{u}, \mathbf{v} \rangle$ (inner product notation) or as $\mathbf{u}^\top \mathbf{v}$ (matrix notation, treating the vectors as column vectors).
Like addition, the dot product requires vectors of the same dimension. Computing the dot product of a 2D and 3D vector is undefined and will raise an error in NumPy.
Geometric interpretation:
The dot product has a beautiful geometric meaning:
$$\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}| |\mathbf{v}| \cos(\theta)$$
where $\theta$ is the angle between the vectors.
This formula reveals that the dot product measures how strongly the two vectors point in the same direction, scaled by their magnitudes: it is largest when the vectors are aligned and decreases as the angle between them grows.

Interpretation by sign:

- $\mathbf{u} \cdot \mathbf{v} > 0$: the angle is less than 90°; the vectors point in broadly the same direction
- $\mathbf{u} \cdot \mathbf{v} = 0$: the angle is exactly 90°; the vectors are orthogonal
- $\mathbf{u} \cdot \mathbf{v} < 0$: the angle is greater than 90°; the vectors point in broadly opposite directions
Properties of the dot product:
| Property | Formula | Meaning |
|---|---|---|
| Commutativity | $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ | Order doesn't matter |
| Distributivity | $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ | Distributes over addition |
| Scalar factoring | $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v})$ | Scalars factor out |
| Self-dot product | $\mathbf{v} \cdot \mathbf{v} = |\mathbf{v}|^2$ | Dot with self gives squared magnitude |
| Zero with zero vector | $\mathbf{v} \cdot \mathbf{0} = 0$ | Dotting with zero gives zero |
```python
import numpy as np

# Dot product computation
u = np.array([3, 2, -1])
v = np.array([1, -2, 4])

# Method 1: np.dot
dot1 = np.dot(u, v)
print(f"u = {u}")
print(f"v = {v}")
print(f"u · v (np.dot) = {dot1}")  # -5

# Method 2: @ operator (matrix multiplication)
dot2 = u @ v
print(f"u · v (@ operator) = {dot2}")  # -5

# Method 3: Explicit sum
dot3 = np.sum(u * v)  # Element-wise multiply, then sum
print(f"u · v (sum of products) = {dot3}")  # -5

# Relationship to magnitude
print(f"\nv · v = {np.dot(v, v)}")
print(f"||v||² = {np.linalg.norm(v)**2}")
print(f"Equal? {np.isclose(np.dot(v, v), np.linalg.norm(v)**2)}")

# Computing angle between vectors
def angle_between(u, v):
    """Compute angle between vectors in radians."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clamp to handle numerical errors
    cos_theta = np.clip(cos_theta, -1, 1)
    return np.arccos(cos_theta)

a = np.array([1, 0])
b = np.array([0, 1])
c = np.array([1, 1])

print(f"\nAngle between [1,0] and [0,1]: {np.degrees(angle_between(a, b))}°")    # 90°
print(f"Angle between [1,0] and [1,1]: {np.degrees(angle_between(a, c)):.2f}°")  # 45°

# Orthogonality check
print(f"\na · b = {np.dot(a, b)}")  # 0 - orthogonal!

# Sign indicates direction relationship
same_dir = np.array([1, 2])
opposite_dir = np.array([-2, -4])
print(f"\nVectors pointing similarly: {np.dot(np.array([1, 1]), np.array([2, 3]))} (positive)")
print(f"Vectors pointing oppositely: {np.dot(same_dir, opposite_dir)} (negative)")
```

Cosine similarity is one of the most important similarity measures in machine learning. It measures the cosine of the angle between two vectors, ignoring their magnitudes.
Definition:
$$\text{cos_sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}| |\mathbf{v}|} = \cos(\theta)$$
This is equivalent to the dot product of the normalized (unit-length) versions of the vectors: $$\text{cos_sim}(\mathbf{u}, \mathbf{v}) = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}}$$
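A quick numerical check of this equivalence, using arbitrary example vectors:

```python
import numpy as np

u = np.array([3.0, 1.0, 2.0])
v = np.array([1.0, -2.0, 4.0])

# Cosine similarity computed directly from the definition
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# The same value as the dot product of the unit vectors
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
print(cos_sim, np.dot(u_hat, v_hat))              # identical up to floating point
print(np.isclose(cos_sim, np.dot(u_hat, v_hat)))  # True
```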
Range and interpretation: cosine similarity always lies in $[-1, 1]$. A value of $1$ means the vectors point in exactly the same direction, $0$ means they are orthogonal, and $-1$ means they point in exactly opposite directions.
Why ignore magnitude?
In many ML contexts, we care about the direction of a vector more than its length. For example, a long document and a short document about the same topic produce term-count vectors with very different magnitudes but similar directions, and an active user and an occasional user can have rating vectors of different magnitudes but similar taste profiles.
Word embeddings (like Word2Vec) encode semantic meaning in vector direction. The famous example 'king - man + woman ≈ queen' works because these concepts are encoded as directions. Cosine similarity between word vectors often correlates with human judgments of semantic similarity.
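A toy version of that analogy, using tiny invented vectors rather than real learned embeddings, might look like the sketch below (real systems typically also exclude the query words from the candidates):

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 3D "embeddings", purely for illustration
vectors = {
    'king':  np.array([0.9, 0.8, 0.1]),
    'queen': np.array([0.9, 0.1, 0.8]),
    'man':   np.array([0.1, 0.9, 0.1]),
    'woman': np.array([0.1, 0.1, 0.9]),
}

# king - man + woman should land closest (by cosine similarity) to queen
target = vectors['king'] - vectors['man'] + vectors['woman']
best = max(vectors, key=lambda w: cos_sim(target, vectors[w]))
print(best)  # 'queen' for these toy vectors
```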
Cosine similarity vs. Euclidean distance:
Both measure vector relationships, but differently:
| Aspect | Cosine Similarity | Euclidean Distance |
|---|---|---|
| Measures | Orientation/direction | Absolute distance in space |
| Range | [-1, 1] | [0, ∞) |
| High value means | Similar direction | Far apart |
| Sensitive to magnitude | No | Yes |
| Best for | Comparing directions, text, embeddings | Comparing points in space |
```python
import numpy as np

def cosine_similarity(u, v):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    return dot_product / (norm_u * norm_v)

# Example: Similar direction, different magnitudes
v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])  # Same direction, 2x magnitude

print(f"v1 = {v1}")
print(f"v2 = {v2}")
print(f"Cosine similarity: {cosine_similarity(v1, v2):.4f}")  # 1.0 (identical direction)
print(f"Euclidean distance: {np.linalg.norm(v1 - v2):.4f}")   # Non-zero!

# Orthogonal vectors
v3 = np.array([1, 0, 0])
v4 = np.array([0, 1, 0])
print(f"\nOrthogonal vectors:")
print(f"Cosine similarity: {cosine_similarity(v3, v4):.4f}")  # 0

# Opposite vectors
v5 = np.array([1, 2, 3])
v6 = np.array([-1, -2, -3])
print(f"\nOpposite vectors:")
print(f"Cosine similarity: {cosine_similarity(v5, v6):.4f}")  # -1

# Practical NLP example: simple word vectors
# (In reality, these would be learned embeddings)
word_vectors = {
    'king': np.array([0.2, 0.5, 0.1, 0.8, 0.3]),
    'queen': np.array([0.3, 0.5, 0.2, 0.7, 0.4]),
    'man': np.array([0.1, 0.2, 0.0, 0.9, 0.1]),
    'woman': np.array([0.2, 0.2, 0.1, 0.8, 0.2]),
    'apple': np.array([0.8, 0.1, 0.9, 0.1, 0.7]),
}

print("\nWord similarity example:")
print(f"king-queen: {cosine_similarity(word_vectors['king'], word_vectors['queen']):.4f}")
print(f"king-man: {cosine_similarity(word_vectors['king'], word_vectors['man']):.4f}")
print(f"king-apple: {cosine_similarity(word_vectors['king'], word_vectors['apple']):.4f}")

# Cosine distance (for clustering algorithms)
def cosine_distance(u, v):
    return 1 - cosine_similarity(u, v)

print(f"\nCosine distance (king-queen): {cosine_distance(word_vectors['king'], word_vectors['queen']):.4f}")
```

Projection finds the component of one vector in the direction of another. This operation is fundamental in machine learning for dimensionality reduction, regression, and understanding feature contributions.
Scalar projection:
The scalar projection of $\mathbf{u}$ onto $\mathbf{v}$ (also called the component of $\mathbf{u}$ along $\mathbf{v}$) is:
$$\text{comp}_{\mathbf{v}}\mathbf{u} = |\mathbf{u}| \cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{v}|}$$
This is a scalar representing how much of $\mathbf{u}$ lies in the direction of $\mathbf{v}$.
Vector projection:
The vector projection of $\mathbf{u}$ onto $\mathbf{v}$ is the actual vector:
$$\text{proj}_{\mathbf{v}}\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{v}|^2} \mathbf{v} = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}} \mathbf{v}$$
This is the vector that:

- points along $\mathbf{v}$ (it is a scalar multiple of $\mathbf{v}$)
- has length equal to $|\text{comp}_{\mathbf{v}}\mathbf{u}|$
- is the closest approximation of $\mathbf{u}$ among all scalar multiples of $\mathbf{v}$
Imagine shining a light perpendicular to vector v. The shadow of vector u on the line containing v is the vector projection. This 'shadow' is the best approximation of u using only the direction of v.
The orthogonal component:
Any vector $\mathbf{u}$ can be decomposed into two parts:
$$\mathbf{u} = \text{proj}_{\mathbf{v}}\mathbf{u} + \mathbf{u}_{\perp}$$

where $\mathbf{u}_{\perp} = \mathbf{u} - \text{proj}_{\mathbf{v}}\mathbf{u}$ is perpendicular to $\mathbf{v}$.
Applications in ML:

- Principal component analysis (PCA) projects data onto the directions of greatest variance
- Least-squares regression projects the target vector onto the space spanned by the feature vectors
- Orthogonalization procedures such as Gram-Schmidt repeatedly subtract projections to build perpendicular bases
- Decomposing a vector into components along and orthogonal to a direction of interest, as in the code below
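To make the regression connection concrete, here is a minimal sketch (with invented data) of one-feature least squares without an intercept, which is exactly a projection of the target vector onto the feature vector:

```python
import numpy as np

# Invented data: one feature column x and a target vector y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Least-squares coefficient for y ≈ beta * x is the projection coefficient
beta = np.dot(x, y) / np.dot(x, x)
y_hat = beta * x                  # the projection of y onto x

residual = y - y_hat              # orthogonal component
print(beta)
print(np.dot(residual, x))        # ~0: the residual is orthogonal to x

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print(np.isclose(beta, beta_lstsq[0]))  # True
```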
```python
import numpy as np

def scalar_projection(u, v):
    """Scalar projection of u onto v."""
    return np.dot(u, v) / np.linalg.norm(v)

def vector_projection(u, v):
    """Vector projection of u onto v."""
    return (np.dot(u, v) / np.dot(v, v)) * v

def orthogonal_component(u, v):
    """Component of u perpendicular to v."""
    return u - vector_projection(u, v)

# Example
u = np.array([3, 4])
v = np.array([1, 0])  # Unit vector along x-axis

print(f"u = {u}")
print(f"v = {v}")
print(f"\nScalar projection of u onto v: {scalar_projection(u, v)}")  # 3
print(f"Vector projection of u onto v: {vector_projection(u, v)}")    # [3, 0]
print(f"Orthogonal component: {orthogonal_component(u, v)}")          # [0, 4]

# Verify decomposition
proj = vector_projection(u, v)
perp = orthogonal_component(u, v)
print(f"\nReconstruction: proj + perp = {proj + perp}")
print(f"Original u: {u}")
print(f"Equal? {np.allclose(proj + perp, u)}")

# Verify orthogonality
print(f"\nperp · v = {np.dot(perp, v)}")  # Should be 0 (or very close)

# More complex example
u = np.array([1, 2, 3])
v = np.array([1, 1, 0])

proj_u_v = vector_projection(u, v)
perp_u_v = orthogonal_component(u, v)

print(f"\n3D Example:")
print(f"u = {u}")
print(f"v = {v}")
print(f"proj_v(u) = {proj_u_v}")
print(f"u_perp = {perp_u_v}")
print(f"Orthogonal check (should be ~0): {np.dot(perp_u_v, v):.2e}")
```

Beyond the core linear algebra operations, element-wise operations (also called Hadamard operations) are computationally essential in machine learning.
Element-wise product (Hadamard product):
The element-wise product multiplies corresponding components:
$$\mathbf{u} \odot \mathbf{v} = (u_1 v_1, u_2 v_2, \ldots, u_n v_n)$$
Unlike the dot product (which returns a scalar), the Hadamard product returns a vector of the same dimension.
Example: $$(2, 3, 4) \odot (5, 2, 3) = (10, 6, 12)$$
The Hadamard product is crucial in neural networks. LSTM gates use element-wise multiplication to control information flow. Attention mechanisms multiply attention scores with values element-wise. Dropout uses element-wise multiplication with binary masks. Understanding this operation is essential for understanding modern architectures.
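A rough sketch of that masking and gating idea, with made-up activations; the division by the keep probability follows the common inverted-dropout convention:

```python
import numpy as np

rng = np.random.default_rng(0)

activations = np.array([0.5, -1.2, 3.0, 0.7, -0.3])  # made-up layer output

# Dropout-style masking: zero out entries with a random binary mask
keep_prob = 0.8
mask = (rng.random(activations.shape) < keep_prob).astype(float)
dropped = activations * mask / keep_prob              # inverted-dropout rescaling
print(dropped)

# Gating (as in LSTMs or attention): a value in [0, 1] per component
gate = 1 / (1 + np.exp(-np.array([2.0, -2.0, 0.0, 4.0, -4.0])))  # sigmoid gate
gated = gate * activations                             # element-wise control of flow
print(gated)
```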
Other element-wise operations:
Any scalar function can be applied element-wise to vectors:
| Operation | Notation | Formula | Use in ML |
|---|---|---|---|
| Multiplication | $\mathbf{u} \odot \mathbf{v}$ | $(u_i \cdot v_i)$ | Gating, masking, attention |
| Division | $\mathbf{u} \oslash \mathbf{v}$ | $(u_i / v_i)$ | Normalization |
| Exponentiation | $\exp(\mathbf{v})$ | $(e^{v_i})$ | Softmax computation |
| Logarithm | $\log(\mathbf{v})$ | $(\log v_i)$ | Log-likelihood |
| Power | $\mathbf{v}^p$ | $(v_i^p)$ | Polynomial features |
| Activation | $\sigma(\mathbf{v})$ | $(\sigma(v_i))$ | ReLU, sigmoid, tanh |
Broadcasting:
In NumPy (and most ML frameworks), operations between vectors and scalars are broadcast—the scalar is implicitly expanded to match the vector's shape:
$$\mathbf{v} + c = (v_1 + c, v_2 + c, \ldots, v_n + c)$$
This is technically an element-wise operation with an "expanded" scalar vector $(c, c, \ldots, c)$.
```python
import numpy as np

u = np.array([2, 3, 4])
v = np.array([5, 2, 3])

# Element-wise multiplication (Hadamard product)
hadamard = u * v
print(f"u = {u}")
print(f"v = {v}")
print(f"u ⊙ v (element-wise mult) = {hadamard}")  # [10, 6, 12]

# Compare with dot product
print(f"u · v (dot product) = {np.dot(u, v)}")  # 28 (scalar)

# Element-wise division
print(f"u ⊘ v (element-wise div) = {u / v}")

# Common element-wise functions
x = np.array([1, 2, 3, 4])
print(f"\nx = {x}")
print(f"x^2 = {x ** 2}")
print(f"sqrt(x) = {np.sqrt(x)}")
print(f"exp(x) = {np.exp(x)}")
print(f"log(x) = {np.log(x)}")

# Activation functions (element-wise)
def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-2, -1, 0, 1, 2])
print(f"\nz = {z}")
print(f"ReLU(z) = {relu(z)}")
print(f"sigmoid(z) = {sigmoid(z)}")

# Broadcasting: scalar with vector
print(f"\nBroadcasting:")
print(f"x + 10 = {x + 10}")  # 10 is broadcast to [10, 10, 10, 10]
print(f"x * 2 = {x * 2}")

# Practical: feature normalization
features = np.array([100, 50, 25])  # Different scales
mean = np.mean(features)
std = np.std(features)
normalized = (features - mean) / std
print(f"\nOriginal features: {features}")
print(f"Normalized: {normalized}")
```

The cross product is an operation specific to 3D vectors that produces a vector perpendicular to both inputs. While less common in ML than the dot product, it appears in computer graphics, robotics, and physics simulations.
Definition (3D only):
Given $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:
$$\mathbf{u} \times \mathbf{v} = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{bmatrix}$$
Key properties:

- $\mathbf{u} \times \mathbf{v}$ is perpendicular to both $\mathbf{u}$ and $\mathbf{v}$, with direction given by the right-hand rule
- It is anti-commutative: $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$
- It is the zero vector when $\mathbf{u}$ and $\mathbf{v}$ are parallel
- The magnitude equals the area of the parallelogram formed by the two vectors: $|\mathbf{u} \times \mathbf{v}| = |\mathbf{u}||\mathbf{v}|\sin(\theta)$
The cross product is less common in pure ML but appears in 3D vision and graphics (computing surface normals), physics simulations (torque, angular momentum), robotics (orientation and rotation). In higher dimensions, the wedge product from geometric algebra generalizes this concept.
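For example, a common graphics use (triangle vertices invented here) is computing a surface normal from two edge vectors:

```python
import numpy as np

# Three vertices of a triangle in 3D (arbitrary example points)
p0 = np.array([0.0, 0.0, 0.0])
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])

# Two edges sharing p0; their cross product is perpendicular to the triangle
edge1 = p1 - p0
edge2 = p2 - p0
normal = np.cross(edge1, edge2)
unit_normal = normal / np.linalg.norm(normal)
print(unit_normal)                   # [0. 0. 1.] for this flat triangle

# The triangle's area is half the parallelogram area
print(0.5 * np.linalg.norm(normal))  # 0.5
```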
```python
import numpy as np

# Cross product in 3D
u = np.array([1, 0, 0])  # x-axis
v = np.array([0, 1, 0])  # y-axis

cross = np.cross(u, v)
print(f"u = {u}")
print(f"v = {v}")
print(f"u × v = {cross}")  # [0, 0, 1] - z-axis!

# Verify perpendicularity
print(f"\n(u × v) · u = {np.dot(cross, u)}")  # 0
print(f"(u × v) · v = {np.dot(cross, v)}")  # 0

# Anti-commutativity
cross_vu = np.cross(v, u)
print(f"\nv × u = {cross_vu}")  # [0, 0, -1]
print(f"u × v = -(v × u)? {np.allclose(cross, -cross_vu)}")

# General example
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(f"\na = {a}")
print(f"b = {b}")
print(f"a × b = {np.cross(a, b)}")

# Magnitude = area of parallelogram
print(f"||a × b|| = {np.linalg.norm(np.cross(a, b)):.4f}")

# Parallel vectors have cross product = 0
parallel = np.array([2, 4, 6])  # 2 * a
print(f"\na × (2a) = {np.cross(a, parallel)}")  # [0, 0, 0]
```

We've covered the fundamental operations that make vectors computational objects, not just data containers. These operations form the vocabulary of machine learning computation.
What's next:
With vectors and their operations understood, we're ready to explore linear combinations—how vectors can be combined to form new vectors, and the profound concepts of span and linear independence that determine what vectors can represent.
You now understand the fundamental operations on vectors: addition, scalar multiplication, dot products, cosine similarity, projections, and element-wise operations. These operations are the computational primitives of machine learning—every algorithm builds on them.