Vectors become powerful not just through their representation of data, but through the operations we can perform on them. These operations—addition, scaling, and the dot product—are the computational primitives underlying nearly every machine learning algorithm.
When a neural network combines inputs, it uses vector addition. When gradient descent updates weights, it uses scalar multiplication. When we measure similarity between embeddings, we use dot products. Understanding these operations deeply—both algebraically and geometrically—is essential for understanding how machine learning actually works.
This page establishes the fundamental vector operations, their properties, and their geometric interpretations.
By the end of this page, you will:

- master vector addition and subtraction with geometric intuition
- understand scalar multiplication and its effect on vectors
- deeply understand the dot product—the most important operation in ML
- connect these operations to machine learning computations
- implement all operations efficiently in NumPy
Vector addition combines two vectors of the same dimension into a single vector by adding corresponding components.
Algebraic definition:
Given two vectors $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, their sum is:
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n)$$
For example: $$(3, -1, 4) + (2, 5, -1) = (3+2, -1+5, 4+(-1)) = (5, 4, 3)$$
Critical constraint: Vectors must have the same dimension to be added. Adding a 3D vector to a 2D vector is undefined.
Geometric interpretation:
Geometrically, vector addition follows the parallelogram rule (or equivalently, the tip-to-tail method): place $\mathbf{u}$ and $\mathbf{v}$ tail-to-tail; their sum is the diagonal of the parallelogram they span.

Alternatively (tip-to-tail): place the tail of $\mathbf{v}$ at the tip of $\mathbf{u}$; the sum runs from the tail of $\mathbf{u}$ to the tip of $\mathbf{v}$.

This corresponds to sequential displacement: first move by $\mathbf{u}$, then move by $\mathbf{v}$.
In neural networks, addition combines information from different sources. When we add a bias vector to a weighted sum, or when residual connections add skip connections, we're using vector addition to merge representations. Understanding addition geometrically helps visualize how neural networks combine information.
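As a rough sketch of this idea (the layer output, bias, and skip values below are invented for illustration), both a bias term and a residual connection come down to plain vector addition in NumPy:

```python
import numpy as np

# Hypothetical pre-activation output of a layer and its bias vector
weighted_sum = np.array([0.4, -1.2, 2.0])   # stands in for W @ x
bias = np.array([0.1, 0.5, -0.3])

pre_activation = weighted_sum + bias         # bias added component-wise
print(pre_activation)                        # [ 0.5 -0.7  1.7]

# Residual (skip) connection: the block's output is added back to its input
layer_input = np.array([1.0, 2.0, 3.0])
block_output = np.array([0.2, -0.1, 0.4])    # stands in for F(layer_input)
residual_output = layer_input + block_output
print(residual_output)                       # [1.2 1.9 3.4]
```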
Properties of vector addition:
| Property | Formula | Meaning |
|---|---|---|
| Commutativity | $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ | Order doesn't matter |
| Associativity | $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ | Grouping doesn't matter |
| Identity | $\mathbf{v} + \mathbf{0} = \mathbf{v}$ | Adding zero changes nothing |
| Inverse | $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$ | Every vector has an opposite |
Vector subtraction:
Subtraction is defined in terms of addition: $$\mathbf{u} - \mathbf{v} = \mathbf{u} + (-\mathbf{v})$$
where $-\mathbf{v} = (-v_1, -v_2, \ldots, -v_n)$ is the negation of $\mathbf{v}$.
Geometrically, $\mathbf{u} - \mathbf{v}$ is the vector from $\mathbf{v}$ to $\mathbf{u}$—the displacement needed to go from the point represented by $\mathbf{v}$ to the point represented by $\mathbf{u}$.
```python
import numpy as np

# Vector addition
u = np.array([3, -1, 4])
v = np.array([2, 5, -1])

# Addition is component-wise
w = u + v
print(f"u = {u}")
print(f"v = {v}")
print(f"u + v = {w}")  # [5, 4, 3]

# Properties demonstration
zero = np.zeros(3)
print(f"\nIdentity: u + 0 = {u + zero}")
print(f"Inverse: u + (-u) = {u + (-u)}")
print(f"Commutativity: u + v = v + u? {np.array_equal(u + v, v + u)}")

# Subtraction
diff = u - v
print(f"\nu - v = {diff}")  # [1, -6, 5]

# Geometric: distance between points
point_a = np.array([1, 2])
point_b = np.array([4, 6])
displacement = point_b - point_a
print(f"\nDisplacement from A to B: {displacement}")  # [3, 4]
print(f"Distance: {np.linalg.norm(displacement)}")  # 5.0

# Dimension mismatch (will raise error - commented for safety)
# u_2d = np.array([1, 2])
# u_3d = np.array([1, 2, 3])
# invalid = u_2d + u_3d  # ValueError: operands could not be broadcast
```

Scalar multiplication multiplies a vector by a scalar (a single number), scaling each component uniformly.
Algebraic definition:
Given a scalar $c \in \mathbb{R}$ and a vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, their product is:
$$c\mathbf{v} = (cv_1, cv_2, \ldots, cv_n)$$
For example: $$3 \cdot (2, -1, 5) = (6, -3, 15)$$
Geometric interpretation:
Scalar multiplication scales the vector: $|c| > 1$ stretches it, while $0 < |c| < 1$ shrinks it.
The direction is preserved for positive $c$ and reversed for negative $c$. The magnitude is multiplied by $|c|$: $$|c\mathbf{v}| = |c| \cdot |\mathbf{v}|$$
Scalar multiplication is ubiquitous in ML. Learning rates scale gradients. Regularization coefficients scale penalty terms. Attention scores scale value vectors. Batch normalization scales activations. Every weight in a neural network performs scalar multiplication on individual components.
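For instance, a single gradient descent step is just scalar multiplication followed by vector subtraction. A minimal sketch, with made-up weights, gradient, and learning rate:

```python
import numpy as np

weights = np.array([0.5, -0.3, 0.8])      # current parameters (invented)
gradient = np.array([0.2, -0.1, 0.4])     # gradient of the loss (invented)
learning_rate = 0.1                       # scalar step size

# Scale the gradient by the learning rate, then subtract from the weights
weights_updated = weights - learning_rate * gradient
print(weights_updated)                    # [ 0.48 -0.29  0.76]
```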
Properties of scalar multiplication:
| Property | Formula | Meaning |
|---|---|---|
| Distributive (vectors) | $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ | Scale distributes over addition |
| Distributive (scalars) | $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$ | Scalars add, then multiply |
| Associative | $c(d\mathbf{v}) = (cd)\mathbf{v}$ | Multiply scalars, then scale once |
| Identity | $1 \cdot \mathbf{v} = \mathbf{v}$ | Scaling by 1 changes nothing |
| Zero | $0 \cdot \mathbf{v} = \mathbf{0}$ | Scaling by 0 gives zero vector |
Combining addition and scalar multiplication:
With both operations, we can form linear combinations: $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$
This is the most fundamental compound operation in linear algebra and the basis for nearly all computations in machine learning. We'll explore linear combinations deeply in the next section.
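As a quick preview (vectors and coefficients chosen arbitrarily), a linear combination is nothing more than repeated scaling and adding:

```python
import numpy as np

v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, -1])
v3 = np.array([3, 3, 3])
c1, c2, c3 = 2.0, -1.0, 0.5

# Linear combination: c1*v1 + c2*v2 + c3*v3
combo = c1 * v1 + c2 * v2 + c3 * v3
print(combo)  # [3.5 0.5 6.5]

# Equivalent view: stack the vectors as columns and multiply by the coefficients
V = np.stack([v1, v2, v3], axis=1)
print(V @ np.array([c1, c2, c3]))  # same result
```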
```python
import numpy as np

# Scalar multiplication
v = np.array([2, -1, 5])

print(f"v = {v}")
print(f"3 * v = {3 * v}")      # [6, -3, 15]
print(f"0.5 * v = {0.5 * v}")  # [1, -0.5, 2.5]
print(f"-1 * v = {-1 * v}")    # [-2, 1, -5]
print(f"0 * v = {0 * v}")      # [0, 0, 0]

# Effect on magnitude
print(f"\n||v|| = {np.linalg.norm(v):.4f}")
print(f"||3v|| = {np.linalg.norm(3 * v):.4f}")      # 3 times larger
print(f"||0.5v|| = {np.linalg.norm(0.5 * v):.4f}")  # half
print(f"||-2v|| = {np.linalg.norm(-2 * v):.4f}")    # 2 times larger (not negative!)

# Normalization: scaling to unit length
v_normalized = v / np.linalg.norm(v)
print(f"\nNormalized v: {v_normalized}")
print(f"||normalized v|| = {np.linalg.norm(v_normalized):.10f}")

# Properties
u = np.array([1, 2, 3])
c, d = 2, 3

print(f"\nDistributive (vectors):")
print(f"c(u + v) = {c * (u + v)}")
print(f"cu + cv = {c * u + c * v}")
print(f"Equal? {np.allclose(c * (u + v), c * u + c * v)}")

print(f"\nDistributive (scalars):")
print(f"(c + d)v = {(c + d) * v}")
print(f"cv + dv = {c * v + d * v}")
print(f"Equal? {np.allclose((c + d) * v, c * v + d * v)}")
```

The dot product (also called the inner product or scalar product) is arguably the most important single operation in machine learning. It appears in matrix multiplication, neural network layers, attention mechanisms, similarity measures, and projections.
Algebraic definition:
Given two vectors $\mathbf{u} = (u_1, \ldots, u_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$ of the same dimension, their dot product is:
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$
Crucially, the result is a scalar, not a vector.
For example: $$(3, 2, -1) \cdot (1, -2, 4) = 3(1) + 2(-2) + (-1)(4) = 3 - 4 - 4 = -5$$
Alternative notation:

The dot product is also written as $\langle \mathbf{u}, \mathbf{v} \rangle$ (inner product notation) or as $\mathbf{u}^\top \mathbf{v}$ (matrix notation, treating the vectors as column vectors).
Like addition, the dot product requires vectors of the same dimension. Computing the dot product of a 2D and 3D vector is undefined and will raise an error in NumPy.
Geometric interpretation:
The dot product has a beautiful geometric meaning:
$$\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}| |\mathbf{v}| \cos(\theta)$$
where $\theta$ is the angle between the vectors.
This formula reveals that the dot product measures how strongly the two vectors point in the same direction, scaled by their magnitudes: it is largest when the vectors are aligned and decreases as the angle between them grows.

Interpretation by sign:

- $\mathbf{u} \cdot \mathbf{v} > 0$: the angle is less than 90°; the vectors point in broadly the same direction
- $\mathbf{u} \cdot \mathbf{v} = 0$: the angle is exactly 90°; the vectors are orthogonal
- $\mathbf{u} \cdot \mathbf{v} < 0$: the angle is greater than 90°; the vectors point in broadly opposite directions
Properties of the dot product:
| Property | Formula | Meaning |
|---|---|---|
| Commutativity | $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ | Order doesn't matter |
| Distributivity | $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ | Distributes over addition |
| Scalar factoring | $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v})$ | Scalars factor out |
| Self-dot product | $\mathbf{v} \cdot \mathbf{v} = |\mathbf{v}|^2$ | Dot with self gives squared magnitude |
| Zero with zero vector | $\mathbf{v} \cdot \mathbf{0} = 0$ | Dotting with zero gives zero |
```python
import numpy as np

# Dot product computation
u = np.array([3, 2, -1])
v = np.array([1, -2, 4])

# Method 1: np.dot
dot1 = np.dot(u, v)
print(f"u = {u}")
print(f"v = {v}")
print(f"u · v (np.dot) = {dot1}")  # -5

# Method 2: @ operator (matrix multiplication)
dot2 = u @ v
print(f"u · v (@ operator) = {dot2}")  # -5

# Method 3: Explicit sum
dot3 = np.sum(u * v)  # Element-wise multiply, then sum
print(f"u · v (sum of products) = {dot3}")  # -5

# Relationship to magnitude
print(f"\nv · v = {np.dot(v, v)}")
print(f"||v||² = {np.linalg.norm(v)**2}")
print(f"Equal? {np.isclose(np.dot(v, v), np.linalg.norm(v)**2)}")

# Computing angle between vectors
def angle_between(u, v):
    """Compute angle between vectors in radians."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clamp to handle numerical errors
    cos_theta = np.clip(cos_theta, -1, 1)
    return np.arccos(cos_theta)

a = np.array([1, 0])
b = np.array([0, 1])
c = np.array([1, 1])

print(f"\nAngle between [1,0] and [0,1]: {np.degrees(angle_between(a, b))}°")    # 90°
print(f"Angle between [1,0] and [1,1]: {np.degrees(angle_between(a, c)):.2f}°")  # 45°

# Orthogonality check
print(f"\na · b = {np.dot(a, b)}")  # 0 - orthogonal!

# Sign indicates direction relationship
same_dir = np.array([1, 2])
opposite_dir = np.array([-2, -4])
print(f"\nVectors pointing similarly: {np.dot(np.array([1, 1]), np.array([2, 3]))} (positive)")
print(f"Vectors pointing oppositely: {np.dot(same_dir, opposite_dir)} (negative)")
```

Cosine similarity is one of the most important similarity measures in machine learning. It measures the cosine of the angle between two vectors, ignoring their magnitudes.
Definition:
$$\text{cos_sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}| |\mathbf{v}|} = \cos(\theta)$$
This is equivalent to the dot product of the normalized (unit-length) versions of the vectors: $$\text{cos_sim}(\mathbf{u}, \mathbf{v}) = \hat{\mathbf{u}} \cdot \hat{\mathbf{v}}$$
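A quick numerical check of this equivalence, using arbitrary example vectors:

```python
import numpy as np

u = np.array([3.0, 1.0, 2.0])
v = np.array([1.0, -2.0, 4.0])

# Cosine similarity computed directly from the definition
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# The same value as the dot product of the unit vectors
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
print(cos_sim, np.dot(u_hat, v_hat))              # identical up to floating point
print(np.isclose(cos_sim, np.dot(u_hat, v_hat)))  # True
```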
Range and interpretation: cosine similarity always lies in $[-1, 1]$. A value of $1$ means the vectors point in exactly the same direction, $0$ means they are orthogonal, and $-1$ means they point in exactly opposite directions.
Why ignore magnitude?
In many ML contexts, we care about the direction of a vector more than its length. For example, a long document and a short document about the same topic produce term-count vectors with very different magnitudes but similar directions, and an active user and an occasional user can have rating vectors of different magnitudes but similar taste profiles.
Word embeddings (like Word2Vec) encode semantic meaning in vector direction. The famous example 'king - man + woman ≈ queen' works because these concepts are encoded as directions. Cosine similarity between word vectors often correlates with human judgments of semantic similarity.
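A toy version of that analogy, using tiny invented vectors rather than real learned embeddings, might look like the sketch below (real systems typically also exclude the query words from the candidates):

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented 3D "embeddings", purely for illustration
vectors = {
    'king':  np.array([0.9, 0.8, 0.1]),
    'queen': np.array([0.9, 0.1, 0.8]),
    'man':   np.array([0.1, 0.9, 0.1]),
    'woman': np.array([0.1, 0.1, 0.9]),
}

# king - man + woman should land closest (by cosine similarity) to queen
target = vectors['king'] - vectors['man'] + vectors['woman']
best = max(vectors, key=lambda w: cos_sim(target, vectors[w]))
print(best)  # 'queen' for these toy vectors
```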
Cosine similarity vs. Euclidean distance:
Both measure vector relationships, but differently:
| Aspect | Cosine Similarity | Euclidean Distance |
|---|---|---|
| Measures | Orientation/direction | Absolute distance in space |
| Range | [-1, 1] | [0, ∞) |
| High value means | Similar direction | Far apart |
| Sensitive to magnitude | No | Yes |
| Best for | Comparing directions, text, embeddings | Comparing points in space |
```python
import numpy as np

def cosine_similarity(u, v):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    return dot_product / (norm_u * norm_v)

# Example: Similar direction, different magnitudes
v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])  # Same direction, 2x magnitude

print(f"v1 = {v1}")
print(f"v2 = {v2}")
print(f"Cosine similarity: {cosine_similarity(v1, v2):.4f}")  # 1.0 (identical direction)
print(f"Euclidean distance: {np.linalg.norm(v1 - v2):.4f}")   # Non-zero!

# Orthogonal vectors
v3 = np.array([1, 0, 0])
v4 = np.array([0, 1, 0])
print(f"\nOrthogonal vectors:")
print(f"Cosine similarity: {cosine_similarity(v3, v4):.4f}")  # 0

# Opposite vectors
v5 = np.array([1, 2, 3])
v6 = np.array([-1, -2, -3])
print(f"\nOpposite vectors:")
print(f"Cosine similarity: {cosine_similarity(v5, v6):.4f}")  # -1

# Practical NLP example: simple word vectors
# (In reality, these would be learned embeddings)
word_vectors = {
    'king': np.array([0.2, 0.5, 0.1, 0.8, 0.3]),
    'queen': np.array([0.3, 0.5, 0.2, 0.7, 0.4]),
    'man': np.array([0.1, 0.2, 0.0, 0.9, 0.1]),
    'woman': np.array([0.2, 0.2, 0.1, 0.8, 0.2]),
    'apple': np.array([0.8, 0.1, 0.9, 0.1, 0.7]),
}

print("\nWord similarity example:")
print(f"king-queen: {cosine_similarity(word_vectors['king'], word_vectors['queen']):.4f}")
print(f"king-man: {cosine_similarity(word_vectors['king'], word_vectors['man']):.4f}")
print(f"king-apple: {cosine_similarity(word_vectors['king'], word_vectors['apple']):.4f}")

# Cosine distance (for clustering algorithms)
def cosine_distance(u, v):
    return 1 - cosine_similarity(u, v)

print(f"\nCosine distance (king-queen): {cosine_distance(word_vectors['king'], word_vectors['queen']):.4f}")
```

Projection finds the component of one vector in the direction of another. This operation is fundamental in machine learning for dimensionality reduction, regression, and understanding feature contributions.
Scalar projection:
The scalar projection of $\mathbf{u}$ onto $\mathbf{v}$ (also called the component of $\mathbf{u}$ along $\mathbf{v}$) is:
$$\text{comp}_{\mathbf{v}}\mathbf{u} = |\mathbf{u}| \cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{v}|}$$
This is a scalar representing how much of $\mathbf{u}$ lies in the direction of $\mathbf{v}$.
Vector projection:
The vector projection of $\mathbf{u}$ onto $\mathbf{v}$ is the actual vector:
$$\text{proj}_{\mathbf{v}}\mathbf{u} = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{v}|^2} \mathbf{v} = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}} \mathbf{v}$$
This is the vector that:

- points along $\mathbf{v}$ (it is a scalar multiple of $\mathbf{v}$)
- has length equal to $|\text{comp}_{\mathbf{v}}\mathbf{u}|$
- is the closest approximation of $\mathbf{u}$ among all scalar multiples of $\mathbf{v}$
Imagine shining a light perpendicular to vector v. The shadow of vector u on the line containing v is the vector projection. This 'shadow' is the best approximation of u using only the direction of v.
The orthogonal component:
Any vector $\mathbf{u}$ can be decomposed into two parts:
$$\mathbf{u} = \text{proj}_{\mathbf{v}}\mathbf{u} + \mathbf{u}_{\perp}$$

where $\mathbf{u}_{\perp} = \mathbf{u} - \text{proj}_{\mathbf{v}}\mathbf{u}$ is perpendicular to $\mathbf{v}$.
Applications in ML:

- Principal component analysis (PCA) projects data onto the directions of greatest variance
- Least-squares regression projects the target vector onto the space spanned by the feature vectors
- Orthogonalization procedures such as Gram-Schmidt repeatedly subtract projections to build perpendicular bases
- Decomposing a vector into components along and orthogonal to a direction of interest, as in the code below
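To make the regression connection concrete, here is a minimal sketch (with invented data) of one-feature least squares without an intercept, which is exactly a projection of the target vector onto the feature vector:

```python
import numpy as np

# Invented data: one feature column x and a target vector y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Least-squares coefficient for y ≈ beta * x is the projection coefficient
beta = np.dot(x, y) / np.dot(x, x)
y_hat = beta * x                  # the projection of y onto x

residual = y - y_hat              # orthogonal component
print(beta)
print(np.dot(residual, x))        # ~0: the residual is orthogonal to x

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print(np.isclose(beta, beta_lstsq[0]))  # True
```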
```python
import numpy as np

def scalar_projection(u, v):
    """Scalar projection of u onto v."""
    return np.dot(u, v) / np.linalg.norm(v)

def vector_projection(u, v):
    """Vector projection of u onto v."""
    return (np.dot(u, v) / np.dot(v, v)) * v

def orthogonal_component(u, v):
    """Component of u perpendicular to v."""
    return u - vector_projection(u, v)

# Example
u = np.array([3, 4])
v = np.array([1, 0])  # Unit vector along x-axis

print(f"u = {u}")
print(f"v = {v}")
print(f"\nScalar projection of u onto v: {scalar_projection(u, v)}")  # 3
print(f"Vector projection of u onto v: {vector_projection(u, v)}")    # [3, 0]
print(f"Orthogonal component: {orthogonal_component(u, v)}")          # [0, 4]

# Verify decomposition
proj = vector_projection(u, v)
perp = orthogonal_component(u, v)
print(f"\nReconstruction: proj + perp = {proj + perp}")
print(f"Original u: {u}")
print(f"Equal? {np.allclose(proj + perp, u)}")

# Verify orthogonality
print(f"\nperp · v = {np.dot(perp, v)}")  # Should be 0 (or very close)

# More complex example
u = np.array([1, 2, 3])
v = np.array([1, 1, 0])

proj_u_v = vector_projection(u, v)
perp_u_v = orthogonal_component(u, v)

print(f"\n3D Example:")
print(f"u = {u}")
print(f"v = {v}")
print(f"proj_v(u) = {proj_u_v}")
print(f"u_perp = {perp_u_v}")
print(f"Orthogonal check (should be ~0): {np.dot(perp_u_v, v):.2e}")
```

Beyond the core linear algebra operations, element-wise operations (also called Hadamard operations) are computationally essential in machine learning.
Element-wise product (Hadamard product):
The element-wise product multiplies corresponding components:
$$\mathbf{u} \odot \mathbf{v} = (u_1 v_1, u_2 v_2, \ldots, u_n v_n)$$
Unlike the dot product (which returns a scalar), the Hadamard product returns a vector of the same dimension.
Example: $$(2, 3, 4) \odot (5, 2, 3) = (10, 6, 12)$$
The Hadamard product is crucial in neural networks. LSTM gates use element-wise multiplication to control information flow. Attention mechanisms multiply attention scores with values element-wise. Dropout uses element-wise multiplication with binary masks. Understanding this operation is essential for understanding modern architectures.
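A rough sketch of that masking and gating idea, with made-up activations; the division by the keep probability follows the common inverted-dropout convention:

```python
import numpy as np

rng = np.random.default_rng(0)

activations = np.array([0.5, -1.2, 3.0, 0.7, -0.3])  # made-up layer output

# Dropout-style masking: zero out entries with a random binary mask
keep_prob = 0.8
mask = (rng.random(activations.shape) < keep_prob).astype(float)
dropped = activations * mask / keep_prob              # inverted-dropout rescaling
print(dropped)

# Gating (as in LSTMs or attention): a value in [0, 1] per component
gate = 1 / (1 + np.exp(-np.array([2.0, -2.0, 0.0, 4.0, -4.0])))  # sigmoid gate
gated = gate * activations                             # element-wise control of flow
print(gated)
```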
Other element-wise operations:
Any scalar function can be applied element-wise to vectors:
| Operation | Notation | Formula | Use in ML |
|---|---|---|---|
| Multiplication | $\mathbf{u} \odot \mathbf{v}$ | $(u_i \cdot v_i)$ | Gating, masking, attention |
| Division | $\mathbf{u} \oslash \mathbf{v}$ | $(u_i / v_i)$ | Normalization |
| Exponentiation | $\exp(\mathbf{v})$ | $(e^{v_i})$ | Softmax computation |
| Logarithm | $\log(\mathbf{v})$ | $(\log v_i)$ | Log-likelihood |
| Power | $\mathbf{v}^p$ | $(v_i^p)$ | Polynomial features |
| Activation | $\sigma(\mathbf{v})$ | $(\sigma(v_i))$ | ReLU, sigmoid, tanh |
Broadcasting:
In NumPy (and most ML frameworks), operations between vectors and scalars are broadcast—the scalar is implicitly expanded to match the vector's shape:
$$\mathbf{v} + c = (v_1 + c, v_2 + c, \ldots, v_n + c)$$
This is technically an element-wise operation with an "expanded" scalar vector $(c, c, \ldots, c)$.
```python
import numpy as np

u = np.array([2, 3, 4])
v = np.array([5, 2, 3])

# Element-wise multiplication (Hadamard product)
hadamard = u * v
print(f"u = {u}")
print(f"v = {v}")
print(f"u ⊙ v (element-wise mult) = {hadamard}")  # [10, 6, 12]

# Compare with dot product
print(f"u · v (dot product) = {np.dot(u, v)}")  # 28 (scalar)

# Element-wise division
print(f"u ⊘ v (element-wise div) = {u / v}")

# Common element-wise functions
x = np.array([1, 2, 3, 4])
print(f"\nx = {x}")
print(f"x^2 = {x ** 2}")
print(f"sqrt(x) = {np.sqrt(x)}")
print(f"exp(x) = {np.exp(x)}")
print(f"log(x) = {np.log(x)}")

# Activation functions (element-wise)
def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-2, -1, 0, 1, 2])
print(f"\nz = {z}")
print(f"ReLU(z) = {relu(z)}")
print(f"sigmoid(z) = {sigmoid(z)}")

# Broadcasting: scalar with vector
print(f"\nBroadcasting:")
print(f"x + 10 = {x + 10}")  # 10 is broadcast to [10, 10, 10, 10]
print(f"x * 2 = {x * 2}")

# Practical: feature normalization
features = np.array([100, 50, 25])  # Different scales
mean = np.mean(features)
std = np.std(features)
normalized = (features - mean) / std
print(f"\nOriginal features: {features}")
print(f"Normalized: {normalized}")
```

The cross product is an operation specific to 3D vectors that produces a vector perpendicular to both inputs. While less common in ML than the dot product, it appears in computer graphics, robotics, and physics simulations.
Definition (3D only):
Given $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:
$$\mathbf{u} \times \mathbf{v} = \begin{bmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{bmatrix}$$
Key properties:

- $\mathbf{u} \times \mathbf{v}$ is perpendicular to both $\mathbf{u}$ and $\mathbf{v}$, with direction given by the right-hand rule
- It is anti-commutative: $\mathbf{u} \times \mathbf{v} = -(\mathbf{v} \times \mathbf{u})$
- It is the zero vector when $\mathbf{u}$ and $\mathbf{v}$ are parallel
- The magnitude equals the area of the parallelogram formed by the two vectors: $|\mathbf{u} \times \mathbf{v}| = |\mathbf{u}||\mathbf{v}|\sin(\theta)$
The cross product is less common in pure ML but appears in 3D vision and graphics (computing surface normals), physics simulations (torque, angular momentum), robotics (orientation and rotation). In higher dimensions, the wedge product from geometric algebra generalizes this concept.
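For example, a common graphics use (triangle vertices invented here) is computing a surface normal from two edge vectors:

```python
import numpy as np

# Three vertices of a triangle in 3D (arbitrary example points)
p0 = np.array([0.0, 0.0, 0.0])
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])

# Two edges sharing p0; their cross product is perpendicular to the triangle
edge1 = p1 - p0
edge2 = p2 - p0
normal = np.cross(edge1, edge2)
unit_normal = normal / np.linalg.norm(normal)
print(unit_normal)                   # [0. 0. 1.] for this flat triangle

# The triangle's area is half the parallelogram area
print(0.5 * np.linalg.norm(normal))  # 0.5
```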
```python
import numpy as np

# Cross product in 3D
u = np.array([1, 0, 0])  # x-axis
v = np.array([0, 1, 0])  # y-axis

cross = np.cross(u, v)
print(f"u = {u}")
print(f"v = {v}")
print(f"u × v = {cross}")  # [0, 0, 1] - z-axis!

# Verify perpendicularity
print(f"\n(u × v) · u = {np.dot(cross, u)}")  # 0
print(f"(u × v) · v = {np.dot(cross, v)}")  # 0

# Anti-commutativity
cross_vu = np.cross(v, u)
print(f"\nv × u = {cross_vu}")  # [0, 0, -1]
print(f"u × v = -(v × u)? {np.allclose(cross, -cross_vu)}")

# General example
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(f"\na = {a}")
print(f"b = {b}")
print(f"a × b = {np.cross(a, b)}")

# Magnitude = area of parallelogram
print(f"||a × b|| = {np.linalg.norm(np.cross(a, b)):.4f}")

# Parallel vectors have cross product = 0
parallel = np.array([2, 4, 6])  # 2 * a
print(f"\na × (2a) = {np.cross(a, parallel)}")  # [0, 0, 0]
```

We've covered the fundamental operations that make vectors computational objects, not just data containers. These operations form the vocabulary of machine learning computation.
What's next:
With vectors and their operations understood, we're ready to explore linear combinations—how vectors can be combined to form new vectors, and the profound concepts of span and linear independence that determine what vectors can represent.
You now understand the fundamental operations on vectors: addition, scalar multiplication, dot products, cosine similarity, projections, and element-wise operations. These operations are the computational primitives of machine learning—every algorithm builds on them.