Every machine learning algorithm, without exception, operates on vectors. When you feed an image to a neural network, it sees a vector of pixel values. When a recommendation system profiles your preferences, it represents you as a vector in taste-space. When a language model processes your text, it converts words into vectors of numerical embeddings.
Vectors are not merely a mathematical convenience—they are the universal representation through which machines perceive and process information. Understanding vectors deeply is therefore not optional for machine learning practitioners; it is the prerequisite for understanding anything else.
This page begins our journey into linear algebra by establishing what vectors are, how they're represented, and why they provide such a powerful abstraction for computation.
By the end of this page, you will:

- understand vectors as ordered collections of numbers,
- master vector notation across different conventions,
- develop geometric intuition for vectors in 2D and 3D while building toward higher-dimensional reasoning, and
- recognize how vectors serve as the fundamental data representation in machine learning.
The word "vector" carries different meanings across physics, mathematics, and computer science. Let us establish a precise definition that serves our purposes in machine learning while acknowledging these perspectives.
The algebraic definition:
A vector is an ordered list of numbers. That's it. Deceptively simple, profoundly powerful.
Formally, an n-dimensional vector is an ordered n-tuple of real numbers:
$$\mathbf{v} = (v_1, v_2, \ldots, v_n)$$
where each $v_i \in \mathbb{R}$ is called a component or entry of the vector, and $n$ is the dimension or length of the vector.
The key word is ordered—the sequence matters. The vector $(1, 2, 3)$ is fundamentally different from $(3, 2, 1)$ or $(2, 1, 3)$, even though they contain the same numbers.
When representing an image as a vector, the first component might be the red value of the top-left pixel, the second component the green value, and so on. Shuffling these components would destroy the image—the spatial structure is encoded in the ordering. This is why vectors are ordered lists, not sets.
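A minimal sketch (using a hypothetical 2×2 "image") shows that shuffling components preserves the numbers but destroys the vector:

```python
import numpy as np

# A tiny 2x2 grayscale "image" flattened to a 4-component vector.
# Each position encodes a pixel location: top-left, top-right,
# bottom-left, bottom-right.
image = np.array([10, 200, 30, 40])

# Reorder the components: same multiset of values, different vector.
shuffled = image[[2, 0, 3, 1]]

print(np.array_equal(np.sort(image), np.sort(shuffled)))  # True: same numbers
print(np.array_equal(image, shuffled))                    # False: different vectors
```

As sets the two collections are identical; as ordered lists (vectors) they are not, which is exactly why order matters.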
Multiple perspectives on vectors:
While we'll primarily work with the algebraic view (vectors as ordered lists), it helps to understand other perspectives that provide complementary intuition:
| Perspective | Views Vector As | Useful For |
|---|---|---|
| Algebraic (Computer Science) | Ordered list of numbers | Computation, implementation, indexing |
| Geometric (Physics) | Arrow with magnitude and direction | Visualization, intuition, transformations |
| Abstract (Mathematics) | Element of a vector space satisfying axioms | Generalization, proving properties, understanding structure |
All three perspectives describe the same mathematical objects—they simply emphasize different aspects. For machine learning, we move fluidly between them: computing with the algebraic view, visualizing with the geometric view, and generalizing with the abstract view.
Developing comfort with all three perspectives is essential for mastering machine learning mathematics.
Mathematical notation for vectors varies across textbooks, papers, and programming contexts. Familiarity with all conventions is essential for reading machine learning literature fluently.
Common notational conventions include boldface lowercase letters ($\mathbf{v}$, standard in machine learning texts), an arrow over the letter ($\vec{v}$, common in physics), and plain lowercase names (`v`) in programming contexts.
Column vectors vs. row vectors:
A critical distinction in linear algebra is between column vectors and row vectors. By convention in machine learning, vectors are typically column vectors unless stated otherwise:
$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
A row vector is the transpose of a column vector:
$$\mathbf{v}^\top = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$
This distinction becomes crucial when we multiply vectors and matrices. The transpose operation (denoted $\top$ or sometimes $T$) converts between them.
Throughout machine learning literature and this course, vectors are column vectors by default. When we write $\mathbf{x} \in \mathbb{R}^n$, we mean x is an n×1 column vector. This convention aligns with how matrix multiplication with vectors works and with how neural network operations are typically defined.
Component indexing:
Vector components are accessed by their index. Different conventions exist:
Mathematical notation uses 1-indexing, while Python and most ML frameworks use 0-indexing. Being comfortable converting between them is essential.
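As a small sketch of the conversion, the textbook component $v_i$ (1-indexed) corresponds to `v[i - 1]` in Python (0-indexed):

```python
import numpy as np

v = np.array([10, 20, 30])

# "The second component" in mathematical notation is v_2 (1-indexed).
i_math = 2
v_i = v[i_math - 1]  # subtract 1 to convert to Python's 0-indexing
print(v_i)           # 20
```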
Notation for common vector spaces: $\mathbb{R}$ denotes the real numbers, $\mathbb{R}^2$ the plane, $\mathbb{R}^3$ three-dimensional space, and $\mathbb{R}^n$ the space of all $n$-dimensional real vectors. Writing $\mathbf{x} \in \mathbb{R}^n$ declares that $\mathbf{x}$ is an $n$-dimensional real vector.
```python
import numpy as np

# Creating vectors in NumPy
# By default, np.array creates 1D arrays which are treated as vectors
v = np.array([1, 2, 3])
print(f"Vector v: {v}")
print(f"Shape: {v.shape}")            # (3,) - a 1D array with 3 elements
print(f"Dimension (ndim): {v.ndim}")  # 1

# Explicit column vector (2D array with 1 column)
v_col = np.array([[1], [2], [3]])
print(f"Column vector:\n{v_col}")
print(f"Shape: {v_col.shape}")  # (3, 1)

# Explicit row vector (2D array with 1 row)
v_row = np.array([[1, 2, 3]])
print(f"Row vector: {v_row}")
print(f"Shape: {v_row.shape}")  # (1, 3)

# Transpose between row and column
print(f"Transpose of column vector: {v_col.T}")
print(f"Shape after transpose: {v_col.T.shape}")  # (1, 3)

# Accessing components (0-indexed in Python)
print(f"First component (v[0]): {v[0]}")
print(f"Last component (v[-1]): {v[-1]}")
print(f"Second component (v[1]): {v[1]}")
```

While vectors are algebraically just lists of numbers, their geometric interpretation provides crucial intuition. This intuition, developed in 2D and 3D, extends naturally (if invisibly) to higher dimensions.
Vectors as arrows:
Geometrically, a vector $\mathbf{v} = (v_1, v_2)$ in $\mathbb{R}^2$ represents an arrow from the origin $(0, 0)$ to the point $(v_1, v_2)$. The vector has two properties: a magnitude (its length) and a direction (the way it points).
Critically, what matters is the displacement—the relative movement—not the absolute starting position. The vector $(3, 2)$ represents "move 3 units right and 2 units up" regardless of where you start.
When we draw a vector starting at the origin, we call it a 'position vector'—it points to a specific location. But the same vector could be drawn starting anywhere and would still represent the same displacement. In pure linear algebra, we typically think of vectors as displacements (direction and magnitude), not tied to any particular starting point.
Visualizing in 2D and 3D:
Consider the vector $\mathbf{v} = (3, 2)$ in $\mathbb{R}^2$: drawn from the origin, the arrow extends 3 units along the x-axis and 2 units along the y-axis.
In $\mathbb{R}^3$, the vector $\mathbf{v} = (3, 2, 1)$ has an additional z-component: the arrow extends 3 units along x, 2 units along y, and 1 unit along z.
Beyond 3D:
In machine learning, we routinely work with vectors in $\mathbb{R}^{1000}$ or $\mathbb{R}^{1,000,000}$. We cannot visualize these spaces, but the algebraic operations and properties carry over identically. This is the power of abstraction: intuition developed in 2D and 3D guides reasoning in any dimension.
| Property | 2D Formula | 3D Formula | n-D Formula |
|---|---|---|---|
| Components | $(v_1, v_2)$ | $(v_1, v_2, v_3)$ | $(v_1, v_2, \ldots, v_n)$ |
| Magnitude | $\sqrt{v_1^2 + v_2^2}$ | $\sqrt{v_1^2 + v_2^2 + v_3^2}$ | $\sqrt{\sum_{i=1}^{n} v_i^2}$ |
| Unit vector | $\mathbf{v} / \vert\mathbf{v}\vert$ | $\mathbf{v} / \vert\mathbf{v}\vert$ | $\mathbf{v} / \vert\mathbf{v}\vert$ |
The magnitude (length) of a vector:
The magnitude or norm of a vector, denoted $|\mathbf{v}|$, is its length calculated via the Pythagorean theorem:
$$|\mathbf{v}| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$
This is called the Euclidean norm or L2 norm (we'll explore other norms later).
Unit vectors:
A unit vector has magnitude exactly 1. Any non-zero vector can be normalized to a unit vector by dividing by its magnitude:
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{|\mathbf{v}|}$$
Unit vectors are fundamental for representing pure directions without magnitude information.
```python
import numpy as np

# Vector in 2D
v_2d = np.array([3, 4])

# Magnitude (Euclidean norm)
magnitude_2d = np.linalg.norm(v_2d)
print(f"Vector: {v_2d}")
print(f"Magnitude: {magnitude_2d}")  # 5.0 (3-4-5 triangle!)

# Manual calculation
magnitude_manual = np.sqrt(np.sum(v_2d ** 2))
print(f"Magnitude (manual): {magnitude_manual}")

# Unit vector (normalization)
unit_v = v_2d / magnitude_2d
print(f"Unit vector: {unit_v}")  # [0.6, 0.8]
print(f"Unit vector magnitude: {np.linalg.norm(unit_v):.10f}")  # 1.0

# Higher dimensional example
v_100d = np.random.randn(100)  # Random 100-dimensional vector
print(f"100D vector magnitude: {np.linalg.norm(v_100d):.4f}")
unit_100d = v_100d / np.linalg.norm(v_100d)
print(f"100D unit vector magnitude: {np.linalg.norm(unit_100d):.10f}")
```

Certain vectors appear so frequently in linear algebra and machine learning that they have standard names and notation.
The zero vector:
The zero vector $\mathbf{0}$ has all components equal to zero:
$$\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
The zero vector is the additive identity—adding it to any vector leaves that vector unchanged: $$\mathbf{v} + \mathbf{0} = \mathbf{v}$$
Geometrically, the zero vector has no direction (or, equivalently, points in every direction) and has zero magnitude.
Standard basis vectors (canonical vectors):
The standard basis vectors or canonical vectors are unit vectors pointing along each coordinate axis. In $\mathbb{R}^n$, the $i$-th standard basis vector $\mathbf{e}_i$ has a 1 in position $i$ and 0 everywhere else:
$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots \quad \mathbf{e}_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$
These vectors are called a 'basis' because every vector in ℝⁿ can be expressed as a combination of them. For example, (3, 2, 5) = 3e₁ + 2e₂ + 5e₃. This decomposition is unique and fundamental to linear algebra. We'll explore this deeply when we cover basis and span.
The ones vector:
The ones vector $\mathbf{1}$ (or $\mathbf{1}_n$) has all components equal to 1:
$$\mathbf{1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
This vector appears frequently in statistics (for computing sums and means) and machine learning (as a bias term in linear models).
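To illustrate the summing and averaging use just mentioned, here is a minimal sketch (variable names are illustrative): the dot product $\mathbf{1}^\top \mathbf{v}$ sums the components, and scaling by $1/n$ gives the mean.

```python
import numpy as np

v = np.array([2.0, 4.0, 6.0, 8.0])
ones = np.ones(len(v))

total = np.dot(ones, v)       # 1^T v = sum of components
mean = total / len(v)         # (1/n) 1^T v = mean of components

print(total)  # 20.0
print(mean)   # 5.0
```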
Notation summary:
| Vector | Symbol | Definition | Common Use |
|---|---|---|---|
| Zero vector | $\mathbf{0}$ or $\mathbf{0}_n$ | All components are 0 | Additive identity, origin |
| Standard basis | $\mathbf{e}_i$ | 1 in position i, 0 elsewhere | Coordinate extraction, basis |
| Ones vector | $\mathbf{1}$ or $\mathbf{1}_n$ | All components are 1 | Summing, averaging, bias terms |
```python
import numpy as np

# Zero vector
zero_vec = np.zeros(5)
print(f"Zero vector: {zero_vec}")

# Ones vector
ones_vec = np.ones(5)
print(f"Ones vector: {ones_vec}")

# Standard basis vectors in R^3
e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
e3 = np.array([0, 0, 1])
print("Standard basis in R^3:")
print(f"e1 = {e1}")
print(f"e2 = {e2}")
print(f"e3 = {e3}")

# Creating standard basis vectors programmatically
def standard_basis(n, i):
    """Create the i-th standard basis vector in R^n (0-indexed)."""
    e = np.zeros(n)
    e[i] = 1
    return e

# Example: standard basis for R^5
print("Standard basis for R^5:")
for i in range(5):
    print(f"e_{i+1} = {standard_basis(5, i)}")

# Express a vector as a sum of basis vectors
v = np.array([3, -2, 5])
reconstructed = 3*e1 + (-2)*e2 + 5*e3
print(f"Original: {v}")
print(f"Reconstructed from basis: {reconstructed}")
print(f"Equal: {np.allclose(v, reconstructed)}")
```

The true power of vectors emerges when we recognize that almost any data can be encoded as a vector. This encoding is the bridge between the real world and machine learning algorithms.
The vectorization principle:
Machine learning requires numerical input. Since vectors are ordered collections of numbers, converting data to vectors (called vectorization or feature extraction) is the essential first step in any ML pipeline.
Different data types require different vectorization strategies:
| Data Type | Vector Representation | Dimension Example |
|---|---|---|
| Grayscale Image | Flatten pixel intensities into a vector | 28×28 image → $\mathbb{R}^{784}$ |
| Color Image | Flatten RGB channels into a vector | 32×32×3 image → $\mathbb{R}^{3072}$ |
| Text (Bag of Words) | Count of each vocabulary word | 10,000 word vocab → $\mathbb{R}^{10000}$ |
| Text (Embeddings) | Learned dense vector per word/sentence | Word2Vec → $\mathbb{R}^{300}$ |
| Audio | Spectrogram or waveform samples | 1 sec @ 44.1kHz → $\mathbb{R}^{44100}$ |
| Tabular Data | Each column becomes a component | 20 features → $\mathbb{R}^{20}$ |
| Categorical Data | One-hot encoding | 5 categories → $\mathbb{R}^{5}$ |
Notice how quickly dimensions grow—a tiny 32×32 color image becomes a vector in ℝ³⁰⁷². This 'curse of dimensionality' is one of the central challenges in machine learning, driving techniques like dimensionality reduction (PCA, autoencoders) which we'll study later.
Example: Image as a vector
Consider a tiny 3×3 grayscale image where each pixel has an intensity from 0 (black) to 255 (white):
```
[100, 150, 200]
[ 50, 255,  50]
[  0, 100, 255]
```
Flattening row by row creates a 9-dimensional vector: $$\mathbf{x} = (100, 150, 200, 50, 255, 50, 0, 100, 255)$$
This vector lives in $\mathbb{R}^9$. Every possible 3×3 grayscale image corresponds to exactly one point in this 9-dimensional space.
Example: User profile as a vector
A streaming service might represent a user's preferences as: $$\mathbf{u} = (\text{action}, \text{comedy}, \text{drama}, \text{horror}, \text{scifi})$$
For example, $(0.8, 0.3, 0.5, 0.1, 0.9)$ indicates strong preference for action and sci-fi, moderate preference for drama, and low preference for horror.
Why this representation matters:
Once data is vectorized, we can measure distances and similarities between examples, feed any data type into the same algorithms, and bring the full machinery of linear algebra to bear, regardless of whether the data began as images, text, audio, or tables.
```python
import numpy as np

# Example 1: Flattening a small image
image_3x3 = np.array([
    [100, 150, 200],
    [50, 255, 50],
    [0, 100, 255]
])
print("Original 3x3 image:")
print(image_3x3)

# Flatten to vector
image_vector = image_3x3.flatten()
print(f"Flattened vector: {image_vector}")
print(f"Shape: {image_vector.shape}")  # (9,)

# Example 2: One-hot encoding for categorical data
categories = ['cat', 'dog', 'bird', 'fish', 'rabbit']
category_to_index = {cat: i for i, cat in enumerate(categories)}

def one_hot_encode(category, categories):
    """Convert category to one-hot vector."""
    vec = np.zeros(len(categories))
    vec[category_to_index[category]] = 1
    return vec

print("One-hot encodings:")
for cat in categories:
    print(f"{cat}: {one_hot_encode(cat, categories)}")

# Example 3: Simulated user preference vector
user_profile = np.array([0.8, 0.3, 0.5, 0.1, 0.9])
genres = ['action', 'comedy', 'drama', 'horror', 'scifi']
print("User preference vector:")
for genre, pref in zip(genres, user_profile):
    print(f"  {genre}: {pref}")

# Finding similar users via distance
user1 = np.array([0.8, 0.3, 0.5, 0.1, 0.9])
user2 = np.array([0.7, 0.4, 0.5, 0.2, 0.85])  # Similar
user3 = np.array([0.1, 0.9, 0.8, 0.0, 0.1])   # Different

dist_1_2 = np.linalg.norm(user1 - user2)
dist_1_3 = np.linalg.norm(user1 - user3)
print(f"Distance user1 to user2: {dist_1_2:.4f}")
print(f"Distance user1 to user3: {dist_1_3:.4f}")
print(f"user2 is more similar to user1: {dist_1_2 < dist_1_3}")
```

Understanding when two vectors are equal—and how to compare vectors—is fundamental.
Vector equality:
Two vectors $\mathbf{u}$ and $\mathbf{v}$ are equal if and only if they have the same dimension and every corresponding component is equal.
Formally: $\mathbf{u} = \mathbf{v} \iff u_i = v_i \text{ for all } i$
This means vectors of different dimensions are never equal, and a single differing component makes two vectors unequal.
Approximate equality (numerical computing):
In practice, floating-point computations introduce small errors. We rarely check exact equality; instead, we check if vectors are close enough:
$$|\mathbf{u} - \mathbf{v}| < \epsilon$$
where $\epsilon$ is a small tolerance (e.g., $10^{-10}$).
Due to floating-point arithmetic, two vectors that should mathematically be equal might differ by tiny amounts like 1e-16. Always use np.allclose() or np.isclose() for floating-point comparisons. Exact equality checks will lead to subtle, hard-to-debug failures.
Comparing vectors:
Vectors themselves are not inherently ordered—there's no universal "less than" relationship. The vector $(1, 3)$ is neither less than nor greater than $(2, 2)$.
However, we can compare vectors by their magnitudes (which are scalars), by the distance between them, or component-wise (checking whether every component of one is at most the corresponding component of the other).
```python
import numpy as np

# Exact equality (integers - safe to compare)
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = np.array([1, 3, 2])

print(f"a == b element-wise: {a == b}")
print(f"a equals b (all): {np.array_equal(a, b)}")  # True
print(f"a equals c (all): {np.array_equal(a, c)}")  # False

# Floating point gotcha
x = np.array([0.1 + 0.2, 0.3])
y = np.array([0.3, 0.3])
print(f"0.1 + 0.2 = {0.1 + 0.2}")  # Not exactly 0.3!
print(f"x == y: {x == y}")  # May show False for first component!
print(f"x equals y (exact): {np.array_equal(x, y)}")  # False!

# Correct approach: use tolerances
print(f"x approximately equals y: {np.allclose(x, y)}")  # True

# Custom tolerance
print(f"Close with atol=1e-15: {np.allclose(x, y, atol=1e-15)}")

# Component-wise comparison
u = np.array([1, 2, 3])
v = np.array([2, 2, 4])
print(f"u = {u}")
print(f"v = {v}")
print(f"u <= v element-wise: {u <= v}")
print(f"u <= v (all components): {np.all(u <= v)}")
```

Before we proceed to vector operations (next page), let's establish the fundamental properties that make vectors so useful mathematically.
Vectors form a mathematical structure:
The set of all n-dimensional real vectors, $\mathbb{R}^n$, together with addition and scalar multiplication (which we'll define next page), forms a vector space. This structure guarantees certain properties that enable all of linear algebra.
Key properties (preview):

- Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
- Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
- Additive identity: $\mathbf{v} + \mathbf{0} = \mathbf{v}$
- Additive inverse: $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
- Distributivity over vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$
- Distributivity over scalar addition: $(a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$
- Associativity of scalar multiplication: $a(b\mathbf{v}) = (ab)\mathbf{v}$
- Multiplicative identity: $1 \cdot \mathbf{v} = \mathbf{v}$
These eight properties define what it means to be a vector space. Any set that satisfies these axioms (with appropriate operations) IS a vector space—even if the 'vectors' aren't traditional arrays of numbers. Functions, matrices, and even quantum states can form vector spaces. This abstraction is what makes linear algebra universally applicable.
We'll explore these axioms rigorously when we cover vector spaces formally. For now, recognize that these properties are not arbitrary—they capture the essential structure that makes vectors useful for representing and manipulating data.
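As a numerical sanity check rather than a proof, we can spot-check several of these axioms on random vectors (the sampled vectors and scalars below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 4))  # three random vectors in R^4
a, b = 2.0, -3.0                       # two scalars

# Commutativity of addition: u + v == v + u
assert np.allclose(u + v, v + u)

# Associativity of addition: (u + v) + w == u + (v + w)
assert np.allclose((u + v) + w, u + (v + w))

# Additive identity and additive inverse
assert np.allclose(u + np.zeros(4), u)
assert np.allclose(u + (-u), np.zeros(4))

# Distributivity: a(u + v) == au + av, and (a + b)u == au + bu
assert np.allclose(a * (u + v), a * u + a * v)
assert np.allclose((a + b) * u, a * u + b * u)

print("All sampled axioms hold (numerically).")
```

Passing these checks on random inputs does not prove the axioms, but it makes concrete what each equation asserts about component-wise arithmetic.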
We've established the foundational concept of vectors—the building blocks of all machine learning mathematics. To consolidate:

- A vector is an ordered n-tuple of real numbers; the ordering of components carries meaning.
- Vectors can be viewed algebraically (ordered lists), geometrically (arrows with magnitude and direction), or abstractly (elements of a vector space).
- By convention in machine learning, vectors are column vectors, and the magnitude is the Euclidean (L2) norm.
- Special vectors (the zero vector, standard basis vectors, the ones vector) recur throughout linear algebra and ML.
- Nearly any data (images, text, audio, tabular, categorical) can be vectorized into $\mathbb{R}^n$.
What's next:
Now that we understand what vectors are, we'll explore what we can do with them. The next page covers vector operations: addition, scalar multiplication, dot products, and more—the computational tools that make vectors powerful for machine learning.
You now understand vectors as ordered lists of numbers, can work with different notations, have geometric intuition for vectors, and recognize how vectors encode data in machine learning. This foundation prepares you for everything that follows in linear algebra and machine learning.