When you first encounter matrices in mathematics, they often appear as rectangular grids of numbers—notation to be manipulated according to seemingly arbitrary rules. You learn to multiply matrices using the 'row-by-column' procedure without ever understanding why that procedure exists or what it means.

This superficial understanding is a barrier to mastering machine learning. In ML, matrices are everywhere: weight matrices in neural networks, covariance matrices in statistics, transformation matrices in data preprocessing, kernel matrices in support vector machines. Without geometric intuition, these are just symbols to shuffle around. With it, deep patterns emerge that make complex algorithms feel natural.

The central insight: A matrix is not a table of numbers. A matrix is a complete description of a linear transformation—a function that takes vectors as input and produces transformed vectors as output, preserving the essential structure of the space.
By the end of this page, you will understand matrices as functions that transform space. You will see how every matrix encodes a specific geometric operation—stretching, rotating, reflecting, shearing, or projecting—and why this perspective is essential for comprehending machine learning algorithms at their deepest level.
Let's begin with a mental reset. Forget everything you know about matrix mechanics. Instead, consider the simplest possible question: What does a matrix do?

A matrix A is a function that takes a vector x as input and produces a new vector y as output:

$$\mathbf{y} = A\mathbf{x}$$

That's it. Everything else—all the rules for matrix multiplication, the definitions of determinants and eigenvalues, the entire apparatus of linear algebra—follows from understanding this one idea deeply.

The function analogy:

Just as the function $f(x) = 2x$ takes a number and doubles it, a matrix takes a vector and transforms it according to a specific rule. The difference is that vectors inhabit multi-dimensional space, so the transformation is richer—it can affect direction as well as magnitude.

Consider a 2×2 matrix acting on 2D vectors:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$A\mathbf{x} = \begin{bmatrix} 2 \cdot 1 + 0 \cdot 1 \\ 0 \cdot 1 + 3 \cdot 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$

The input vector $(1, 1)$ becomes $(2, 3)$. The matrix stretched the x-component by factor 2 and the y-component by factor 3. This is scaling—one of the fundamental transformation types.
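If you want to try this yourself, here is a minimal NumPy sketch of the example above (the matrix and vector values are taken directly from it):

```python
import numpy as np

# The scaling matrix and input vector from the example above
A = np.array([[2, 0],
              [0, 3]])
x = np.array([1, 1])

y = A @ x  # matrix-vector product: apply the transformation to x
print(y)   # [2 3]  (x stretched by 2 along x and by 3 along y)
```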
In a neural network, each layer performs exactly this operation: multiply input vector by weight matrix, producing output vector. The weight matrix IS the learned transformation. Understanding matrices as transformations means understanding what neural networks are actually doing—they're learning to warp input space until the problem becomes linearly separable.
Not every function on vectors is a linear transformation. The term "linear" imposes two strict constraints that together define the essence of what matrices can represent.

A function $T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if:

1. Additivity (Preservation of Addition):
   $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$

2. Homogeneity (Preservation of Scalar Multiplication):
   $T(c\mathbf{u}) = cT(\mathbf{u})$

These can be combined into a single condition:
$$T(c_1\mathbf{u} + c_2\mathbf{v}) = c_1 T(\mathbf{u}) + c_2 T(\mathbf{v})$$

This says: the transformation of a linear combination equals the linear combination of the transformations. In prose: linear transformations preserve the structure of vector spaces.
These algebraic conditions have profound geometric consequences. Any linear transformation must: (1) map the origin to the origin, (2) map straight lines to straight lines (or points), (3) preserve parallelism—if lines are parallel before transformation, they remain parallel after. No curving, no bending, no translation.
Examples of linear transformations:
- Rotation around the origin
- Scaling (uniform or non-uniform)
- Reflection across a line through the origin
- Shearing
- Projection onto a subspace

Examples of NON-linear transformations:
- Translation: $T(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ — violates $T(\mathbf{0}) = \mathbf{0}$
- Affine: $T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$ — linear part plus translation
- Any function involving powers, products of components, etc.

The fundamental theorem (informal):

Every linear transformation between finite-dimensional vector spaces can be represented by a matrix, and every matrix represents a linear transformation. The matrix IS the transformation, encoded in a specific way.
| Transformation | Formula | Linear? | Why |
|---|---|---|---|
| Scaling | $T(\mathbf{x}) = 2\mathbf{x}$ | ✅ Yes | Both properties satisfied |
| Rotation | $T(\mathbf{x}) = R\mathbf{x}$ (R = rotation matrix) | ✅ Yes | Preserves structure |
| Translation | $T(\mathbf{x}) = \mathbf{x} + (1, 2)$ | ❌ No | $T(\mathbf{0}) \neq \mathbf{0}$ |
| Squaring | $T(x, y) = (x^2, y^2)$ | ❌ No | $T(2\mathbf{x}) \neq 2T(\mathbf{x})$ |
| ReLU | $T(x) = \max(0, x)$ | ❌ No | Not additive |
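To make the two conditions concrete, here is a small numerical check (a sketch; the test vectors and scalar are arbitrary choices): a rotation matrix satisfies additivity and homogeneity, while ReLU fails additivity, matching the table above.

```python
import numpy as np

u, v, c = np.array([1.0, -2.0]), np.array([3.0, 0.5]), 2.5  # arbitrary test inputs

# Rotation by 90 degrees counterclockwise: a linear transformation
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(np.allclose(R @ (u + v), R @ u + R @ v))  # True  (additivity holds)
print(np.allclose(R @ (c * u), c * (R @ u)))    # True  (homogeneity holds)

# ReLU: not a linear transformation
relu = lambda x: np.maximum(0, x)
print(np.allclose(relu(u + v), relu(u) + relu(v)))  # False (additivity fails)
```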
Here's the key insight that connects algebra to geometry: the columns of a matrix tell you where the standard basis vectors go.

In 2D, the standard basis vectors are:
$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

These are the unit vectors pointing along the x-axis and y-axis respectively.

When you apply a matrix $A$ to these basis vectors:
$$A\mathbf{e}_1 = \text{first column of } A$$
$$A\mathbf{e}_2 = \text{second column of } A$$

This is profound. The entire behavior of the transformation is determined by what it does to the basis vectors. Since any vector can be written as a linear combination of basis vectors, and linear transformations preserve linear combinations, knowing what happens to the basis tells you what happens to everything.
Example: Rotation by 90° counterclockwise

$$R_{90} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Column 1: $(0, 1)$ — where $\mathbf{e}_1$ goes after rotation
Column 2: $(-1, 0)$ — where $\mathbf{e}_2$ goes after rotation

The first column tells us: a vector pointing right (1, 0) gets rotated to point up (0, 1). The second column tells us: a vector pointing up (0, 1) gets rotated to point left (-1, 0). This IS a 90° counterclockwise rotation—you can visualize it geometrically.
Constructing transformation matrices:

This principle works in reverse. To build a matrix for any linear transformation:

1. Determine what the transformation does to each standard basis vector
2. Place those destination vectors as columns of the matrix
3. Done—you have the matrix representation

Example: Reflection across the x-axis
- $\mathbf{e}_1 = (1, 0)$ stays at $(1, 0)$ (on the axis, no change)
- $\mathbf{e}_2 = (0, 1)$ goes to $(0, -1)$ (flipped below the axis)

$$\text{Reflection matrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

Example: Scaling by factor 3 in x, factor 2 in y
- $\mathbf{e}_1 = (1, 0)$ goes to $(3, 0)$
- $\mathbf{e}_2 = (0, 1)$ goes to $(0, 2)$

$$\text{Scaling matrix} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$$
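As a quick sanity check (a sketch built from the reflection example above): assemble the matrix column by column from the basis-vector destinations, then confirm that multiplying by $\mathbf{e}_1$ and $\mathbf{e}_2$ reads off those columns.

```python
import numpy as np

e1, e2 = np.array([1, 0]), np.array([0, 1])  # standard basis vectors in 2D

# Reflection across the x-axis: e1 stays at (1, 0), e2 goes to (0, -1).
# Stack the destination vectors as the columns of the matrix.
reflection = np.column_stack([[1, 0], [0, -1]])

print(reflection @ e1)                # [ 1  0]  -> first column of the matrix
print(reflection @ e2)                # [ 0 -1]  -> second column of the matrix
print(reflection @ np.array([3, 4]))  # [ 3 -4]: any vector gets flipped below the x-axis
```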
Whenever you see a matrix, visualize a grid being transformed. The columns show where the grid lines emanating from the origin end up. This mental image makes matrix operations intuitive—you stop seeing arithmetic and start seeing geometry.
Every 2D linear transformation can be built by combining a handful of fundamental types. Understanding each type geometrically makes complex transformations decomposable and intuitive.
Scaling (Dilation/Contraction)

Scaling stretches or compresses space along the coordinate axes.

$$S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$

- $s_x > 1$: stretch horizontally
- $0 < s_x < 1$: compress horizontally
- $s_x < 0$: flip and scale horizontally
- $s_x = s_y$: uniform scaling (preserves shape)

ML Applications:
- Feature normalization (scaling inputs to similar ranges; see the sketch below)
- Weight initialization in neural networks
- Covariance matrix diagonal represents per-feature variance
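To illustrate the first ML application (a sketch with made-up numbers, not from the original text): standardizing features amounts to multiplying each centered data point by a diagonal scaling matrix whose entries are reciprocal standard deviations.

```python
import numpy as np

# Toy dataset: 4 samples, 2 features on very different scales (made-up values)
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [4.0, 700.0]])

# Per-feature standard deviations define a diagonal scaling matrix
S = np.diag(1.0 / X.std(axis=0))

X_scaled = (X - X.mean(axis=0)) @ S  # center, then scale each feature to unit variance
print(X_scaled.std(axis=0))          # [1. 1.]: both features now on a comparable scale
```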
Here's where the matrix perspective becomes powerful: composing transformations corresponds to multiplying matrices.

If transformation $A$ is applied first, then transformation $B$, the combined effect is:
$$\mathbf{y} = B(A\mathbf{x}) = (BA)\mathbf{x}$$

The matrix $BA$ encapsulates both transformations in one. This is why matrix multiplication is defined the way it is—it's designed to make function composition work correctly.

Order matters:

Matrix multiplication is NOT commutative: $AB \neq BA$ in general. This reflects the geometric reality that rotation then scaling differs from scaling then rotation.
Example: Scaling vs. rotation order

$$S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \quad R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Scale then rotate: $RS = \begin{bmatrix} 0 & -1 \\ 2 & 0 \end{bmatrix}$
Rotate then scale: $SR = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}$

The results are different! 'Scale then rotate' stretches horizontally first, then rotates. 'Rotate then scale' rotates first, so the horizontal stretch affects what was originally the y-direction. Always read matrix products right-to-left for the order of transformations.
In the expression $ABC\mathbf{x}$, the transformations apply from right to left: first $C$, then $B$, then $A$. This is opposite to how we read English but matches how function composition works: $f(g(h(x)))$ means $h$ first, then $g$, then $f$.
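A brief NumPy check of the worked example above (values taken directly from it): composing by multiplying matrices gives the same result as applying the transformations one after another, and the two orders genuinely differ.

```python
import numpy as np

S = np.array([[2, 0], [0, 1]])   # stretch x by 2
R = np.array([[0, -1], [1, 0]])  # rotate 90 degrees counterclockwise
x = np.array([1, 1])

print(R @ (S @ x), (R @ S) @ x)  # [-1  2] [-1  2]: composition = product of matrices
print(R @ S)                     # [[ 0 -1] [ 2  0]]  scale then rotate
print(S @ R)                     # [[ 0 -2] [ 1  0]]  rotate then scale (different!)
```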
The power of composition:

Complex transformations can be decomposed into sequences of simple ones. The Singular Value Decomposition (SVD), which we'll study later, shows that ANY matrix can be written as:
$$A = U \Sigma V^T$$

This means: any linear transformation is equivalent to a rotation ($V^T$), followed by a scaling ($\Sigma$), followed by another rotation ($U$). This decomposition is foundational for understanding everything from PCA to matrix compression to the numerics of neural network training.
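As a preview (a sketch only; the matrix here is an arbitrary example, and SVD is covered properly later): NumPy can compute the three factors, and multiplying them back together recovers the original matrix.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # arbitrary example matrix

U, s, Vt = np.linalg.svd(A)           # s holds the singular values (diagonal of Sigma)
Sigma = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A)) # True: rotation, then scaling, then rotation
```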
So far, we've focused on square matrices that transform n-dimensional space to n-dimensional space. But matrices can also change the dimension of vectors.

An $m \times n$ matrix transforms n-dimensional vectors into m-dimensional vectors:
$$A: \mathbb{R}^n \to \mathbb{R}^m$$

$$A_{m \times n} \cdot \mathbf{x}_{n \times 1} = \mathbf{y}_{m \times 1}$$
| Matrix Shape | Transformation Type | ML Example |
|---|---|---|
| $m > n$ (more rows) | Maps to higher dimension | Lifting to feature space, embedding layers |
| $m < n$ (fewer rows) | Maps to lower dimension | Dimensionality reduction, pooling, compression |
| $m = n$ (square) | Same dimension | Rotations, reflections in latent space |
Dimension reduction example:

A $2 \times 3$ matrix projects 3D vectors onto a 2D plane:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}$$

$$A\mathbf{x} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$

The z-component is discarded—this is projection onto the xy-plane.

Dimension increase example:

A $3 \times 2$ matrix lifts 2D vectors into 3D:

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$

$$A\mathbf{x} = \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}$$

The 2D vector is embedded in 3D space, lying in the xy-plane.
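The two examples above in a minimal NumPy sketch (values taken from them), showing how the matrix shape alone dictates the input and output dimensions:

```python
import numpy as np

project = np.array([[1, 0, 0],
                    [0, 1, 0]])        # 2x3: maps R^3 -> R^2
embed = np.array([[1, 0],
                  [0, 1],
                  [0, 0]])             # 3x2: maps R^2 -> R^3

print(project @ np.array([3, 4, 5]))   # [3 4]    z-component discarded
print(embed @ np.array([3, 4]))        # [3 4 0]  2D vector placed in the xy-plane of R^3
```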
Each fully-connected layer in a neural network uses a weight matrix to transform the input dimension. A layer going from 784 inputs (28×28 image) to 256 hidden units uses a 256×784 matrix—reducing dimension. A layer going from 256 to 512 uses a 512×256 matrix—increasing dimension. The architecture literally defines a sequence of dimension-changing transformations.
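A hedged sketch of those shapes (random weights, no bias terms or nonlinearities, purely to show how matrix shapes chain together):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(784)          # flattened 28x28 image

W1 = rng.standard_normal((256, 784))  # 256x784: R^784 -> R^256 (dimension reduction)
W2 = rng.standard_normal((512, 256))  # 512x256: R^256 -> R^512 (dimension increase)

h = W1 @ x                            # hidden representation, shape (256,)
out = W2 @ h                          # next layer's input, shape (512,)
print(h.shape, out.shape)             # (256,) (512,)
```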
Two transformations deserve special attention for their role as 'boundary cases' in the space of all linear transformations: the identity matrix $I$, which maps every vector to itself ($I\mathbf{x} = \mathbf{x}$), and the zero matrix, which collapses every vector to the origin.
The spectrum between identity and zero:

Think of all possible matrices as sitting on a spectrum. At one extreme is the identity—preserving everything. At the other extreme is the zero matrix—destroying everything. In between are the interesting transformations: rotations, scalings, projections, shears, and their combinations.

Eigendecomposition preview:

We'll later see that eigenvalues tell us where a matrix sits on this spectrum along different directions. Eigenvalue of 1 means that direction is preserved (identity-like). Eigenvalue of 0 means that direction is destroyed (zero-like). Eigenvalues between 0 and 1 mean contraction; greater than 1 mean expansion.
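A tiny preview in NumPy (the matrix is an arbitrary diagonal example; eigendecomposition is covered later): for a diagonal scaling matrix, the eigenvalues read off exactly how much each axis direction is expanded, preserved, or destroyed.

```python
import numpy as np

# Diagonal matrix: stretch x by 3, leave y unchanged, flatten z to zero
A = np.diag([3.0, 1.0, 0.0])

print(np.linalg.eigvals(A))  # eigenvalues 3, 1, 0: expansion, preserved, destroyed directions
```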
We've developed a geometric understanding of matrices that goes far beyond computational mechanics. Let's consolidate the key insights:

- A matrix is a linear transformation: a function that maps vectors to vectors while preserving addition and scalar multiplication.
- The columns of a matrix are the images of the standard basis vectors; they determine the entire transformation.
- Composing transformations corresponds to multiplying matrices, applied right-to-left, and order matters.
- The shape of a matrix ($m \times n$) fixes the input and output dimensions, so non-square matrices change dimension.
- The identity and zero matrices are the boundary cases; eigenvalues (coming later) locate a transformation between them, direction by direction.
What's next:

Now that we understand what matrices ARE conceptually, we'll develop fluency with the operations we can perform on them. The next page covers matrix arithmetic—addition, scalar multiplication, and the matrix product—with the geometric perspective always in mind.
You now understand matrices as linear transformations—functions that move, stretch, rotate, and reshape space while preserving its linear structure. This geometric foundation will make every subsequent matrix topic clearer and more intuitive. Next: matrix operations and the geometry of matrix multiplication.