Having established that matrices represent linear transformations, we now turn to the operations we can perform on them. These aren't arbitrary rules invented for computational convenience—they're the natural algebraic expressions of geometric operations.
The key principle: Every matrix operation has a geometric meaning. Addition blends transformations. Scalar multiplication strengthens or weakens them. Matrix multiplication composes them. The transpose reflects across a symmetry axis. Understanding these connections makes matrix algebra feel inevitable rather than arbitrary.
By the end of this page, you will perform matrix operations fluently while understanding their geometric significance. You'll see why these operations are defined the way they are, and how they combine to express complex transformations concisely.
Matrix addition is the simplest operation: add corresponding elements.
$$A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}$$
Requirement: Matrices must have the same dimensions. You cannot add a 2×3 matrix to a 3×2 matrix.
Geometric interpretation:
If $A$ sends the basis vector $\mathbf{e}_1$ to column 1 of $A$, and $B$ sends $\mathbf{e}_1$ to column 1 of $B$, then $(A+B)$ sends $\mathbf{e}_1$ to the vector sum of those destinations. It's like averaging or blending two transformations.
$$R = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}, \quad S = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$

$$R + S = \begin{bmatrix} 2.707 & -0.707 \\ 0.707 & 2.707 \end{bmatrix}$$

This sum doesn't represent 'rotate then scale' or 'scale then rotate'—those would be matrix products. Instead, it creates a new transformation that blends properties of both. The result scales diagonally while adding a rotational shear.
Matrix addition is most meaningful when matrices represent similar types of operations. Adding a rotation to a scaling gives a valid matrix but may not have an intuitive geometric interpretation. Addition is natural for: combining effects (like forces), averaging transformations, or creating linear interpolations between states.
Properties of matrix addition:
| Property | Statement | Meaning |
|---|---|---|
| Commutative | $A + B = B + A$ | Order doesn't matter |
| Associative | $(A + B) + C = A + (B + C)$ | Grouping doesn't matter |
| Additive Identity | $A + O = A$ | Zero matrix is neutral |
| Additive Inverse | $A + (-A) = O$ | Every matrix has a negation |
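These properties are easy to confirm numerically. Here is a minimal NumPy sketch (the matrix values are illustrative):

```python
import numpy as np

# Two 2x2 matrices; the values are arbitrary examples.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Addition is element-wise: (A + B)[i, j] = A[i, j] + B[i, j]
C = A + B
print(C)

# Commutativity and the additive inverse from the table above
assert np.array_equal(A + B, B + A)
assert np.array_equal(A + (-A), np.zeros((2, 2)))
```

Shapes must match exactly: attempting `A + np.zeros((3, 2))` here would raise a broadcasting error, mirroring the dimension requirement above.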
Scalar multiplication multiplies every entry by a constant:
$$cA = \begin{bmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{bmatrix}$$
Geometric interpretation:
If $A$ is a transformation, then $cA$ applies the same transformation but scales the entire output by $c$. This is equivalent to first applying $A$, then uniformly scaling by $c$.
| Scalar $c$ | Effect on Transformation | Example Use |
|---|---|---|
| $c > 1$ | Amplify the transformation | Stronger effect |
| $0 < c < 1$ | Dampen the transformation | Partial application |
| $c = 0$ | Collapse to zero matrix | Nullify transform |
| $c = -1$ | Reverse the transformation | Opposite direction |
| $c < -1$ | Reverse and amplify | Strong opposite effect |
Scalar multiplication and rotation:
Consider a 90° rotation matrix: $$R_{90} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
Then $2R_{90}$: $$2R_{90} = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$$
This rotates by 90° AND scales by factor 2. The basis vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ move to $(0, 2)$ and $(-2, 0)$ respectively—rotated AND stretched.
In gradient descent, the learning rate acts like a scalar multiplier on the gradient matrix. A learning rate of 0.01 means 'apply 1% of the suggested update direction.' Too large (>1) and updates overshoot; too small and learning stalls. Scalar multiplication literally controls learning speed.
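A single gradient step can be sketched as scalar multiplication of the gradient. This is a toy example, not a real training loop; the weight and gradient values are assumed for illustration:

```python
import numpy as np

# Hypothetical weight matrix and gradient for one descent step.
W = np.array([[0.5, -0.2],
              [0.1,  0.8]])
grad = np.array([[1.0, 0.0],
                 [0.0, 1.0]])

lr = 0.01  # learning rate: a scalar multiplier on the update

# Scalar multiplication scales every entry of the update uniformly:
W_new = W - lr * grad
print(W_new)
```

With `lr = 0.01`, each diagonal entry moves by exactly 1% of the gradient's suggestion, matching the 'partial application' row in the table above.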
The transpose of matrix $A$, denoted $A^T$, is obtained by swapping rows and columns:
$$(A^T)_{ij} = A_{ji}$$
For a 2×3 matrix, the transpose is 3×2:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \implies A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
Geometric interpretation:
For square matrices, the transpose reflects the matrix's entries across its main diagonal. Geometrically, $A^T$ is characterized by the identity $(A\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (A^T\mathbf{y})$ for all vectors: it transfers the action of $A$ from one side of a dot product to the other.
For rectangular matrices, transpose swaps the domain and codomain dimensions—a $3 \times 5$ matrix (maps $\mathbb{R}^5 \to \mathbb{R}^3$) transposes to a $5 \times 3$ matrix (maps $\mathbb{R}^3 \to \mathbb{R}^5$).
The transpose of a product reverses the order: $(AB)^T = B^T A^T$. This is crucial and often forgotten. Geometrically: if we transpose the composition of two transformations, we must compose the transposes in reverse order. This property appears constantly in gradient derivations for neural networks.
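The order reversal is worth verifying by hand at least once. A quick NumPy check (matrix values are illustrative):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# (AB)^T equals B^T A^T -- note the reversed order
lhs = (A @ B).T
rhs = B.T @ A.T
assert np.array_equal(lhs, rhs)

# The un-reversed order generally differs:
print(np.array_equal(lhs, A.T @ B.T))  # False for these matrices
```

Getting this order wrong is one of the most common shape bugs in hand-derived backpropagation code.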
Special matrix types defined by transpose:
| Type | Definition | Geometric Meaning |
|---|---|---|
| Symmetric | $A = A^T$ | Transformation has perpendicular eigenvectors |
| Skew-symmetric | $A = -A^T$ | Infinitesimal rotation (in 3D: generator of rotations) |
| Orthogonal | $A^T = A^{-1}$ | Preserves lengths and angles |
Matrix-vector multiplication is the fundamental operation that applies a transformation to a vector. There are two equivalent ways to interpret it, and understanding both provides deeper insight.
Row-wise interpretation (dot products):
Each component of the output is the dot product of a row of $A$ with the input vector:
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}$$
Output component $y_i = (\text{row } i) \cdot \mathbf{x}$
Geometric meaning: Each output component measures how much the input aligns with a particular direction defined by the row.
ML example: In a neural network layer, each neuron computes a weighted sum of inputs—that's a dot product. The row of the weight matrix contains that neuron's learned sensitivities to each input feature.
$$A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$$

**Row picture:** $y_1 = 2(4) + 1(2) = 10$, $y_2 = 0(4) + 3(2) = 6$ → $(10, 6)$

**Column picture:** $4\begin{bmatrix}2\\0\end{bmatrix} + 2\begin{bmatrix}1\\3\end{bmatrix} = \begin{bmatrix}8\\0\end{bmatrix} + \begin{bmatrix}2\\6\end{bmatrix} = \begin{bmatrix}10\\6\end{bmatrix}$

Same answer, different perspectives. Row picture: compute dot products. Column picture: combine columns. Both are essential for ML intuition.
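Both pictures are straightforward to express in NumPy, using the same $A$ and $\mathbf{x}$ as the worked example:

```python
import numpy as np

A = np.array([[2, 1],
              [0, 3]])
x = np.array([4, 2])

# Row picture: each output component is a row of A dotted with x
row_picture = np.array([A[0] @ x, A[1] @ x])

# Column picture: the output is a weighted sum of A's columns
col_picture = x[0] * A[:, 0] + x[1] * A[:, 1]

print(row_picture)  # [10  6]
assert np.array_equal(row_picture, col_picture)
assert np.array_equal(A @ x, row_picture)
```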
In neural networks, think of the weight matrix columns as learned feature detectors. The input vector says 'how much of each feature is present.' The output is the combination of feature responses. This column view makes layer operations intuitive: you're combining learned representations.
Matrix multiplication is where the transformation perspective truly shines. The product $AB$ represents applying transformation $B$ first, then transformation $A$.
Definition:
For $A$ (size $m \times n$) and $B$ (size $n \times p$), their product $C = AB$ has size $m \times p$:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = (\text{row } i \text{ of } A) \cdot (\text{column } j \text{ of } B)$$
Dimension rule: Inner dimensions must match. $(m \times \mathbf{n}) \cdot (\mathbf{n} \times p) = (m \times p)$
If $A$ is $2 \times 3$ and $B$ is $3 \times 4$, then $AB$ is $2 \times 4$. But $BA$ is undefined (4 ≠ 2).
Why this definition?
We want the matrix product to behave exactly like function composition: applying $B$ first, then $A$, gives $\mathbf{y} = A(B\mathbf{x})$.
The product $AB$ should give the same result in one step: $\mathbf{y} = (AB)\mathbf{x}$
The row-by-column multiplication rule is the ONLY definition that enforces $(AB)\mathbf{x} = A(B\mathbf{x})$ for every vector $\mathbf{x}$.
$$S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \text{ (scale x by 2)}$$

$$R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \text{ (rotate 90°)}$$

$$RS = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 2 & 0 \end{bmatrix}$$

Reading right-to-left: first scale x by 2, then rotate 90°. The basis vector (1,0) goes to (2,0) after scaling, then to (0,2) after rotation—which is exactly column 1 of the product. Verify: column 2 should take (0,1) → (0,1) → (-1,0). ✓
In general, $AB \neq BA$. 'Rotate then scale' differs from 'scale then rotate.' Even when both products are defined (square matrices of same size), the results typically differ. This reflects the physical reality that the order of operations matters in geometry.
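Both claims, composition and non-commutativity, can be checked with the scale and rotation matrices from the example above:

```python
import numpy as np

S = np.array([[2, 0],
              [0, 1]])   # scale x by 2
R = np.array([[0, -1],
              [1,  0]])  # rotate 90 degrees

x = np.array([1, 0])

# Composition: one combined step equals two sequential steps
combined = (R @ S) @ x
stepwise = R @ (S @ x)
print(combined)                      # [0 2]
assert np.array_equal(combined, stepwise)

# Order matters: scale-then-rotate differs from rotate-then-scale
print(R @ S)
print(S @ R)
assert not np.array_equal(R @ S, S @ R)
```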
Properties of matrix multiplication:
| Property | Statement | Note |
|---|---|---|
| NOT commutative | $AB \neq BA$ generally | Order matters! |
| Associative | $(AB)C = A(BC)$ | Grouping doesn't affect result |
| Distributive | $A(B+C) = AB + AC$ | Multiplication distributes over addition |
| Identity | $AI = IA = A$ | Identity matrix is neutral |
| Zero | $A \cdot O = O$ | Zero matrix annihilates |
Understanding matrix multiplication deeply requires seeing it from multiple angles. Here are four complementary perspectives:
1. Entry-by-entry (dot product view):
Each entry $(i,j)$ of $C = AB$ is the dot product of row $i$ of $A$ with column $j$ of $B$:
$$C_{ij} = \mathbf{a}_i^T \mathbf{b}_j = \sum_k A_{ik} B_{kj}$$
When useful: Computing individual entries, understanding computational complexity ($O(n^3)$ for $n \times n$ matrices).
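The entry-by-entry view translates directly into the classic triple loop. This naive sketch makes the $O(mnp)$ cost visible (in practice you would always use `A @ B`, which calls optimized BLAS routines):

```python
import numpy as np

def matmul_naive(A, B):
    """Entry-by-entry product: C[i, j] = (row i of A) dot (column j of B)."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):          # m * p entries, n multiply-adds each: O(mnp)
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(matmul_naive(A, B), A @ B)
```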
In gradient descent, weight updates have the form $\Delta W = \eta \cdot \mathbf{error} \cdot \mathbf{input}^T$—an outer product! Each training example contributes a rank-1 update. Understanding this view clarifies how neural networks learn: they accumulate evidence from input-error correlations.
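The rank-1 structure of such an update is easy to see in code. A minimal sketch, with assumed shapes (3 output units, 4 input features) and an assumed learning rate:

```python
import numpy as np

# Illustrative values: error signal per output unit, input feature vector.
error = np.array([0.5, -0.2, 0.1])
inputs = np.array([1.0, 0.0, 2.0, -1.0])
eta = 0.1  # assumed learning rate

# Outer product: a 3x4 matrix pairing every error with every input
delta_W = eta * np.outer(error, inputs)
print(delta_W.shape)  # (3, 4)

# One training example contributes exactly a rank-1 update
assert np.linalg.matrix_rank(delta_W) == 1
```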
Understanding computational cost is essential for ML practitioners who work with matrices containing millions of parameters.
| Operation | Time Complexity | Memory | Notes |
|---|---|---|---|
| Addition $A + B$ ($m \times n$) | $O(mn)$ | $O(mn)$ | Element-wise, highly parallelizable |
| Scalar mult $cA$ | $O(mn)$ | $O(mn)$ | Element-wise |
| Transpose $A^T$ | $O(mn)$ | $O(mn)$ | In practice, often $O(1)$ via view |
| Matrix-vector $A\mathbf{x}$ ($m \times n$) | $O(mn)$ | $O(m + n)$ | Each output needs $n$ ops |
| Matrix-matrix $AB$ ($m \times n \times p$) | $O(mnp)$ | $O(mp)$ | Naive algorithm; faster exist |
| Square matrix mult ($n \times n$) | $O(n^3)$ | $O(n^2)$ | Strassen: $O(n^{2.807})$ |
The $n^3$ barrier:
Naive matrix multiplication of two $n \times n$ matrices requires $O(n^3)$ operations. For $n = 1000$, that's a billion operations. For $n = 10000$, it's a trillion.
Optimizations in practice:
Matrix multiplication is associative: $(AB)C = A(BC)$. But the order affects computation cost! Computing $ABC$ where $A$ is $10 \times 100$, $B$ is $100 \times 5$, $C$ is $5 \times 50$:
• $(AB)C$: $(10 \times 100 \times 5) + (10 \times 5 \times 50) = 5000 + 2500 = 7500$ ops
• $A(BC)$: $(100 \times 5 \times 50) + (10 \times 100 \times 50) = 25000 + 50000 = 75000$ ops
Choosing the right order matters—10x difference here!
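The cost comparison above can be reproduced with a one-line cost model, and associativity checked numerically on random matrices of those shapes:

```python
import numpy as np

A = np.random.rand(10, 100)
B = np.random.rand(100, 5)
C = np.random.rand(5, 50)

def cost(m, n, p):
    """Multiply-adds for an (m x n) times (n x p) product."""
    return m * n * p

left = cost(10, 100, 5) + cost(10, 5, 50)     # (AB)C
right = cost(100, 5, 50) + cost(10, 100, 50)  # A(BC)
print(left, right)  # 7500 75000

# Either order yields the same matrix (associativity), at very different cost:
assert np.allclose((A @ B) @ C, A @ (B @ C))
```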
We've covered the essential operations on matrices and their geometric meanings. Here's a consolidated reference:
| Operation | Formula | Requirement |
|---|---|---|
| Addition | $(A+B)_{ij} = A_{ij} + B_{ij}$ | Same dimensions |
| Scalar mult | $(cA)_{ij} = cA_{ij}$ | Any matrix |
| Transpose | $(A^T)_{ij} = A_{ji}$ | Any matrix |
| Product | $(AB)_{ij} = \sum_k A_{ik}B_{kj}$ | Cols of $A$ = Rows of $B$ |
What's next:
Now that we can compute with matrices, we turn to interpreting what matrix multiplication tells us about the relationship between input and output space. The next page explores the deep meaning of matrix multiplication as transformation composition, including how to visualize and reason about chained transformations in ML.
You now have computational fluency with matrix operations and understand their geometric significance. These operations aren't arbitrary—they're the algebra of transformations. Next: diving deeper into matrix multiplication interpretation and what it reveals about composed transformations.