Every machine learning algorithm that fits a model to data is, at its mathematical core, solving a projection problem. When you train a linear regression model, you're not just "fitting a line"—you're finding the orthogonal projection of your target vector onto the column space of your feature matrix. When principal component analysis reduces dimensions, it's projecting data onto lower-dimensional subspaces that capture maximum variance.
Orthogonal projection is the mathematical operation that finds the closest point in a subspace to a given point. This seemingly abstract geometric operation is the engine powering fundamental ML techniques such as least-squares regression and principal component analysis.
Understanding projection at a deep, geometric level transforms how you understand model fitting. You'll see that least squares isn't an arbitrary choice—it's the mathematically natural consequence of seeking the closest approximation in a geometric sense.
By the end of this page, you will understand orthogonal projection geometrically and algebraically, derive the projection formula from first principles, prove key properties of projections, and see how this single operation underlies the entire theory of least squares approximation.
Before diving into formulas, let's build geometric intuition. Consider a simple 3D scenario: you have a vector b in three-dimensional space, and a plane passing through the origin (a 2D subspace). What point on the plane is closest to b?
Intuitively, you'd drop a perpendicular from b to the plane. The foot of that perpendicular—let's call it p—is the orthogonal projection of b onto the plane.
Key geometric properties: the projection p lies in the plane, the error e = b - p is perpendicular to every vector in the plane, and p is the unique point in the plane closest to b.
The perpendicularity condition is crucial. If the error weren't perpendicular, you could "slide" along the plane to find a closer point—contradicting that p is the closest. The Pythagorean theorem guarantees this: any other point in the plane forms a right triangle with p and b, making it necessarily farther from b.
Think of orthogonal projection as casting a shadow. If a light source shines perpendicular to a plane, the shadow of any object on that plane is its orthogonal projection. The shadow "loses" the component of the object that sticks out perpendicular to the plane, keeping only the part that lies parallel to it.
Generalizing to arbitrary subspaces:
This intuition extends to any subspace S of any dimension within any vector space. Given a vector b and a subspace S, the projection proj_S(b) is the unique point in S closest to b, and the error e = b - proj_S(b) is orthogonal to S.
We write b = proj_S(b) + e, decomposing b into a component in S and a component orthogonal to S. This decomposition is unique and forms the foundation of approximation theory.
| Subspace Type | Geometric Picture | Projection Result |
|---|---|---|
| Line through origin | Project onto a 1D line | Scalar multiple of the direction vector |
| Plane through origin | Project onto a 2D plane | Linear combination of two basis vectors |
| Hyperplane in ℝⁿ | Project onto (n-1)-dimensional subspace | Remove the normal component |
| Column space of A | Project onto span of columns | The 'best fit' for the linear system |
Let's start with the simplest case: projecting a vector b onto a line spanned by a single non-zero vector a. The projection must be a scalar multiple of a:
proj_a(b) = xa for some scalar x
To find x, we use the perpendicularity condition. The error b - xa must be perpendicular to a:
(b - xa) · a = 0
Expanding:
b · a - x(a · a) = 0
Solving for x:
x = (a · b) / (a · a) = (a · b) / ||a||²
Therefore, the projection formula is:
proj_a(b) = (a · b / a · a) · a = (aᵀb / aᵀa) · a
```python
import numpy as np

def project_onto_line(b: np.ndarray, a: np.ndarray) -> np.ndarray:
    """
    Project vector b onto the line spanned by vector a.

    Parameters:
        b: The vector to project (n,)
        a: The direction vector defining the line (n,)

    Returns:
        The projection of b onto the line span({a})

    Mathematical derivation:
        We seek p = x*a such that (b - x*a) ⊥ a
        => (b - x*a)·a = 0
        => b·a - x·(a·a) = 0
        => x = (a·b)/(a·a)
        => p = ((a·b)/(a·a)) * a
    """
    # Validate inputs
    if np.allclose(a, 0):
        raise ValueError("Direction vector a cannot be zero")

    # Compute the scalar coefficient
    scalar = np.dot(a, b) / np.dot(a, a)

    # The projection is a scalar multiple of a
    projection = scalar * a
    return projection


def decompose_orthogonally(b: np.ndarray, a: np.ndarray) -> tuple:
    """
    Decompose vector b into component parallel to a and component
    perpendicular to a.

    Returns:
        (projection, error) where b = projection + error
        and projection ∥ a while error ⊥ a
    """
    projection = project_onto_line(b, a)
    error = b - projection

    # Verify orthogonality (should be approximately zero)
    assert np.allclose(np.dot(error, a), 0), "Error should be orthogonal to a"

    return projection, error


# Example: Project (1, 2, 3) onto the line spanned by (1, 1, 1)
b = np.array([1.0, 2.0, 3.0])
a = np.array([1.0, 1.0, 1.0])

proj, err = decompose_orthogonally(b, a)

print(f"Original vector b: {b}")
print(f"Direction vector a: {a}")
print(f"Projection onto line: {proj}")
print(f"Error (perpendicular component): {err}")
print(f"Reconstruction (proj + err): {proj + err}")
print(f"Verify: error · a = {np.dot(err, a):.10f} (should be ≈ 0)")
print(f"Verify: ||b||² = ||proj||² + ||err||² : {np.linalg.norm(b)**2:.4f} = {np.linalg.norm(proj)**2:.4f} + {np.linalg.norm(err)**2:.4f}")
```

Important observations:
1. The Pythagorean theorem holds: since the projection and error are orthogonal, ||b||² = ||proj_a(b)||² + ||e||².
This is the Pythagorean theorem in action—the squared length of the hypotenuse equals the sum of squared lengths of the legs.
2. Scaling invariance of direction: If we scale a by a non-zero constant c, the projection doesn't change: proj_{ca}(b) = ((ca) · b / (ca) · (ca)) · (ca) = (c(a · b) / c²(a · a)) · (ca) = (a · b / a · a) · a = proj_a(b)
The projection depends only on the direction of the line, not the magnitude of the vector spanning it.
3. Normalized case: If a is a unit vector (||a|| = 1), the formula simplifies: proj_a(b) = (a · b)a
The scalar (a · b) is the signed length of the projection—positive if b has a component in the direction of a, negative if opposite.
The scalar x = (a·b)/(a·a) represents the 'coordinate' of the projection in the 1D subspace spanned by a. In machine learning terms, if a represents a feature direction, this scalar tells you 'how much' of that feature is present in b.
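To make this 'coordinate' reading concrete, here is a minimal sketch (the `feature_direction` and `data_point` vectors are made up purely for illustration) that normalizes a direction to a unit vector u and reads off the signed coefficient u·b:

```python
import numpy as np

# Hypothetical feature direction and data point, purely for illustration
feature_direction = np.array([3.0, 4.0])   # not yet unit length
data_point = np.array([2.0, 1.0])

u = feature_direction / np.linalg.norm(feature_direction)  # unit vector, ||u|| = 1

coordinate = np.dot(u, data_point)   # signed length of the projection
projection = coordinate * u          # proj_u(b) = (u·b) u

print(f"Signed coordinate along u: {coordinate:.4f}")
print(f"Projection: {projection}")
print(f"Perpendicular remainder: {data_point - projection}")
```

The coordinate comes out positive here because the data point leans in the same direction as u; negating `data_point` would flip its sign.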
Now consider projecting b onto a subspace S of dimension k, spanned by linearly independent vectors a₁, a₂, ..., aₖ. We can arrange these as columns of a matrix A = [a₁ | a₂ | ... | aₖ], so S = Col(A), the column space of A.
The projection must be expressible as a linear combination of the basis vectors:
proj_S(b) = x₁a₁ + x₂a₂ + ... + xₖaₖ = Ax
for some coefficient vector x ∈ ℝᵏ.
The orthogonality condition says the error b - Ax must be perpendicular to every vector in S. Since {a₁, ..., aₖ} spans S, it suffices to require:
aᵢ · (b - Ax) = 0 for all i = 1, ..., k
In matrix form:
Aᵀ(b - Ax) = 0
Expanding:
Aᵀb - AᵀAx = 0
AᵀAx = Aᵀb
These are the famous normal equations. If A has full column rank (columns are linearly independent), then AᵀA is invertible, and:
x = (AᵀA)⁻¹Aᵀb
The projection is:
proj_S(b) = A(AᵀA)⁻¹Aᵀb
The matrix AᵀA is a k×k symmetric matrix. It's invertible if and only if A has full column rank (its columns are linearly independent). If A has linearly dependent columns, AᵀA is singular, and we need the pseudoinverse (covered later in this module).
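As a brief illustration of the rank-deficient case, the sketch below uses a deliberately redundant second column: inv(AᵀA) would fail because AᵀA is singular, but the Moore-Penrose pseudoinverse np.linalg.pinv (covered later in this module) still produces the projection onto Col(A):

```python
import numpy as np

# A with linearly dependent columns (second column = 2 * first)
A = np.array([
    [1.0, 2.0],
    [1.0, 2.0],
    [0.0, 0.0]
])
b = np.array([1.0, 2.0, 3.0])

# AᵀA is singular here, so (AᵀA)⁻¹ does not exist
print("det(AᵀA) =", np.linalg.det(A.T @ A))   # ≈ 0

# The pseudoinverse still yields the projection onto Col(A)
projection = A @ np.linalg.pinv(A) @ b
print("Projection via pseudoinverse:", projection)

# The error is still orthogonal to the columns of A
print("Aᵀ(b - p) =", A.T @ (b - projection))  # ≈ [0, 0]
```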
Deriving the formula step by step:
Step 1: Set up the optimization problem. We want the point p ∈ Col(A) that minimizes ||b - p||. Since p ∈ Col(A), we write p = Ax for some x.
Step 2: Apply the orthogonality condition. The error e = b - Ax must be orthogonal to Col(A). This means e ⊥ aⱼ for each column aⱼ of A.
Step 3: Express orthogonality in matrix form. The condition aⱼᵀe = 0 for all j can be written as Aᵀe = 0, or Aᵀ(b - Ax) = 0.
Step 4: Solve the normal equations. Rearranging: AᵀAx = Aᵀb. If AᵀA is invertible: x = (AᵀA)⁻¹Aᵀb.
Step 5: Compute the projection. Substituting back: p = Ax = A(AᵀA)⁻¹Aᵀb.
The expression A(AᵀA)⁻¹Aᵀ is so fundamental that it has its own name: the projection matrix (or hat matrix in statistics). We'll explore its properties in depth on the next page.
```python
import numpy as np
from numpy.linalg import inv, norm, matrix_rank

def project_onto_column_space(b: np.ndarray, A: np.ndarray) -> dict:
    """
    Project vector b onto the column space of matrix A.

    Parameters:
        b: The vector to project (m,)
        A: Matrix whose column space defines the subspace (m x n)

    Returns:
        Dictionary containing:
            - projection: The projection of b onto Col(A)
            - coefficients: The coefficient vector x such that projection = Ax
            - error: The residual b - projection
            - projection_matrix: The matrix P = A(AᵀA)⁻¹Aᵀ

    Mathematical basis:
        The projection p = Ax where x satisfies the normal equations:
            AᵀAx = Aᵀb
        Solution: x = (AᵀA)⁻¹Aᵀb  (when A has full column rank)
        Projection: p = A(AᵀA)⁻¹Aᵀb = Pb where P is the projection matrix
    """
    m, n = A.shape

    # Check that A has full column rank
    rank = matrix_rank(A)
    if rank < n:
        raise ValueError(f"Matrix A must have full column rank. Rank={rank}, n={n}")

    # Compute the key matrices
    ATA = A.T @ A   # n x n (Gram matrix)
    ATb = A.T @ b   # n x 1

    # Solve the normal equations: AᵀAx = Aᵀb
    # Using explicit inverse for clarity (in practice, use np.linalg.solve)
    ATA_inv = inv(ATA)
    x = ATA_inv @ ATb   # Coefficient vector

    # Compute the projection and error
    projection = A @ x
    error = b - projection

    # Compute the projection matrix P = A(AᵀA)⁻¹Aᵀ
    projection_matrix = A @ ATA_inv @ A.T

    # Verify key properties
    assert np.allclose(A.T @ error, 0), "Error should be orthogonal to column space"
    assert np.allclose(norm(b)**2, norm(projection)**2 + norm(error)**2), \
        "Pythagorean theorem should hold"

    return {
        'projection': projection,
        'coefficients': x,
        'error': error,
        'projection_matrix': projection_matrix
    }


# Example: Project b onto the column space of A
# Consider a 3D vector and a 2D subspace (a plane through origin)
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])  # 3x2 matrix: column space is a 2D plane in ℝ³

b = np.array([1.0, 2.0, 3.0])  # Vector in ℝ³

result = project_onto_column_space(b, A)

print("=== Projection onto Column Space ===")
print(f"\nMatrix A (defines the subspace):")
print(A)
print(f"\nVector to project b: {b}")
print(f"\nProjection p = Ax: {result['projection']}")
print(f"Coefficients x: {result['coefficients']}")
print(f"Error e = b - p: {result['error']}")
print(f"\nProjection matrix P:")
print(result['projection_matrix'])

print(f"\n=== Verification ===")
print(f"Aᵀe = {A.T @ result['error']} (should be ≈ [0, 0])")
print(f"||b||² = {norm(b)**2:.4f}")
print(f"||p||² + ||e||² = {norm(result['projection'])**2:.4f} + {norm(result['error'])**2:.4f} = {norm(result['projection'])**2 + norm(result['error'])**2:.4f}")
```

The orthogonality principle is the cornerstone of projection theory. It states:
A vector p in subspace S is the orthogonal projection of b onto S if and only if the error (b - p) is orthogonal to every vector in S.
This is an if and only if statement—orthogonality both characterizes projections and is sufficient to identify them. Let's prove both directions:
Proof that the projection satisfies orthogonality:
Suppose p is the closest point in S to b. We want to show (b - p) ⊥ S.
Take any vector s ∈ S and consider p + ts for scalar t. This is also in S. The squared distance from b to p + ts is:
||b - (p + ts)||² = ||(b - p) - ts||² = ||b - p||² - 2t(b - p)·s + t²||s||²
Since p minimizes this distance, the derivative with respect to t at t=0 must be zero:
d/dt [||b - p||² - 2t(b - p)·s + t²||s||²] |_{t=0} = -2(b - p)·s = 0
Therefore (b - p)·s = 0 for all s ∈ S, proving orthogonality.
Proof that orthogonality implies closest point:
Conversely, suppose p ∈ S satisfies (b - p) ⊥ S. We want to show p is the closest point to b in S.
Take any other point q ∈ S. Then p - q ∈ S (since S is a subspace), so (b - p) ⊥ (p - q).
By the Pythagorean theorem applied to the right triangle with legs (b - p) and (p - q):
||b - q||² = ||(b - p) + (p - q)||² = ||b - p||² + ||p - q||²
Since ||p - q||² ≥ 0, we have:
||b - q||² ≥ ||b - p||²
with equality if and only if q = p. Thus p is the unique closest point.
The key insight: Finding the projection is equivalent to finding a vector in S such that the error is orthogonal to S. This transforms a minimization problem into a system of linear equations.
In linear regression, we seek coefficients that minimize squared error. The orthogonality principle tells us this is equivalent to requiring the residuals to be orthogonal to the feature vectors. This geometric insight explains why the normal equations work—they encode exactly this orthogonality condition.
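Here is a small numerical sketch of that statement, using randomly generated data purely for illustration: after a least-squares fit, the residual vector is numerically orthogonal to every column of the design matrix, which is exactly the condition Xᵀ(y - Xŵ) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative regression data: 50 samples, 3 features plus an intercept column
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = rng.normal(size=50)

# Least-squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Orthogonality principle: residuals ⊥ every column of X (the normal equations)
print("Xᵀ residuals:", X.T @ residuals)   # ≈ [0, 0, 0, 0]
```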
The projection formula simplifies dramatically when the subspace has an orthonormal basis. If u₁, u₂, ..., uₖ are orthonormal vectors (mutually perpendicular unit vectors) spanning S, the projection becomes:
proj_S(b) = (u₁·b)u₁ + (u₂·b)u₂ + ... + (uₖ·b)uₖ = Σᵢ (uᵢᵀb)uᵢ
No matrix inversion required! This is because for orthonormal columns, AᵀA = I (the identity matrix).
Why does this work?
Let Q = [u₁ | u₂ | ... | uₖ]. Since the columns are orthonormal:
QᵀQ = I (the k×k identity matrix)
The general projection formula gives:
proj_S(b) = Q(QᵀQ)⁻¹Qᵀb = QI⁻¹Qᵀb = QQᵀb
Expanding QQᵀb:
QQᵀb = [u₁ | ... | uₖ] · [u₁ᵀb, ..., uₖᵀb]ᵀ = Σᵢ (uᵢᵀb)uᵢ
Each term (uᵢᵀb)uᵢ is the projection of b onto the line spanned by uᵢ, and the total projection is simply the sum of these individual projections.
```python
import numpy as np

def project_onto_orthonormal_basis(b: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """
    Project vector b onto the subspace spanned by orthonormal columns of Q.

    When the basis is orthonormal, the projection simplifies to:
        proj(b) = Q @ Qᵀ @ b = Σᵢ (uᵢᵀb)uᵢ

    This is much simpler and more numerically stable than the general formula.

    Parameters:
        b: Vector to project (m,)
        Q: Matrix with orthonormal columns (m x k)

    Returns:
        The projection of b onto Col(Q)
    """
    # Verify orthonormality
    k = Q.shape[1]
    QTQ = Q.T @ Q
    if not np.allclose(QTQ, np.eye(k)):
        raise ValueError("Columns of Q must be orthonormal")

    # The beautiful simplicity: proj = QQᵀb
    projection = Q @ (Q.T @ b)
    return projection


def compare_projection_methods(b: np.ndarray, A: np.ndarray):
    """
    Compare projection using general formula vs orthonormal basis.

    Demonstrates that orthonormalizing first (via QR decomposition)
    leads to simpler, more stable computation.
    """
    from numpy.linalg import qr, inv

    # Method 1: General formula with original basis
    ATA = A.T @ A
    proj_general = A @ inv(ATA) @ A.T @ b

    # Method 2: First orthonormalize, then project
    Q, R = qr(A)  # A = QR where Q has orthonormal columns
    proj_orthonormal = Q @ (Q.T @ b)

    print("=== Comparing Projection Methods ===")
    print(f"\nOriginal matrix A:")
    print(A)
    print(f"\nOrthonormal Q (from QR decomposition):")
    print(Q)
    print(f"\nVector b: {b}")
    print(f"\nProjection via general formula (A(AᵀA)⁻¹Aᵀb): {proj_general}")
    print(f"Projection via orthonormal basis (QQᵀb): {proj_orthonormal}")
    print(f"\nDifference: {np.linalg.norm(proj_general - proj_orthonormal):.2e}")

    # Show the individual projections onto each orthonormal basis vector
    print(f"\n=== Decomposition onto Orthonormal Basis ===")
    for i in range(Q.shape[1]):
        u = Q[:, i]
        component = np.dot(u, b) * u
        print(f"Projection onto u{i+1}: {component} (coefficient: {np.dot(u, b):.4f})")

    return proj_general, proj_orthonormal


# Example
A = np.array([
    [1.0, 2.0],
    [0.0, 1.0],
    [1.0, 0.0]
])

b = np.array([1.0, 2.0, 3.0])

compare_projection_methods(b, A)

# Direct orthonormal projection example
print("\n" + "="*50)
print("=== Direct Orthonormal Projection ===")

# Create an orthonormal basis for a 2D subspace of ℝ³
u1 = np.array([1/np.sqrt(2), 1/np.sqrt(2), 0])
u2 = np.array([0, 0, 1])
Q = np.column_stack([u1, u2])

b = np.array([3.0, 1.0, 2.0])
proj = project_onto_orthonormal_basis(b, Q)

print(f"\nOrthonormal basis vectors:")
print(f"u1 = {u1}")
print(f"u2 = {u2}")
print(f"\nVector b = {b}")
print(f"Projection = {proj}")
print(f"Error = {b - proj}")
print(f"\nVerify error ⊥ subspace:")
print(f"  error · u1 = {np.dot(b - proj, u1):.10f}")
print(f"  error · u2 = {np.dot(b - proj, u2):.10f}")
```

The orthonormal formula proj = QQᵀb requires only matrix-vector multiplications (O(mk) operations), while the general formula A(AᵀA)⁻¹Aᵀb requires matrix inversion (O(k³) operations). For high-dimensional problems, orthonormalizing the basis first (via Gram-Schmidt or QR decomposition) and then projecting is both faster and more numerically stable.
Connection to Fourier Series:
The orthonormal projection formula is the discrete analog of Fourier series. In Fourier analysis, a function is decomposed as:
f(x) = Σₙ cₙ φₙ(x)
where {φₙ} are orthonormal basis functions (sines and cosines), and the coefficients are:
cₙ = ⟨f, φₙ⟩ = ∫ f(x)φₙ(x) dx
This is exactly the continuous analog of cᵢ = uᵢᵀb. Projection onto orthonormal bases is the fundamental operation behind Fourier analysis and many related signal transforms.
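As a rough discrete illustration of this analogy (the sampled sine and cosine columns below are illustrative, not a complete Fourier basis), projecting a sampled signal onto an orthonormalized set of sinusoids recovers Fourier-like coefficients, and whatever the basis cannot represent ends up in the error:

```python
import numpy as np

# Sample a signal on n points and project it onto a few sampled
# sine/cosine vectors (orthonormalized numerically via QR for safety).
n = 200
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
signal = 2.0 * np.sin(t) + 0.5 * np.cos(3 * t) + 0.1 * np.sin(7 * t)

basis = np.column_stack([np.sin(t), np.cos(t), np.sin(3 * t), np.cos(3 * t)])
Q, _ = np.linalg.qr(basis)        # orthonormal columns spanning the same subspace

coeffs = Q.T @ signal             # Fourier-like coefficients cᵢ = uᵢᵀb
approx = Q @ coeffs               # projection of the signal onto the subspace

print("Coefficients:", np.round(coeffs, 3))
print("Relative approximation error:",
      np.linalg.norm(signal - approx) / np.linalg.norm(signal))
```

The sin(7t) component lies outside the chosen subspace, so it survives only in the residual, which is why the approximation error is small but nonzero.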
For any subspace S of a vector space V, the orthogonal complement S⊥ (read "S perp") is the set of all vectors orthogonal to every vector in S:
S⊥ = {v ∈ V : v · s = 0 for all s ∈ S}
The orthogonal complement is itself a subspace, and together S and S⊥ span the entire space:
V = S ⊕ S⊥ (direct sum decomposition)
This means every vector b ∈ V can be uniquely written as:
b = p + e where p ∈ S and e ∈ S⊥
The projection operation is precisely the operation that extracts the S component: proj_S(b) = p, while the complement projection extracts the S⊥ component: proj_{S⊥}(b) = e = b - p.
| Property | Mathematical Statement | Geometric Meaning |
|---|---|---|
| Direct sum | V = S ⊕ S⊥ | Every vector uniquely decomposes into S and S⊥ parts |
| Dimension | dim(S) + dim(S⊥) = dim(V) | Dimensions are complementary |
| Double complement | (S⊥)⊥ = S | Taking complement twice returns original subspace |
| Intersection | S ∩ S⊥ = {0} | Only the zero vector is in both |
| Projection onto the complement | I - P_S = P_{S⊥} | Complementary projections sum to the identity |
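The last row of the table is easy to check numerically. Here is a minimal sketch, reusing the same 3×2 matrix as the earlier examples:

```python
import numpy as np

# Minimal check that P_S and I - P_S are complementary projectors
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P_S = A @ np.linalg.inv(A.T @ A) @ A.T   # projector onto S = Col(A)
P_S_perp = np.eye(3) - P_S               # projector onto S⊥

b = np.array([1.0, 2.0, 3.0])
p, e = P_S @ b, P_S_perp @ b

print("p + e == b:", np.allclose(p + e, b))
print("p · e == 0:", np.isclose(p @ e, 0.0))
print("P_S @ P_S_perp == 0:", np.allclose(P_S @ P_S_perp, 0.0))
```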
Connection to the Four Fundamental Subspaces:
For an m × n matrix A, there are four fundamental subspaces, related by orthogonal complements: the column space Col(A) and the left null space Null(Aᵀ) live in ℝᵐ, while the row space Row(A) = Col(Aᵀ) and the null space Null(A) live in ℝⁿ.
The crucial orthogonality relationships are Col(A)⊥ = Null(Aᵀ) and Row(A)⊥ = Null(A).
When we project b onto Col(A), the error e = b - Ax̂ lands in Null(Aᵀ), which is precisely why Aᵀe = 0 (the normal equations).
```python
import numpy as np
from numpy.linalg import svd, matrix_rank

def find_orthogonal_complement_basis(A: np.ndarray) -> np.ndarray:
    """
    Find an orthonormal basis for the orthogonal complement of Col(A).

    The orthogonal complement of Col(A) is Null(Aᵀ), the left null space.
    We find this using the SVD: A = UΣVᵀ

    The columns of U corresponding to zero singular values form a basis
    for the left null space (orthogonal complement of column space).

    Parameters:
        A: Matrix whose column space complement we seek (m x n)

    Returns:
        Matrix whose columns form an orthonormal basis for Col(A)⊥
    """
    m, n = A.shape
    r = matrix_rank(A)

    # SVD gives orthonormal bases for all four fundamental subspaces
    U, S, Vt = svd(A, full_matrices=True)

    # First r columns of U span Col(A)
    # Remaining (m - r) columns of U span Col(A)⊥ = Null(Aᵀ)
    orthogonal_complement_basis = U[:, r:]

    return orthogonal_complement_basis


def verify_orthogonal_decomposition(b: np.ndarray, A: np.ndarray):
    """
    Verify that b decomposes into Col(A) and Col(A)⊥ components.
    """
    from numpy.linalg import lstsq

    # Project b onto Col(A)
    x_hat, residuals, rank, s = lstsq(A, b, rcond=None)
    projection = A @ x_hat
    error = b - projection

    # Get basis for orthogonal complement
    complement_basis = find_orthogonal_complement_basis(A)

    print("=== Orthogonal Decomposition ===")
    print(f"\nVector b: {b}")
    print(f"Projection onto Col(A): {projection}")
    print(f"Error (in Col(A)⊥): {error}")
    print(f"\nReconstruction b = projection + error: {projection + error}")

    print(f"\n=== Verification ===")

    # Verify projection is in Col(A)
    print(f"Projection is linear combination of A's columns: Ax̂ where x̂ = {x_hat}")

    # Verify error is orthogonal to all columns of A
    print(f"\nError orthogonal to each column of A:")
    for i in range(A.shape[1]):
        print(f"  error · A[:,{i}] = {np.dot(error, A[:, i]):.10f}")

    # Verify error is in the orthogonal complement
    if complement_basis.size > 0:
        print(f"\nOrthogonal complement basis (dim = {complement_basis.shape[1]}):")
        print(complement_basis)

        # Express error in this basis
        if complement_basis.shape[1] > 0:
            coeffs = complement_basis.T @ error
            print(f"\nError expressed in complement basis:")
            print(f"  Coefficients: {coeffs}")
            print(f"  Reconstruction: {complement_basis @ coeffs}")

    # Pythagorean theorem
    print(f"\n=== Pythagorean Theorem ===")
    print(f"||b||² = {np.linalg.norm(b)**2:.6f}")
    print(f"||proj||² + ||error||² = {np.linalg.norm(projection)**2:.6f} + {np.linalg.norm(error)**2:.6f}")
    print(f"                       = {np.linalg.norm(projection)**2 + np.linalg.norm(error)**2:.6f}")


# Example: A 3x2 matrix (column space is a plane in ℝ³)
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

b = np.array([1.0, 2.0, 3.0])

verify_orthogonal_decomposition(b, A)
```

We've stated repeatedly that orthogonal projection finds the closest point. Let's prove this rigorously from a purely algebraic perspective, connecting to the optimization viewpoint central to machine learning.
Theorem: Let S be a subspace of ℝⁿ and b ∈ ℝⁿ. Among all vectors s ∈ S, the orthogonal projection proj_S(b) uniquely minimizes ||b - s||.
Proof using calculus:
Let S = Col(A) where A is m × k with full column rank. We seek x ∈ ℝᵏ minimizing:
f(x) = ||b - Ax||² = (b - Ax)ᵀ(b - Ax)
Expanding:
f(x) = bᵀb - 2xᵀAᵀb + xᵀAᵀAx
Taking the gradient with respect to x:
∇f(x) = -2Aᵀb + 2AᵀAx
Setting the gradient to zero:
AᵀAx = Aᵀb
These are the normal equations. Since AᵀA is positive definite (when A has full column rank), the Hessian ∇²f = 2AᵀA is positive definite, confirming this is a minimum.
The unique solution is x̂ = (AᵀA)⁻¹Aᵀb, giving projection p = Ax̂ = A(AᵀA)⁻¹Aᵀb.
This derivation is identical to deriving the least squares solution for an overdetermined linear system Ax = b. The 'best' solution—the one minimizing ||b - Ax||²—is precisely the one that projects b onto Col(A). Least squares IS orthogonal projection.
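A quick sketch of that equivalence on a small, made-up overdetermined system: solving the normal equations directly and calling a standard least-squares routine give the same coefficient vector, and hence the same projection Ax̂.

```python
import numpy as np

# Illustrative overdetermined system: more equations than unknowns
A = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 4.0]
])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Route 1: solve the normal equations AᵀAx = Aᵀb directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: a standard least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("Normal equations:", x_normal)
print("lstsq:           ", x_lstsq)
print("Same solution:", np.allclose(x_normal, x_lstsq))
print("Projection of b onto Col(A):", A @ x_lstsq)
```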
Geometric proof (without calculus):
Let p = proj_S(b) and let q be any other point in S. We want to show ||b - p|| < ||b - q||.
Define e = b - p (the error). By the orthogonality principle, e ⊥ S.
Since both p and q are in S, the vector (p - q) ∈ S, so e ⊥ (p - q).
Now compute:

||b - q||² = ||(b - p) + (p - q)||²
           = ||e + (p - q)||²
           = ||e||² + ||p - q||² + 2e·(p - q)   [expansion]
           = ||e||² + ||p - q||² + 0            [orthogonality]
           = ||b - p||² + ||p - q||²
Since ||p - q||² ≥ 0, we have:
||b - q||² ≥ ||b - p||²
Equality holds if and only if ||p - q|| = 0, i.e., q = p.
This proves p is the unique closest point—the distance is strictly larger for any other point in S.
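If you want to sanity-check this numerically, the following sketch (an arbitrary example matrix with random perturbations of x̂) compares the distance from b to its projection with the distance from b to other points of S; the excess distance is never negative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example: S = Col(A) is a plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])
b = np.array([1.0, 2.0, 3.0])

# The projection p of b onto S
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
p = A @ x_hat
best = np.linalg.norm(b - p)

# Any other point q = A @ x in S should be farther from b than p is
smallest_excess = min(
    np.linalg.norm(b - A @ (x_hat + rng.normal(scale=0.5, size=2))) - best
    for _ in range(1000)
)

print(f"Distance from b to its projection: {best:.6f}")
print(f"Smallest excess distance over 1000 random points of S: {smallest_excess:.6f}")
```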
We've built a comprehensive understanding of orthogonal projection, the mathematical operation that finds the closest point in a subspace to a given point. To consolidate the key concepts: the projection is characterized by the orthogonality of the error, it is computed by solving the normal equations AᵀAx = Aᵀb, it reduces to the simple sum QQᵀb when the basis is orthonormal, and it is exactly the least squares solution.
What's next:
In the next page, we'll study projection matrices—the linear operators that perform projection. The matrix P = A(AᵀA)⁻¹Aᵀ has remarkable properties (idempotent, symmetric) that reveal deep structure. Understanding projection matrices provides computational tools and theoretical insights essential for linear regression, PCA, and beyond.
You now understand orthogonal projection—the geometric operation at the heart of least squares. You can project vectors onto lines, planes, and general subspaces, and you understand why the projection minimizes distance through both geometric and algebraic arguments. Next, we formalize this operation as a projection matrix.