Every machine learning algorithm that fits a model to data is, at its mathematical core, solving a projection problem. When you train a linear regression model, you're not just "fitting a line"—you're finding the orthogonal projection of your target vector onto the column space of your feature matrix. When principal component analysis reduces dimensions, it's projecting data onto lower-dimensional subspaces that capture maximum variance.
Orthogonal projection is the mathematical operation that finds the closest point in a subspace to a given point. This seemingly abstract geometric operation is the engine powering fundamental ML techniques such as least-squares regression and principal component analysis.
Understanding projection at a deep, geometric level transforms how you understand model fitting. You'll see that least squares isn't an arbitrary choice—it's the mathematically natural consequence of seeking the closest approximation in a geometric sense.
By the end of this page, you will understand orthogonal projection geometrically and algebraically, derive the projection formula from first principles, prove key properties of projections, and see how this single operation underlies the entire theory of least squares approximation.
Before diving into formulas, let's build geometric intuition. Consider a simple 3D scenario: you have a vector b in three-dimensional space, and a plane passing through the origin (a 2D subspace). What point on the plane is closest to b?
Intuitively, you'd drop a perpendicular from b to the plane. The foot of that perpendicular—let's call it p—is the orthogonal projection of b onto the plane.
Key geometric properties: the projection p lies in the plane, the error e = b - p is perpendicular to every vector in the plane, and p is the unique point in the plane closest to b.
The perpendicularity condition is crucial. If the error weren't perpendicular, you could "slide" along the plane to find a closer point—contradicting that p is the closest. The Pythagorean theorem guarantees this: any other point in the plane forms a right triangle with p and b, making it necessarily farther from b.
Think of orthogonal projection as casting a shadow. If a light source shines perpendicular to a plane, the shadow of any object on that plane is its orthogonal projection. The shadow "loses" the component of the object that sticks out perpendicular to the plane, keeping only the part that lies parallel to it.
Generalizing to arbitrary subspaces:
This intuition extends to any subspace S of any dimension within any vector space. Given a vector b and a subspace S, the projection proj_S(b) is the unique point in S closest to b, and the error e = b - proj_S(b) is orthogonal to S.
We write b = proj_S(b) + e, decomposing b into a component in S and a component orthogonal to S. This decomposition is unique and forms the foundation of approximation theory.
| Subspace Type | Geometric Picture | Projection Result |
|---|---|---|
| Line through origin | Project onto a 1D line | Scalar multiple of the direction vector |
| Plane through origin | Project onto a 2D plane | Linear combination of two basis vectors |
| Hyperplane in ℝⁿ | Project onto (n-1)-dimensional subspace | Remove the normal component |
| Column space of A | Project onto span of columns | The 'best fit' for the linear system |
Let's start with the simplest case: projecting a vector b onto a line spanned by a single non-zero vector a. The projection must be a scalar multiple of a:
proj_a(b) = xa for some scalar x
To find x, we use the perpendicularity condition. The error b - xa must be perpendicular to a:
(b - xa) · a = 0
Expanding:
b · a - x(a · a) = 0
Solving for x:
x = (a · b) / (a · a) = (a · b) / ||a||²
Therefore, the projection formula is:
proj_a(b) = (a · b / a · a) · a = (aᵀb / aᵀa) · a
```python
import numpy as np

def project_onto_line(b: np.ndarray, a: np.ndarray) -> np.ndarray:
    """
    Project vector b onto the line spanned by vector a.

    Parameters:
        b: The vector to project (n,)
        a: The direction vector defining the line (n,)

    Returns:
        The projection of b onto the line span({a})

    Mathematical derivation:
        We seek p = x*a such that (b - x*a) ⊥ a
        => (b - x*a)·a = 0
        => b·a - x·(a·a) = 0
        => x = (a·b)/(a·a)
        => p = ((a·b)/(a·a)) * a
    """
    # Validate inputs
    if np.allclose(a, 0):
        raise ValueError("Direction vector a cannot be zero")

    # Compute the scalar coefficient
    scalar = np.dot(a, b) / np.dot(a, a)

    # The projection is a scalar multiple of a
    projection = scalar * a
    return projection


def decompose_orthogonally(b: np.ndarray, a: np.ndarray) -> tuple:
    """
    Decompose vector b into component parallel to a and component
    perpendicular to a.

    Returns:
        (projection, error) where b = projection + error
        and projection ∥ a while error ⊥ a
    """
    projection = project_onto_line(b, a)
    error = b - projection

    # Verify orthogonality (should be approximately zero)
    assert np.allclose(np.dot(error, a), 0), "Error should be orthogonal to a"

    return projection, error


# Example: Project (1, 2, 3) onto the line spanned by (1, 1, 1)
b = np.array([1.0, 2.0, 3.0])
a = np.array([1.0, 1.0, 1.0])

proj, err = decompose_orthogonally(b, a)

print(f"Original vector b: {b}")
print(f"Direction vector a: {a}")
print(f"Projection onto line: {proj}")
print(f"Error (perpendicular component): {err}")
print(f"Reconstruction (proj + err): {proj + err}")
print(f"Verify: error · a = {np.dot(err, a):.10f} (should be ≈ 0)")
print(f"Verify: ||b||² = ||proj||² + ||err||² : {np.linalg.norm(b)**2:.4f} = {np.linalg.norm(proj)**2:.4f} + {np.linalg.norm(err)**2:.4f}")
```

Important observations:
1. The Pythagorean theorem holds: since the projection and error are orthogonal, ||b||² = ||proj_a(b)||² + ||e||².
This is the Pythagorean theorem in action—the squared length of the hypotenuse equals the sum of squared lengths of the legs.
2. Scaling invariance of direction: If we scale a by a non-zero constant c, the projection doesn't change: proj_{ca}(b) = ((ca) · b / (ca) · (ca)) · (ca) = (c(a · b) / c²(a · a)) · (ca) = (a · b / a · a) · a = proj_a(b)
The projection depends only on the direction of the line, not the magnitude of the vector spanning it.
3. Normalized case: If a is a unit vector (||a|| = 1), the formula simplifies: proj_a(b) = (a · b)a
The scalar (a · b) is the signed length of the projection—positive if b has a component in the direction of a, negative if opposite.
The scalar x = (a·b)/(a·a) represents the 'coordinate' of the projection in the 1D subspace spanned by a. In machine learning terms, if a represents a feature direction, this scalar tells you 'how much' of that feature is present in b.
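To make this 'coordinate' reading concrete, here is a minimal sketch (the `feature_direction` and `data_point` vectors are made up purely for illustration) that normalizes a direction to a unit vector u and reads off the signed coefficient u·b:

```python
import numpy as np

# Hypothetical feature direction and data point, purely for illustration
feature_direction = np.array([3.0, 4.0])   # not yet unit length
data_point = np.array([2.0, 1.0])

u = feature_direction / np.linalg.norm(feature_direction)  # unit vector, ||u|| = 1

coordinate = np.dot(u, data_point)   # signed length of the projection
projection = coordinate * u          # proj_u(b) = (u·b) u

print(f"Signed coordinate along u: {coordinate:.4f}")
print(f"Projection: {projection}")
print(f"Perpendicular remainder: {data_point - projection}")
```

The coordinate comes out positive here because the data point leans in the same direction as u; negating `data_point` would flip its sign.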
Now consider projecting b onto a subspace S of dimension k, spanned by linearly independent vectors a₁, a₂, ..., aₖ. We can arrange these as columns of a matrix A = [a₁ | a₂ | ... | aₖ], so S = Col(A), the column space of A.
The projection must be expressible as a linear combination of the basis vectors:
proj_S(b) = x₁a₁ + x₂a₂ + ... + xₖaₖ = Ax
for some coefficient vector x ∈ ℝᵏ.
The orthogonality condition says the error b - Ax must be perpendicular to every vector in S. Since {a₁, ..., aₖ} spans S, it suffices to require:
aᵢ · (b - Ax) = 0 for all i = 1, ..., k
In matrix form:
Aᵀ(b - Ax) = 0
Expanding:
Aᵀb - AᵀAx = 0
AᵀAx = Aᵀb
These are the famous normal equations. If A has full column rank (columns are linearly independent), then AᵀA is invertible, and:
x = (AᵀA)⁻¹Aᵀb
The projection is:
proj_S(b) = A(AᵀA)⁻¹Aᵀb
The matrix AᵀA is a k×k symmetric matrix. It's invertible if and only if A has full column rank (its columns are linearly independent). If A has linearly dependent columns, AᵀA is singular, and we need the pseudoinverse (covered later in this module).
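As a brief illustration of the rank-deficient case, the sketch below uses a deliberately redundant second column: inv(AᵀA) would fail because AᵀA is singular, but the Moore-Penrose pseudoinverse np.linalg.pinv (covered later in this module) still produces the projection onto Col(A):

```python
import numpy as np

# A with linearly dependent columns (second column = 2 * first)
A = np.array([
    [1.0, 2.0],
    [1.0, 2.0],
    [0.0, 0.0]
])
b = np.array([1.0, 2.0, 3.0])

# AᵀA is singular here, so (AᵀA)⁻¹ does not exist
print("det(AᵀA) =", np.linalg.det(A.T @ A))   # ≈ 0

# The pseudoinverse still yields the projection onto Col(A)
projection = A @ np.linalg.pinv(A) @ b
print("Projection via pseudoinverse:", projection)

# The error is still orthogonal to the columns of A
print("Aᵀ(b - p) =", A.T @ (b - projection))  # ≈ [0, 0]
```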
Deriving the formula step by step:
Step 1: Set up the optimization problem. We want the point p ∈ Col(A) that minimizes ||b - p||. Since p ∈ Col(A), we write p = Ax for some x.
Step 2: Apply the orthogonality condition. The error e = b - Ax must be orthogonal to Col(A). This means e ⊥ aⱼ for each column aⱼ of A.
Step 3: Express orthogonality in matrix form. The condition aⱼᵀe = 0 for all j can be written as Aᵀe = 0, or Aᵀ(b - Ax) = 0.
Step 4: Solve the normal equations. Rearranging: AᵀAx = Aᵀb. If AᵀA is invertible: x = (AᵀA)⁻¹Aᵀb.
Step 5: Compute the projection. Substituting back: p = Ax = A(AᵀA)⁻¹Aᵀb.
The expression A(AᵀA)⁻¹Aᵀ is so fundamental that it has its own name: the projection matrix (or hat matrix in statistics). We'll explore its properties in depth on the next page.
```python
import numpy as np
from numpy.linalg import inv, norm, matrix_rank

def project_onto_column_space(b: np.ndarray, A: np.ndarray) -> dict:
    """
    Project vector b onto the column space of matrix A.

    Parameters:
        b: The vector to project (m,)
        A: Matrix whose column space defines the subspace (m x n)

    Returns:
        Dictionary containing:
            - projection: The projection of b onto Col(A)
            - coefficients: The coefficient vector x such that projection = Ax
            - error: The residual b - projection
            - projection_matrix: The matrix P = A(AᵀA)⁻¹Aᵀ

    Mathematical basis:
        The projection p = Ax where x satisfies the normal equations:
            AᵀAx = Aᵀb
        Solution: x = (AᵀA)⁻¹Aᵀb  (when A has full column rank)
        Projection: p = A(AᵀA)⁻¹Aᵀb = Pb where P is the projection matrix
    """
    m, n = A.shape

    # Check that A has full column rank
    rank = matrix_rank(A)
    if rank < n:
        raise ValueError(f"Matrix A must have full column rank. Rank={rank}, n={n}")

    # Compute the key matrices
    ATA = A.T @ A   # n x n (Gram matrix)
    ATb = A.T @ b   # n x 1

    # Solve the normal equations: AᵀAx = Aᵀb
    # Using explicit inverse for clarity (in practice, use np.linalg.solve)
    ATA_inv = inv(ATA)
    x = ATA_inv @ ATb   # Coefficient vector

    # Compute the projection and error
    projection = A @ x
    error = b - projection

    # Compute the projection matrix P = A(AᵀA)⁻¹Aᵀ
    projection_matrix = A @ ATA_inv @ A.T

    # Verify key properties
    assert np.allclose(A.T @ error, 0), "Error should be orthogonal to column space"
    assert np.allclose(norm(b)**2, norm(projection)**2 + norm(error)**2), \
        "Pythagorean theorem should hold"

    return {
        'projection': projection,
        'coefficients': x,
        'error': error,
        'projection_matrix': projection_matrix
    }


# Example: Project b onto the column space of A
# Consider a 3D vector and a 2D subspace (a plane through origin)
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])  # 3x2 matrix: column space is a 2D plane in ℝ³

b = np.array([1.0, 2.0, 3.0])  # Vector in ℝ³

result = project_onto_column_space(b, A)

print("=== Projection onto Column Space ===")
print(f"\nMatrix A (defines the subspace):")
print(A)
print(f"\nVector to project b: {b}")
print(f"\nProjection p = Ax: {result['projection']}")
print(f"Coefficients x: {result['coefficients']}")
print(f"Error e = b - p: {result['error']}")
print(f"\nProjection matrix P:")
print(result['projection_matrix'])

print(f"\n=== Verification ===")
print(f"Aᵀe = {A.T @ result['error']} (should be ≈ [0, 0])")
print(f"||b||² = {norm(b)**2:.4f}")
print(f"||p||² + ||e||² = {norm(result['projection'])**2:.4f} + {norm(result['error'])**2:.4f} = {norm(result['projection'])**2 + norm(result['error'])**2:.4f}")
```

The orthogonality principle is the cornerstone of projection theory. It states:
A vector p in subspace S is the orthogonal projection of b onto S if and only if the error (b - p) is orthogonal to every vector in S.
This is an if and only if statement—orthogonality both characterizes projections and is sufficient to identify them. Let's prove both directions:
Proof that the projection satisfies orthogonality:
Suppose p is the closest point in S to b. We want to show (b - p) ⊥ S.
Take any vector s ∈ S and consider p + ts for scalar t. This is also in S. The squared distance from b to p + ts is:
||b - (p + ts)||² = ||(b - p) - ts||² = ||b - p||² - 2t(b - p)·s + t²||s||²
Since p minimizes this distance, the derivative with respect to t at t=0 must be zero:
d/dt [||b - p||² - 2t(b - p)·s + t²||s||²] |_{t=0} = -2(b - p)·s = 0
Therefore (b - p)·s = 0 for all s ∈ S, proving orthogonality.
Proof that orthogonality implies closest point:
Conversely, suppose p ∈ S satisfies (b - p) ⊥ S. We want to show p is the closest point to b in S.
Take any other point q ∈ S. Then p - q ∈ S (since S is a subspace), so (b - p) ⊥ (p - q).
By the Pythagorean theorem applied to the right triangle with legs (b - p) and (p - q):
||b - q||² = ||(b - p) + (p - q)||² = ||b - p||² + ||p - q||²
Since ||p - q||² ≥ 0, we have:
||b - q||² ≥ ||b - p||²
with equality if and only if q = p. Thus p is the unique closest point.
The key insight: Finding the projection is equivalent to finding a vector in S such that the error is orthogonal to S. This transforms a minimization problem into a system of linear equations.
In linear regression, we seek coefficients that minimize squared error. The orthogonality principle tells us this is equivalent to requiring the residuals to be orthogonal to the feature vectors. This geometric insight explains why the normal equations work—they encode exactly this orthogonality condition.
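Here is a small numerical sketch of that statement, using randomly generated data purely for illustration: after a least-squares fit, the residual vector is numerically orthogonal to every column of the design matrix, which is exactly the condition Xᵀ(y - Xŵ) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative regression data: 50 samples, 3 features plus an intercept column
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = rng.normal(size=50)

# Least-squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Orthogonality principle: residuals ⊥ every column of X (the normal equations)
print("Xᵀ residuals:", X.T @ residuals)   # ≈ [0, 0, 0, 0]
```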
The projection formula simplifies dramatically when the subspace has an orthonormal basis. If u₁, u₂, ..., uₖ are orthonormal vectors (mutually perpendicular unit vectors) spanning S, the projection becomes:
proj_S(b) = (u₁·b)u₁ + (u₂·b)u₂ + ... + (uₖ·b)uₖ = Σᵢ (uᵢᵀb)uᵢ
No matrix inversion required! This is because for orthonormal columns, AᵀA = I (the identity matrix).
Why does this work?
Let Q = [u₁ | u₂ | ... | uₖ]. Since the columns are orthonormal:
QᵀQ = I (the k×k identity matrix)
The general projection formula gives:
proj_S(b) = Q(QᵀQ)⁻¹Qᵀb = QI⁻¹Qᵀb = QQᵀb
Expanding QQᵀb:
QQᵀb = [u₁ | ... | uₖ] · [u₁ᵀb, ..., uₖᵀb]ᵀ = Σᵢ (uᵢᵀb)uᵢ
Each term (uᵢᵀb)uᵢ is the projection of b onto the line spanned by uᵢ, and the total projection is simply the sum of these individual projections.
```python
import numpy as np

def project_onto_orthonormal_basis(b: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """
    Project vector b onto the subspace spanned by orthonormal columns of Q.

    When the basis is orthonormal, the projection simplifies to:
        proj(b) = Q @ Qᵀ @ b = Σᵢ (uᵢᵀb)uᵢ

    This is much simpler and more numerically stable than the general formula.

    Parameters:
        b: Vector to project (m,)
        Q: Matrix with orthonormal columns (m x k)

    Returns:
        The projection of b onto Col(Q)
    """
    # Verify orthonormality
    k = Q.shape[1]
    QTQ = Q.T @ Q
    if not np.allclose(QTQ, np.eye(k)):
        raise ValueError("Columns of Q must be orthonormal")

    # The beautiful simplicity: proj = QQᵀb
    projection = Q @ (Q.T @ b)
    return projection


def compare_projection_methods(b: np.ndarray, A: np.ndarray):
    """
    Compare projection using general formula vs orthonormal basis.

    Demonstrates that orthonormalizing first (via QR decomposition)
    leads to simpler, more stable computation.
    """
    from numpy.linalg import qr, inv

    # Method 1: General formula with original basis
    ATA = A.T @ A
    proj_general = A @ inv(ATA) @ A.T @ b

    # Method 2: First orthonormalize, then project
    Q, R = qr(A)  # A = QR where Q has orthonormal columns
    proj_orthonormal = Q @ (Q.T @ b)

    print("=== Comparing Projection Methods ===")
    print(f"\nOriginal matrix A:")
    print(A)
    print(f"\nOrthonormal Q (from QR decomposition):")
    print(Q)
    print(f"\nVector b: {b}")
    print(f"\nProjection via general formula (A(AᵀA)⁻¹Aᵀb): {proj_general}")
    print(f"Projection via orthonormal basis (QQᵀb): {proj_orthonormal}")
    print(f"\nDifference: {np.linalg.norm(proj_general - proj_orthonormal):.2e}")

    # Show the individual projections onto each orthonormal basis vector
    print(f"\n=== Decomposition onto Orthonormal Basis ===")
    for i in range(Q.shape[1]):
        u = Q[:, i]
        component = np.dot(u, b) * u
        print(f"Projection onto u{i+1}: {component} (coefficient: {np.dot(u, b):.4f})")

    return proj_general, proj_orthonormal


# Example
A = np.array([
    [1.0, 2.0],
    [0.0, 1.0],
    [1.0, 0.0]
])

b = np.array([1.0, 2.0, 3.0])

compare_projection_methods(b, A)

# Direct orthonormal projection example
print("\n" + "="*50)
print("=== Direct Orthonormal Projection ===")

# Create an orthonormal basis for a 2D subspace of ℝ³
u1 = np.array([1/np.sqrt(2), 1/np.sqrt(2), 0])
u2 = np.array([0, 0, 1])
Q = np.column_stack([u1, u2])

b = np.array([3.0, 1.0, 2.0])
proj = project_onto_orthonormal_basis(b, Q)

print(f"\nOrthonormal basis vectors:")
print(f"u1 = {u1}")
print(f"u2 = {u2}")
print(f"\nVector b = {b}")
print(f"Projection = {proj}")
print(f"Error = {b - proj}")
print(f"\nVerify error ⊥ subspace:")
print(f"  error · u1 = {np.dot(b - proj, u1):.10f}")
print(f"  error · u2 = {np.dot(b - proj, u2):.10f}")
```

The orthonormal formula proj = QQᵀb requires only matrix-vector multiplications (O(mk) operations), while the general formula A(AᵀA)⁻¹Aᵀb requires matrix inversion (O(k³) operations). For high-dimensional problems, orthonormalizing the basis first (via Gram-Schmidt or QR decomposition) and then projecting is both faster and more numerically stable.
Connection to Fourier Series:
The orthonormal projection formula is the discrete analog of Fourier series. In Fourier analysis, a function is decomposed as:
f(x) = Σₙ cₙ φₙ(x)
where {φₙ} are orthonormal basis functions (sines and cosines), and the coefficients are:
cₙ = ⟨f, φₙ⟩ = ∫ f(x)φₙ(x) dx
This is exactly the continuous analog of cᵢ = uᵢᵀb. Projection onto orthonormal bases is the fundamental operation behind Fourier analysis and many related signal transforms.
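As a rough discrete illustration of this analogy (the sampled sine and cosine columns below are illustrative, not a complete Fourier basis), projecting a sampled signal onto an orthonormalized set of sinusoids recovers Fourier-like coefficients, and whatever the basis cannot represent ends up in the error:

```python
import numpy as np

# Sample a signal on n points and project it onto a few sampled
# sine/cosine vectors (orthonormalized numerically via QR for safety).
n = 200
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
signal = 2.0 * np.sin(t) + 0.5 * np.cos(3 * t) + 0.1 * np.sin(7 * t)

basis = np.column_stack([np.sin(t), np.cos(t), np.sin(3 * t), np.cos(3 * t)])
Q, _ = np.linalg.qr(basis)        # orthonormal columns spanning the same subspace

coeffs = Q.T @ signal             # Fourier-like coefficients cᵢ = uᵢᵀb
approx = Q @ coeffs               # projection of the signal onto the subspace

print("Coefficients:", np.round(coeffs, 3))
print("Relative approximation error:",
      np.linalg.norm(signal - approx) / np.linalg.norm(signal))
```

The sin(7t) component lies outside the chosen subspace, so it survives only in the residual, which is why the approximation error is small but nonzero.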
For any subspace S of a vector space V, the orthogonal complement S⊥ (read "S perp") is the set of all vectors orthogonal to every vector in S:
S⊥ = {v ∈ V : v · s = 0 for all s ∈ S}
The orthogonal complement is itself a subspace, and together S and S⊥ span the entire space:
V = S ⊕ S⊥ (direct sum decomposition)
This means every vector b ∈ V can be uniquely written as:
b = p + e where p ∈ S and e ∈ S⊥
The projection operation is precisely the operation that extracts the S component: proj_S(b) = p, while the complement projection extracts the S⊥ component: proj_{S⊥}(b) = e = b - p.
| Property | Mathematical Statement | Geometric Meaning |
|---|---|---|
| Direct sum | V = S ⊕ S⊥ | Every vector uniquely decomposes into S and S⊥ parts |
| Dimension | dim(S) + dim(S⊥) = dim(V) | Dimensions are complementary |
| Double complement | (S⊥)⊥ = S | Taking complement twice returns original subspace |
| Intersection | S ∩ S⊥ = {0} | Only the zero vector is in both |
| Projection onto the complement | I - P_S = P_{S⊥} | Complementary projections sum to the identity |
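The last row of the table is easy to check numerically. Here is a minimal sketch, reusing the same 3×2 matrix as the earlier examples:

```python
import numpy as np

# Minimal check that P_S and I - P_S are complementary projectors
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P_S = A @ np.linalg.inv(A.T @ A) @ A.T   # projector onto S = Col(A)
P_S_perp = np.eye(3) - P_S               # projector onto S⊥

b = np.array([1.0, 2.0, 3.0])
p, e = P_S @ b, P_S_perp @ b

print("p + e == b:", np.allclose(p + e, b))
print("p · e == 0:", np.isclose(p @ e, 0.0))
print("P_S @ P_S_perp == 0:", np.allclose(P_S @ P_S_perp, 0.0))
```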
Connection to the Four Fundamental Subspaces:
For an m × n matrix A, there are four fundamental subspaces, related by orthogonal complements: the column space Col(A) and the left null space Null(Aᵀ) live in ℝᵐ, while the row space Row(A) = Col(Aᵀ) and the null space Null(A) live in ℝⁿ.
The crucial orthogonality relationships are Col(A)⊥ = Null(Aᵀ) and Row(A)⊥ = Null(A).
When we project b onto Col(A), the error e = b - Ax̂ lands in Null(Aᵀ), which is precisely why Aᵀe = 0 (the normal equations).
```python
import numpy as np
from numpy.linalg import svd, matrix_rank

def find_orthogonal_complement_basis(A: np.ndarray) -> np.ndarray:
    """
    Find an orthonormal basis for the orthogonal complement of Col(A).

    The orthogonal complement of Col(A) is Null(Aᵀ), the left null space.
    We find this using the SVD: A = UΣVᵀ

    The columns of U corresponding to zero singular values form a basis
    for the left null space (orthogonal complement of column space).

    Parameters:
        A: Matrix whose column space complement we seek (m x n)

    Returns:
        Matrix whose columns form an orthonormal basis for Col(A)⊥
    """
    m, n = A.shape
    r = matrix_rank(A)

    # SVD gives orthonormal bases for all four fundamental subspaces
    U, S, Vt = svd(A, full_matrices=True)

    # First r columns of U span Col(A)
    # Remaining (m - r) columns of U span Col(A)⊥ = Null(Aᵀ)
    orthogonal_complement_basis = U[:, r:]

    return orthogonal_complement_basis


def verify_orthogonal_decomposition(b: np.ndarray, A: np.ndarray):
    """
    Verify that b decomposes into Col(A) and Col(A)⊥ components.
    """
    from numpy.linalg import lstsq

    # Project b onto Col(A)
    x_hat, residuals, rank, s = lstsq(A, b, rcond=None)
    projection = A @ x_hat
    error = b - projection

    # Get basis for orthogonal complement
    complement_basis = find_orthogonal_complement_basis(A)

    print("=== Orthogonal Decomposition ===")
    print(f"\nVector b: {b}")
    print(f"Projection onto Col(A): {projection}")
    print(f"Error (in Col(A)⊥): {error}")
    print(f"\nReconstruction b = projection + error: {projection + error}")

    print(f"\n=== Verification ===")

    # Verify projection is in Col(A)
    print(f"Projection is linear combination of A's columns: Ax̂ where x̂ = {x_hat}")

    # Verify error is orthogonal to all columns of A
    print(f"\nError orthogonal to each column of A:")
    for i in range(A.shape[1]):
        print(f"  error · A[:,{i}] = {np.dot(error, A[:, i]):.10f}")

    # Verify error is in the orthogonal complement
    if complement_basis.size > 0:
        print(f"\nOrthogonal complement basis (dim = {complement_basis.shape[1]}):")
        print(complement_basis)

        # Express error in this basis
        if complement_basis.shape[1] > 0:
            coeffs = complement_basis.T @ error
            print(f"\nError expressed in complement basis:")
            print(f"  Coefficients: {coeffs}")
            print(f"  Reconstruction: {complement_basis @ coeffs}")

    # Pythagorean theorem
    print(f"\n=== Pythagorean Theorem ===")
    print(f"||b||² = {np.linalg.norm(b)**2:.6f}")
    print(f"||proj||² + ||error||² = {np.linalg.norm(projection)**2:.6f} + {np.linalg.norm(error)**2:.6f}")
    print(f"                       = {np.linalg.norm(projection)**2 + np.linalg.norm(error)**2:.6f}")


# Example: A 3x2 matrix (column space is a plane in ℝ³)
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

b = np.array([1.0, 2.0, 3.0])

verify_orthogonal_decomposition(b, A)
```

We've stated repeatedly that orthogonal projection finds the closest point. Let's prove this rigorously from a purely algebraic perspective, connecting to the optimization viewpoint central to machine learning.
Theorem: Let S be a subspace of ℝⁿ and b ∈ ℝⁿ. Among all vectors s ∈ S, the orthogonal projection proj_S(b) uniquely minimizes ||b - s||.
Proof using calculus:
Let S = Col(A) where A is m × k with full column rank. We seek x ∈ ℝᵏ minimizing:
f(x) = ||b - Ax||² = (b - Ax)ᵀ(b - Ax)
Expanding:
f(x) = bᵀb - 2xᵀAᵀb + xᵀAᵀAx
Taking the gradient with respect to x:
∇f(x) = -2Aᵀb + 2AᵀAx
Setting the gradient to zero:
AᵀAx = Aᵀb
These are the normal equations. Since AᵀA is positive definite (when A has full column rank), the Hessian ∇²f = 2AᵀA is positive definite, confirming this is a minimum.
The unique solution is x̂ = (AᵀA)⁻¹Aᵀb, giving projection p = Ax̂ = A(AᵀA)⁻¹Aᵀb.
This derivation is identical to deriving the least squares solution for an overdetermined linear system Ax = b. The 'best' solution—the one minimizing ||b - Ax||²—is precisely the one that projects b onto Col(A). Least squares IS orthogonal projection.
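A quick sketch of that equivalence on a small, made-up overdetermined system: solving the normal equations directly and calling a standard least-squares routine give the same coefficient vector, and hence the same projection Ax̂.

```python
import numpy as np

# Illustrative overdetermined system: more equations than unknowns
A = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 4.0]
])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Route 1: solve the normal equations AᵀAx = Aᵀb directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: a standard least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("Normal equations:", x_normal)
print("lstsq:           ", x_lstsq)
print("Same solution:", np.allclose(x_normal, x_lstsq))
print("Projection of b onto Col(A):", A @ x_lstsq)
```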
Geometric proof (without calculus):
Let p = proj_S(b) and let q be any other point in S. We want to show ||b - p|| < ||b - q||.
Define e = b - p (the error). By the orthogonality principle, e ⊥ S.
Since both p and q are in S, the vector (p - q) ∈ S, so e ⊥ (p - q).
Now compute:

||b - q||² = ||(b - p) + (p - q)||²
           = ||e + (p - q)||²
           = ||e||² + ||p - q||² + 2e·(p - q)   [expansion]
           = ||e||² + ||p - q||² + 0            [orthogonality]
           = ||b - p||² + ||p - q||²
Since ||p - q||² ≥ 0, we have:
||b - q||² ≥ ||b - p||²
Equality holds if and only if ||p - q|| = 0, i.e., q = p.
This proves p is the unique closest point—the distance is strictly larger for any other point in S.
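If you want to sanity-check this numerically, the following sketch (an arbitrary example matrix with random perturbations of x̂) compares the distance from b to its projection with the distance from b to other points of S; the excess distance is never negative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example: S = Col(A) is a plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])
b = np.array([1.0, 2.0, 3.0])

# The projection p of b onto S
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
p = A @ x_hat
best = np.linalg.norm(b - p)

# Any other point q = A @ x in S should be farther from b than p is
smallest_excess = min(
    np.linalg.norm(b - A @ (x_hat + rng.normal(scale=0.5, size=2))) - best
    for _ in range(1000)
)

print(f"Distance from b to its projection: {best:.6f}")
print(f"Smallest excess distance over 1000 random points of S: {smallest_excess:.6f}")
```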
We've built a comprehensive understanding of orthogonal projection, the mathematical operation that finds the closest point in a subspace to a given point. To consolidate the key concepts: the projection is characterized by the orthogonality of the error, it is computed by solving the normal equations AᵀAx = Aᵀb, it reduces to the simple sum QQᵀb when the basis is orthonormal, and it is exactly the least squares solution.
What's next:
In the next page, we'll study projection matrices—the linear operators that perform projection. The matrix P = A(AᵀA)⁻¹Aᵀ has remarkable properties (idempotent, symmetric) that reveal deep structure. Understanding projection matrices provides computational tools and theoretical insights essential for linear regression, PCA, and beyond.
You now understand orthogonal projection—the geometric operation at the heart of least squares. You can project vectors onto lines, planes, and general subspaces, and you understand why the projection minimizes distance through both geometric and algebraic arguments. Next, we formalize this operation as a projection matrix.