In the previous page, we learned that orthogonal projection maps a vector b to its closest point in a subspace Col(A):
proj_{Col(A)}(b) = A(AᵀA)⁻¹Aᵀb
Notice that this is a linear function of b—the expression A(AᵀA)⁻¹Aᵀ is a matrix that can be computed independently of b. This matrix is called the projection matrix (or hat matrix in statistics, because it puts the "hat" on y to get ŷ).
P = A(AᵀA)⁻¹Aᵀ
Once P is computed, projecting any vector is a simple matrix-vector multiplication: p = Pb. More importantly, the projection matrix has remarkable algebraic properties (it is idempotent and symmetric) that reveal deep structure, as we explore below.
By the end of this page, you will understand projection matrices deeply—their defining properties, how to construct and verify them, their geometric interpretation, and their role in linear regression. You'll also learn about the complementary projection (I - P) and leverage scores, which measure how influential each data point is.
Given a matrix A ∈ ℝᵐˣⁿ with full column rank (rank(A) = n), the projection matrix onto Col(A) is:
P = A(AᵀA)⁻¹Aᵀ
This is an m × m matrix that projects any vector b ∈ ℝᵐ onto the column space of A.
Key observations: P is m × m (its size is set by the ambient space ℝᵐ, not by the number of columns of A), and it depends only on the subspace Col(A), not on which particular basis for that subspace the columns of A provide.
For an orthonormal basis Q (where QᵀQ = I), the projection matrix simplifies beautifully:
P = Q(QᵀQ)⁻¹Qᵀ = QQᵀ
This is computationally cheaper and more numerically stable.
```python
import numpy as np
from numpy.linalg import inv, qr, matrix_rank

def projection_matrix(A: np.ndarray) -> np.ndarray:
    """
    Compute the projection matrix P = A(AᵀA)⁻¹Aᵀ for projecting onto Col(A).

    Parameters:
        A: Matrix with full column rank (m x n where m >= n)

    Returns:
        P: The m x m projection matrix onto Col(A)

    Note: Computing P explicitly is O(m²n + n³) and requires O(m²) storage.
    In practice, compute Pb directly when possible.
    """
    m, n = A.shape

    # Verify full column rank
    if matrix_rank(A) < n:
        raise ValueError("A must have full column rank")

    # Compute the projection matrix
    ATA = A.T @ A          # n x n
    ATA_inv = inv(ATA)     # n x n
    P = A @ ATA_inv @ A.T  # m x m

    return P

def projection_matrix_orthonormal(Q: np.ndarray) -> np.ndarray:
    """
    Compute projection matrix when basis is orthonormal: P = QQᵀ

    Much simpler and more stable than the general formula.
    """
    # Verify orthonormality
    n = Q.shape[1]
    if not np.allclose(Q.T @ Q, np.eye(n)):
        raise ValueError("Columns of Q must be orthonormal")

    return Q @ Q.T

def project_via_matrix(b: np.ndarray, A: np.ndarray) -> dict:
    """
    Project b onto Col(A) and analyze the projection matrix.
    """
    P = projection_matrix(A)
    projection = P @ b

    return {
        'projection_matrix': P,
        'projection': projection,
        'error': b - projection
    }

# Example: Create a projection matrix for a 2D plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P = projection_matrix(A)

print("=== Projection Matrix Construction ===")
print(f"\nMatrix A (defines the subspace):")
print(A)
print(f"\nProjection matrix P = A(AᵀA)⁻¹Aᵀ:")
print(np.round(P, 4))
print(f"\nShape of P: {P.shape}")

# Project a vector
b = np.array([1.0, 2.0, 3.0])
result = project_via_matrix(b, A)

print(f"\n=== Projecting b = {b} ===")
print(f"Projection Pb: {result['projection']}")
print(f"Error (I-P)b: {result['error']}")

# Compare with orthonormal version
Q, R = qr(A)
P_ortho = projection_matrix_orthonormal(Q)

print(f"\n=== Orthonormal Basis Version ===")
print(f"Orthonormal Q (from QR):")
print(np.round(Q, 4))
print(f"\nP = QQᵀ:")
print(np.round(P_ortho, 4))
print(f"\nDifference from A(AᵀA)⁻¹Aᵀ: {np.linalg.norm(P - P_ortho):.2e}")
```

A matrix P is idempotent if P² = P, meaning applying it twice is the same as applying it once. This is the algebraic expression of a fundamental geometric fact: once a vector is in the subspace, projecting again doesn't move it.
Proof that P = A(AᵀA)⁻¹Aᵀ is idempotent:
P² = [A(AᵀA)⁻¹Aᵀ][A(AᵀA)⁻¹Aᵀ] = A(AᵀA)⁻¹(AᵀA)(AᵀA)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P
The key step is recognizing that (AᵀA)⁻¹(AᵀA) = I, the identity matrix.
Geometric interpretation: idempotency captures the essence of projection. Once a vector has been projected into Col(A), it is already the closest point in the subspace to itself, so projecting it again leaves it unchanged.
Don't confuse idempotent with the identity matrix. The identity I satisfies I² = I, but it does nothing to vectors. A non-trivial projection P projects onto a proper subspace—it actually moves points. But once projected, they're 'stuck' in the subspace.
Eigenvalue characterization of idempotent matrices:
If P is idempotent, its eigenvalues can only be 0 or 1. Here's why:
If Pv = λv for some eigenvector v ≠ 0, then:
P²v = P(Pv) = P(λv) = λPv = λ²v
But since P² = P:
P²v = Pv = λv
Therefore λ²v = λv, which gives (λ² - λ)v = 0.
Since v ≠ 0, we have λ² - λ = 0, so λ(λ - 1) = 0, meaning λ = 0 or λ = 1.
For the projection matrix P onto a rank-r subspace of ℝᵐ, the eigenvalue 1 appears with multiplicity r (its eigenvectors span the subspace) and the eigenvalue 0 appears with multiplicity m - r (its eigenvectors span the orthogonal complement).
```python
import numpy as np
from numpy.linalg import eig, matrix_rank

def verify_idempotent(P: np.ndarray, tol: float = 1e-10) -> dict:
    """
    Verify that P is idempotent (P² = P) and analyze its eigenstructure.
    """
    P_squared = P @ P

    # Check idempotency
    is_idempotent = np.allclose(P_squared, P, atol=tol)

    # Compute eigenvalues
    eigenvalues, eigenvectors = eig(P)
    eigenvalues_real = np.real(eigenvalues)  # Should be real for symmetric matrices

    # Count eigenvalues near 0 and 1
    count_zero = np.sum(np.abs(eigenvalues_real) < tol)
    count_one = np.sum(np.abs(eigenvalues_real - 1) < tol)

    # Rank equals trace for projection matrices
    rank_from_eigenvalues = int(round(np.sum(eigenvalues_real)))
    trace_P = np.trace(P)

    return {
        'is_idempotent': is_idempotent,
        'P_squared_minus_P_norm': np.linalg.norm(P_squared - P),
        'eigenvalues': np.sort(eigenvalues_real)[::-1],
        'count_eigenvalue_1': count_one,
        'count_eigenvalue_0': count_zero,
        'trace': trace_P,
        'rank': rank_from_eigenvalues
    }

# Example: Projection matrix for a 2D plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P = A @ np.linalg.inv(A.T @ A) @ A.T

print("=== Idempotent Property Verification ===")
print(f"\nProjection matrix P:")
print(np.round(P, 4))

result = verify_idempotent(P)

print(f"\n=== Results ===")
print(f"Is idempotent (P² = P): {result['is_idempotent']}")
print(f"||P² - P|| = {result['P_squared_minus_P_norm']:.2e}")
print(f"\nEigenvalues: {np.round(result['eigenvalues'], 4)}")
print(f"Count of eigenvalue 1: {result['count_eigenvalue_1']}")
print(f"Count of eigenvalue 0: {result['count_eigenvalue_0']}")
print(f"\nTrace(P) = {result['trace']:.4f}")
print(f"Rank(P) from eigenvalues = {result['rank']}")
print(f"Dimension of Col(A) = {A.shape[1]}")

# Demonstrate idempotency with vectors
print("\n=== Demonstrating Idempotency ===")
b = np.array([1.0, 2.0, 3.0])
Pb = P @ b
PPb = P @ Pb

print(f"Original b: {b}")
print(f"First projection Pb: {np.round(Pb, 4)}")
print(f"Second projection P(Pb): {np.round(PPb, 4)}")
print(f"||Pb - P(Pb)|| = {np.linalg.norm(Pb - PPb):.2e}")
```

Orthogonal projection matrices are symmetric: Pᵀ = P. This symmetry is deeply connected to the geometric notion of orthogonality.
Proof that P = A(AᵀA)⁻¹Aᵀ is symmetric:
Pᵀ = [A(AᵀA)⁻¹Aᵀ]ᵀ = (Aᵀ)ᵀ[(AᵀA)⁻¹]ᵀAᵀ = A[(AᵀA)ᵀ]⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P

The key facts are that the inverse of a transpose equals the transpose of the inverse, and that AᵀA is symmetric, so (AᵀA)ᵀ = AᵀA.
Why symmetry matters:
- Orthogonal projection equivalence: A matrix P is an orthogonal projection matrix if and only if P² = P and Pᵀ = P
- Real eigenvalues: Symmetric matrices have real eigenvalues (for projections, these are 0 and 1)
- Orthogonal eigenvectors: The eigenvectors of P form an orthonormal basis—those with eigenvalue 1 span Col(A), those with eigenvalue 0 span Col(A)⊥
- Self-adjoint property: ⟨Pu, v⟩ = ⟨u, Pv⟩ for all u, v (projection respects the inner product structure)
Not all projection matrices are symmetric! An oblique projection projects onto a subspace along a direction that isn't perpendicular. Oblique projections satisfy P² = P but not Pᵀ = P. In machine learning and least squares, we almost always use orthogonal projections, which are symmetric.
Characterization theorem:
A matrix P is an orthogonal projection matrix if and only if:
- P² = P (P is idempotent), and
- Pᵀ = P (P is symmetric).
If P satisfies both conditions, it projects onto Col(P) = {v : Pv = v} orthogonally.
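As a quick numerical illustration of this characterization, here is a minimal sketch (the example matrix A and variable names are illustrative, not part of the page's earlier code): a vector already in Col(P) is a fixed point of P, while a generic vector is moved into the subspace.

```python
import numpy as np

# Build an orthogonal projection P onto Col(A) for an example full-column-rank A
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

x = np.array([2.0, -1.0, 4.0])   # arbitrary vector in ℝ³
v = P @ x                        # v lies in Col(P) = Col(A)

# A vector in the subspace is left fixed: Pv = v
print(np.allclose(P @ v, v))     # True
# A generic vector is generally moved: Px != x
print(np.allclose(P @ x, x))     # False (x has a component outside Col(A))
```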
Connection to spectral decomposition:
Since P is symmetric, it has a spectral decomposition:
P = QΛQᵀ
where Q is orthogonal and Λ is diagonal with entries 0 or 1. If r entries are 1:
P = q₁q₁ᵀ + q₂q₂ᵀ + ... + qᵣqᵣᵀ
This shows P as the sum of rank-1 projections onto orthogonal directions—exactly the projection onto the r-dimensional subspace spanned by q₁, ..., qᵣ.
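The following sketch makes this concrete (reusing the same small example matrix A as the earlier blocks; the variable names are illustrative): it extracts the eigenvectors of P with eigenvalue 1 and checks that the sum of the rank-1 projections qᵢqᵢᵀ rebuilds P.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# P is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(P)

# Keep only the eigenvectors whose eigenvalue is (numerically) 1
ones = eigvecs[:, np.isclose(eigvals, 1.0)]

# Sum of rank-1 projections q_i q_iᵀ over those eigenvectors
P_rebuilt = sum(np.outer(q, q) for q in ones.T)

print(np.allclose(P_rebuilt, P))   # True: P = Σ q_i q_iᵀ
```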
| Property | Formula | Interpretation |
|---|---|---|
| Idempotent | P² = P | Projecting twice equals projecting once |
| Symmetric | Pᵀ = P | The projection is self-adjoint |
| Positive semi-definite | xᵀPx ≥ 0 | Squared projection length is non-negative |
| Bounded eigenvalues | 0 ≤ λ ≤ 1 | Projection never increases a vector's norm |
| Trace equals rank | tr(P) = rank(P) | Sum of eigenvalues equals projection dimension |
If P projects onto a subspace S, then (I - P) projects onto the orthogonal complement S⊥. This is the complementary projection or residual maker matrix.
Key properties of (I - P): let Q = I - P denote the complementary projection matrix. Then:
- Q is itself an orthogonal projection: Q² = Q and Qᵀ = Q
- P and Q annihilate each other: PQ = QP = 0
- They sum to the identity: P + Q = I, so every vector splits as b = Pb + Qb
- The two pieces are orthogonal: Pb · (I - P)b = 0, giving ||b||² = ||Pb||² + ||(I - P)b||²
The residual maker interpretation:
In regression, (I - P) extracts the residual:
e = y - ŷ = y - Py = (I - P)y = My
where M = I - P is sometimes called the residual maker or annihilator matrix. It annihilates any component of y that lies in Col(X).
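A short check of the annihilator property, as a sketch on a small example design matrix (names are illustrative): M = I - P maps every column of X to zero, so any vector already in Col(X) has zero residual.

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # example design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(X.shape[0]) - P                           # residual maker I - P

# M annihilates Col(X): every column of X maps to the zero vector
print(np.allclose(M @ X, 0))                 # True

# Consequently, a response that lies exactly in Col(X) has zero residual
y_in_colspace = X @ np.array([2.0, -0.5])
print(np.allclose(M @ y_in_colspace, 0))     # True
```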
```python
import numpy as np
from numpy.linalg import inv

def analyze_complementary_projections(A: np.ndarray, b: np.ndarray):
    """
    Analyze the projection matrix P and its complement I - P.

    Demonstrates that:
    - P projects onto Col(A)
    - I - P projects onto Col(A)⊥
    - Together they decompose any vector
    """
    m = A.shape[0]
    I = np.eye(m)

    # Projection onto Col(A)
    P = A @ inv(A.T @ A) @ A.T

    # Complementary projection onto Col(A)⊥
    Q = I - P

    # Apply to vector b
    projection = P @ b  # Component in Col(A)
    residual = Q @ b    # Component in Col(A)⊥

    print("=== Complementary Projections ===")
    print(f"\nProjection matrix P (onto Col(A)):")
    print(np.round(P, 4))
    print(f"\nComplementary matrix Q = I - P (onto Col(A)⊥):")
    print(np.round(Q, 4))

    print(f"\n=== Applied to vector b = {b} ===")
    print(f"Projection Pb (in Col(A)): {np.round(projection, 4)}")
    print(f"Residual (I-P)b (in Col(A)⊥): {np.round(residual, 4)}")
    print(f"Sum Pb + (I-P)b: {np.round(projection + residual, 4)}")

    print(f"\n=== Verifying Properties ===")

    # Both are idempotent
    print(f"P² = P: {np.allclose(P @ P, P)}")
    print(f"Q² = Q: {np.allclose(Q @ Q, Q)}")

    # Both are symmetric
    print(f"Pᵀ = P: {np.allclose(P.T, P)}")
    print(f"Qᵀ = Q: {np.allclose(Q.T, Q)}")

    # Orthogonal to each other
    print(f"PQ = 0: {np.allclose(P @ Q, 0)}")
    print(f"QP = 0: {np.allclose(Q @ P, 0)}")

    # Sum to identity
    print(f"P + Q = I: {np.allclose(P + Q, I)}")

    # Projection and residual are orthogonal
    print(f"\n=== Orthogonality of Components ===")
    print(f"Pb · (I-P)b = {np.dot(projection, residual):.10f} (should be ≈ 0)")

    # Pythagorean theorem
    print(f"\n=== Pythagorean Theorem ===")
    print(f"||b||² = {np.linalg.norm(b)**2:.4f}")
    print(f"||Pb||² + ||(I-P)b||² = {np.linalg.norm(projection)**2:.4f} + {np.linalg.norm(residual)**2:.4f}")
    print(f"                      = {np.linalg.norm(projection)**2 + np.linalg.norm(residual)**2:.4f}")

    return P, Q, projection, residual

# Example
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

b = np.array([1.0, 2.0, 3.0])

P, Q, proj, res = analyze_complementary_projections(A, b)
```

If P has rank r (projects onto an r-dimensional subspace), then I - P has rank m - r (projects onto the (m-r)-dimensional orthogonal complement). Together: rank(P) + rank(I-P) = r + (m-r) = m = rank(I). This is consistent with tr(P) + tr(I-P) = tr(I) = m.
The trace of a projection matrix equals its rank, which equals the dimension of the subspace it projects onto:
tr(P) = rank(P) = dim(Col(A)) = n (when A is m×n with full column rank)
This elegant fact follows from the eigenvalue structure: P has n eigenvalues equal to 1 and (m-n) eigenvalues equal to 0, so tr(P) = sum of eigenvalues = n.
Leverage scores:
The diagonal entries of the projection matrix P are called leverage scores in statistics. For observation i:
hᵢᵢ = Pᵢᵢ, the i-th diagonal entry of the projection (hat) matrix
Leverage scores measure how influential each data point is on its own fitted value.
Properties of leverage scores:
- Bounded: 0 ≤ hᵢᵢ ≤ 1 for all i
- Sum to rank: Σᵢ hᵢᵢ = tr(P) = p, the number of columns (features) of the design matrix
- Self-influence: hᵢᵢ measures how much observation i influences its own prediction ŷᵢ
- High leverage points: Points with hᵢᵢ close to 1 have high leverage—they can strongly influence the fitted model
Why 0 ≤ hᵢᵢ ≤ 1: since P is idempotent and symmetric, hᵢᵢ = (P²)ᵢᵢ = Σⱼ PᵢⱼPⱼᵢ = Σⱼ hᵢⱼ² = hᵢᵢ² + Σⱼ≠ᵢ hᵢⱼ². The left side equals hᵢᵢ while the right side is at least hᵢᵢ², so hᵢᵢ² ≤ hᵢᵢ, which forces 0 ≤ hᵢᵢ ≤ 1.
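The identity hᵢᵢ = Σⱼ hᵢⱼ² used in this argument is easy to confirm numerically; the sketch below uses a small example design matrix (the specific matrix is an arbitrary choice).

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(5.0)])     # small example design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Because H = H² and H = Hᵀ, each diagonal entry equals the sum of squares of its row
row_sums_of_squares = np.sum(H**2, axis=1)
print(np.allclose(np.diag(H), row_sums_of_squares))   # True
print(np.all((np.diag(H) >= 0) & (np.diag(H) <= 1)))  # True: 0 <= h_ii <= 1
```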
Interpretation in regression:
In linear regression with ŷ = Hy, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix:

ŷᵢ = Σⱼ hᵢⱼ yⱼ = hᵢᵢ yᵢ + Σⱼ≠ᵢ hᵢⱼ yⱼ
The leverage hᵢᵢ is the weight of yᵢ in predicting itself. High leverage means that point strongly affects the regression fit.
```python
import numpy as np
from numpy.linalg import inv

def compute_leverage_scores(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """
    Compute leverage scores for the design matrix X.

    Leverage score for observation i is the i-th diagonal of the hat matrix:
        h_ii = [X(XᵀX)⁻¹Xᵀ]_ii

    These measure how influential each observation is on its own fitted value.
    Returns both the leverage scores and the full hat matrix H.
    """
    # Compute hat matrix
    H = X @ inv(X.T @ X) @ X.T

    # Extract diagonal
    leverage = np.diag(H)

    return leverage, H

def analyze_leverage(X: np.ndarray, y: np.ndarray):
    """
    Comprehensive analysis of leverage scores in regression context.
    """
    n, p = X.shape
    leverage, H = compute_leverage_scores(X)

    # Fitted values
    y_hat = H @ y

    # Residuals
    residuals = y - y_hat

    print("=== Leverage Score Analysis ===")
    print(f"\nDesign matrix X: {n} observations, {p} features")
    print(f"\nLeverage scores (diagonal of H):")
    for i, h in enumerate(leverage):
        print(f"  Observation {i+1}: h_{i+1},{i+1} = {h:.4f}")

    print(f"\n=== Properties ===")
    print(f"Sum of leverage scores: {np.sum(leverage):.4f}")
    print(f"Number of features p: {p}")
    print(f"(These should be equal)")
    print(f"\nMin leverage: {np.min(leverage):.4f}")
    print(f"Max leverage: {np.max(leverage):.4f}")
    print(f"Mean leverage: {np.mean(leverage):.4f} (= p/n = {p/n:.4f})")

    # High leverage points (common rule: h_ii > 2p/n)
    threshold = 2 * p / n
    high_leverage = np.where(leverage > threshold)[0]
    print(f"\nHigh leverage threshold (2p/n): {threshold:.4f}")
    print(f"High leverage points: {high_leverage + 1 if len(high_leverage) > 0 else 'None'}")

    print(f"\n=== Influence on Fitted Values ===")
    print(f"The fitted value ŷ_i = Σⱼ H_ij * y_j")
    print(f"\nFor each observation, how much does y_i influence ŷ_i?")
    for i in range(n):
        self_contribution = H[i, i] * y[i]
        others_contribution = y_hat[i] - self_contribution
        print(f"  ŷ_{i+1} = {y_hat[i]:.3f} = {self_contribution:.3f} (self: {H[i,i]*100:.1f}%) + {others_contribution:.3f} (others)")

    return leverage, H

# Example: Simple linear regression
np.random.seed(42)
n = 10
X = np.column_stack([np.ones(n), np.arange(n)])  # Intercept + linear term
y = 2 + 0.5 * X[:, 1] + np.random.randn(n) * 0.3

# Add an outlier with high leverage (extreme x value)
X_with_outlier = np.vstack([X, [1, 20]])  # Point far from others
y_with_outlier = np.append(y, 12)

print("Original data:")
leverage, H = analyze_leverage(X, y)

print("\n" + "="*60)
print("\nWith high-leverage outlier (x=20):")
leverage_outlier, H_outlier = analyze_leverage(X_with_outlier, y_with_outlier)
```

High leverage doesn't always mean high influence. An observation can have high leverage (unusual x values) but still fit the model well. Influence also depends on the residual. Cook's distance combines leverage and residuals to measure actual influence on fitted values.
Given a matrix that might be a projection, how do we verify its properties? Here's a comprehensive checklist and implementation:
Complete verification criteria:
1. Idempotent: P² = P
2. Symmetric: Pᵀ = P
3. Eigenvalues are only 0 and 1
4. Trace equals rank: tr(P) = rank(P)
5. Positive semi-definite: xᵀPx ≥ 0 for every x
6. Norm contraction: ||Pv|| ≤ ||v|| for every v
```python
import numpy as np
from numpy.linalg import eig, norm, matrix_rank

def comprehensive_projection_verification(P: np.ndarray, tol: float = 1e-10) -> dict:
    """
    Comprehensive verification that P is an orthogonal projection matrix.

    Checks:
    1. Idempotent: P² = P
    2. Symmetric: Pᵀ = P
    3. Eigenvalues: only 0 and 1
    4. Trace = Rank
    5. Positive semi-definite
    6. Norm contraction (for random vectors)

    Returns:
        Dictionary with all verification results
    """
    results = {}
    m = P.shape[0]

    # 1. Idempotent: P² = P
    P_squared = P @ P
    results['idempotent'] = np.allclose(P_squared, P, atol=tol)
    results['idempotent_error'] = norm(P_squared - P)

    # 2. Symmetric: Pᵀ = P
    results['symmetric'] = np.allclose(P.T, P, atol=tol)
    results['symmetry_error'] = norm(P.T - P)

    # 3. Eigenvalue analysis
    eigenvalues, eigenvectors = eig(P)
    eigenvalues_real = np.real(eigenvalues)

    # Check all eigenvalues are 0 or 1
    is_zero_or_one = np.all(
        (np.abs(eigenvalues_real) < tol) | (np.abs(eigenvalues_real - 1) < tol)
    )
    results['eigenvalues_valid'] = is_zero_or_one
    results['eigenvalues'] = np.sort(eigenvalues_real)[::-1]

    count_one = np.sum(np.abs(eigenvalues_real - 1) < tol)
    count_zero = np.sum(np.abs(eigenvalues_real) < tol)
    results['eigenvalue_1_count'] = int(count_one)
    results['eigenvalue_0_count'] = int(count_zero)

    # 4. Trace = Rank
    trace_P = np.real(np.trace(P))
    rank_P = matrix_rank(P, tol=tol)
    results['trace'] = trace_P
    results['rank'] = rank_P
    results['trace_equals_rank'] = np.abs(trace_P - rank_P) < tol

    # 5. Positive semi-definite (for random vectors xᵀPx ≥ 0)
    test_vectors = np.random.randn(m, 100)
    quadratic_forms = np.array([v @ P @ v for v in test_vectors.T])
    results['positive_semidefinite'] = np.all(quadratic_forms >= -tol)
    results['min_quadratic_form'] = np.min(quadratic_forms)

    # 6. Norm contraction ||Pv|| ≤ ||v||
    norms_before = norm(test_vectors, axis=0)
    norms_after = norm(P @ test_vectors, axis=0)
    results['norm_contraction'] = np.all(norms_after <= norms_before + tol)

    # Overall verdict
    results['is_orthogonal_projection'] = (
        results['idempotent'] and
        results['symmetric'] and
        results['eigenvalues_valid']
    )

    return results

def print_verification(P: np.ndarray, name: str = "P"):
    """Print comprehensive verification results."""
    results = comprehensive_projection_verification(P)

    print(f"=== Verification of {name} as Orthogonal Projection Matrix ===")
    print(f"\nMatrix shape: {P.shape}")

    print(f"\n1. IDEMPOTENT (P² = P): {'✓ PASS' if results['idempotent'] else '✗ FAIL'}")
    print(f"   Error ||P² - P|| = {results['idempotent_error']:.2e}")

    print(f"\n2. SYMMETRIC (Pᵀ = P): {'✓ PASS' if results['symmetric'] else '✗ FAIL'}")
    print(f"   Error ||Pᵀ - P|| = {results['symmetry_error']:.2e}")

    print(f"\n3. EIGENVALUES (only 0 or 1): {'✓ PASS' if results['eigenvalues_valid'] else '✗ FAIL'}")
    print(f"   Eigenvalues: {np.round(results['eigenvalues'], 4)}")
    print(f"   Count of 1's: {results['eigenvalue_1_count']}, Count of 0's: {results['eigenvalue_0_count']}")

    print(f"\n4. TRACE = RANK: {'✓ PASS' if results['trace_equals_rank'] else '✗ FAIL'}")
    print(f"   Trace = {results['trace']:.4f}, Rank = {results['rank']}")

    print(f"\n5. POSITIVE SEMI-DEFINITE: {'✓ PASS' if results['positive_semidefinite'] else '✗ FAIL'}")
    print(f"   Min quadratic form: {results['min_quadratic_form']:.2e}")

    print(f"\n6. NORM CONTRACTION (||Pv|| ≤ ||v||): {'✓ PASS' if results['norm_contraction'] else '✗ FAIL'}")

    print(f"\n{'='*50}")
    verdict = "IS" if results['is_orthogonal_projection'] else "is NOT"
    print(f"VERDICT: {name} {verdict} an orthogonal projection matrix")

    return results

# Test with a valid projection matrix
A = np.array([[1, 0], [1, 1], [0, 1]])
P_valid = A @ np.linalg.inv(A.T @ A) @ A.T

print_verification(P_valid, "P (from A(AᵀA)⁻¹Aᵀ)")

print("\n" + "="*60 + "\n")

# Test with an invalid matrix (symmetric but not idempotent; eigenvalues are not just 0 and 1)
P_invalid = np.array([[0.5, 0.3, 0.0], [0.3, 0.5, 0.0], [0.0, 0.0, 1.0]])
print_verification(P_invalid, "P_invalid")
```

The projection matrix is central to linear regression. Given design matrix X (n × p) and response vector y (n × 1), the linear regression fitted values are:
ŷ = Xβ̂ = X(XᵀX)⁻¹Xᵀy = Hy
where H = X(XᵀX)⁻¹Xᵀ is the hat matrix (it puts the "hat" on y).
Geometric interpretation:
The least squares solution minimizes ||y - Xβ||², and the minimum is achieved when Xβ is the closest point in Col(X) to y—which is exactly the projection.
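One way to see this numerically is to compare the hat-matrix projection with the output of a generic least squares solver; the sketch below uses np.linalg.lstsq purely as an independent reference (the data and seed are arbitrary), and both routes give the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fitted values via the hat matrix (projection of y onto Col(X))
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat_projection = H @ y

# Fitted values via a standard least squares solve
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat_lstsq = X @ beta

print(np.allclose(y_hat_projection, y_hat_lstsq))   # True
```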
| Concept | Formula | Interpretation |
|---|---|---|
| Hat matrix | H = X(XᵀX)⁻¹Xᵀ | Projects y onto column space of X |
| Fitted values | ŷ = Hy | Prediction = projection of y |
| Residuals | e = (I-H)y | Error = orthogonal complement projection |
| Residual variance | (I-H)σ² | Covariance of residuals (when y ~ N(Xβ, σ²I)) |
| Degrees of freedom | n - tr(H) | = n - p for full rank model |
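The residual-covariance row can be checked by simulation, assuming the stated model y ~ N(Xβ, σ²I); the sketch below (with an arbitrary small design matrix, σ, and seed) draws many response vectors and compares the empirical covariance of e = (I - H)y with σ²(I - H).

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 8, 0.5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta_true = np.array([1.0, 0.3])

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                      # residual maker I - H

# Simulate many response vectors y ~ N(Xβ, σ²I) and form residuals e = My
eps = rng.normal(scale=sigma, size=(200_000, n))
residuals = (M @ (X @ beta_true + eps).T).T

empirical_cov = np.cov(residuals, rowvar=False)
theoretical_cov = sigma**2 * M         # Cov(e) = σ²(I - H)

print(np.max(np.abs(empirical_cov - theoretical_cov)))   # close to 0
```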
```python
import numpy as np
from numpy.linalg import inv

def regression_as_projection(X: np.ndarray, y: np.ndarray):
    """
    Demonstrate that linear regression is orthogonal projection.

    Shows:
    1. ŷ = Hy is the projection of y onto Col(X)
    2. Residuals e are orthogonal to Col(X)
    3. Pythagorean theorem: ||y||² = ||ŷ||² + ||e||²
    4. R² = 1 - SS_res/SS_tot (proportion of variance explained)
    """
    n, p = X.shape

    # Hat matrix
    H = X @ inv(X.T @ X) @ X.T

    # Fitted values (projection)
    y_hat = H @ y

    # Residuals
    e = y - y_hat

    # Coefficients (for interpretation)
    beta = inv(X.T @ X) @ X.T @ y

    print("=== Linear Regression as Orthogonal Projection ===")
    print(f"\nDesign matrix X: {n} observations, {p} features")
    print(f"Response y: {y}")
    print(f"\nCoefficients β̂: {beta}")
    print(f"Fitted values ŷ = Xβ̂ = Hy: {np.round(y_hat, 4)}")
    print(f"Residuals e = y - ŷ = (I-H)y: {np.round(e, 4)}")

    print(f"\n=== Verifying Orthogonality ===")
    print(f"Residuals orthogonal to each column of X:")
    for j in range(p):
        dot_product = np.dot(X[:, j], e)
        print(f"  X[:, {j}] · e = {dot_product:.10f}")

    print(f"\n=== Pythagorean Theorem ===")
    print(f"||y||² = {np.dot(y, y):.4f}")
    print(f"||ŷ||² = {np.dot(y_hat, y_hat):.4f}")
    print(f"||e||² = {np.dot(e, e):.4f}")
    print(f"||ŷ||² + ||e||² = {np.dot(y_hat, y_hat) + np.dot(e, e):.4f}")

    # R-squared
    y_centered = y - np.mean(y)
    SS_tot = np.dot(y_centered, y_centered)
    SS_res = np.dot(e, e)
    SS_reg = np.dot(y_hat - np.mean(y), y_hat - np.mean(y))
    R_squared = 1 - SS_res / SS_tot

    print(f"\n=== R-Squared Interpretation ===")
    print(f"R² = 1 - SS_res/SS_tot = {R_squared:.4f}")
    print(f"Proportion of variance explained by projection onto Col(X)")

    print(f"\n=== Degrees of Freedom ===")
    print(f"Model df = tr(H) = {np.trace(H):.1f} = p = {p}")
    print(f"Residual df = tr(I-H) = {np.trace(np.eye(n) - H):.1f} = n - p = {n - p}")

    return H, y_hat, e, beta, R_squared

# Example: Simple linear regression
np.random.seed(42)
n = 20
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])  # Intercept + linear term
y_true = 2 + 3 * x
y = y_true + np.random.randn(n) * 2   # Add noise

H, y_hat, e, beta, R2 = regression_as_projection(X, y)

print(f"\nTrue parameters: intercept=2, slope=3")
print(f"Estimated: intercept={beta[0]:.4f}, slope={beta[1]:.4f}")
```

We've developed a complete understanding of projection matrices—the linear operators that perform orthogonal projection. These matrices are fundamental tools in linear regression, principal component analysis, and throughout machine learning.
What's next:
In the next page, we'll derive the normal equations from multiple perspectives: as the gradient condition for least squares, as the orthogonality condition encoded algebraically, and as the equation satisfied by the projection. This provides the foundation for understanding how linear regression coefficients are computed.
You now understand projection matrices deeply—their algebraic properties, eigenstructure, and central role in linear regression. You can verify if a matrix is a projection, compute leverage scores, and interpret fitted values geometrically. Next, we formalize the normal equations.