In the previous page, we learned that orthogonal projection maps a vector b to its closest point in a subspace Col(A):
proj_{Col(A)}(b) = A(AᵀA)⁻¹Aᵀb
Notice that this is a linear function of b—the expression A(AᵀA)⁻¹Aᵀ is a matrix that can be computed independently of b. This matrix is called the projection matrix (or hat matrix in statistics, because it puts the "hat" on y to get ŷ).
P = A(AᵀA)⁻¹Aᵀ
Once P is computed, projecting any vector is a simple matrix-vector multiplication: p = Pb. More importantly, the projection matrix has remarkable algebraic properties (it is idempotent and symmetric) that reveal deep structure, as we explore below.
By the end of this page, you will understand projection matrices deeply—their defining properties, how to construct and verify them, their geometric interpretation, and their role in linear regression. You'll also learn about the complementary projection (I - P) and leverage scores, which measure how influential each data point is.
Given a matrix A ∈ ℝᵐˣⁿ with full column rank (rank(A) = n), the projection matrix onto Col(A) is:
P = A(AᵀA)⁻¹Aᵀ
This is an m × m matrix that projects any vector b ∈ ℝᵐ onto the column space of A.
Key observations: P is m × m (its size is set by the ambient space ℝᵐ, not by the number of columns of A), and it depends only on the subspace Col(A), not on which particular basis for that subspace the columns of A provide.
For an orthonormal basis Q (where QᵀQ = I), the projection matrix simplifies beautifully:
P = Q(QᵀQ)⁻¹Qᵀ = QQᵀ
This is computationally cheaper and more numerically stable.
```python
import numpy as np
from numpy.linalg import inv, qr, matrix_rank

def projection_matrix(A: np.ndarray) -> np.ndarray:
    """
    Compute the projection matrix P = A(AᵀA)⁻¹Aᵀ for projecting onto Col(A).

    Parameters:
        A: Matrix with full column rank (m x n where m >= n)

    Returns:
        P: The m x m projection matrix onto Col(A)

    Note: Computing P explicitly is O(m²n + n³) and requires O(m²) storage.
    In practice, compute Pb directly when possible.
    """
    m, n = A.shape

    # Verify full column rank
    if matrix_rank(A) < n:
        raise ValueError("A must have full column rank")

    # Compute the projection matrix
    ATA = A.T @ A          # n x n
    ATA_inv = inv(ATA)     # n x n
    P = A @ ATA_inv @ A.T  # m x m

    return P

def projection_matrix_orthonormal(Q: np.ndarray) -> np.ndarray:
    """
    Compute projection matrix when basis is orthonormal: P = QQᵀ

    Much simpler and more stable than the general formula.
    """
    # Verify orthonormality
    n = Q.shape[1]
    if not np.allclose(Q.T @ Q, np.eye(n)):
        raise ValueError("Columns of Q must be orthonormal")

    return Q @ Q.T

def project_via_matrix(b: np.ndarray, A: np.ndarray) -> dict:
    """
    Project b onto Col(A) and analyze the projection matrix.
    """
    P = projection_matrix(A)
    projection = P @ b

    return {
        'projection_matrix': P,
        'projection': projection,
        'error': b - projection
    }

# Example: Create a projection matrix for a 2D plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P = projection_matrix(A)

print("=== Projection Matrix Construction ===")
print(f"\nMatrix A (defines the subspace):")
print(A)
print(f"\nProjection matrix P = A(AᵀA)⁻¹Aᵀ:")
print(np.round(P, 4))
print(f"\nShape of P: {P.shape}")

# Project a vector
b = np.array([1.0, 2.0, 3.0])
result = project_via_matrix(b, A)

print(f"\n=== Projecting b = {b} ===")
print(f"Projection Pb: {result['projection']}")
print(f"Error (I-P)b: {result['error']}")

# Compare with orthonormal version
Q, R = qr(A)
P_ortho = projection_matrix_orthonormal(Q)

print(f"\n=== Orthonormal Basis Version ===")
print(f"Orthonormal Q (from QR):")
print(np.round(Q, 4))
print(f"\nP = QQᵀ:")
print(np.round(P_ortho, 4))
print(f"\nDifference from A(AᵀA)⁻¹Aᵀ: {np.linalg.norm(P - P_ortho):.2e}")
```

A matrix P is idempotent if P² = P, meaning applying it twice is the same as applying it once. This is the algebraic expression of a fundamental geometric fact: once a vector is in the subspace, projecting again doesn't move it.
Proof that P = A(AᵀA)⁻¹Aᵀ is idempotent:
P² = [A(AᵀA)⁻¹Aᵀ][A(AᵀA)⁻¹Aᵀ] = A(AᵀA)⁻¹(AᵀA)(AᵀA)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P
The key step is recognizing that (AᵀA)⁻¹(AᵀA) = I, the identity matrix.
Geometric interpretation: idempotency captures the essence of projection. Once a vector has been projected into Col(A), it is already the closest point in the subspace to itself, so projecting it again leaves it unchanged.
Don't confuse idempotent with the identity matrix. The identity I satisfies I² = I, but it does nothing to vectors. A non-trivial projection P projects onto a proper subspace—it actually moves points. But once projected, they're 'stuck' in the subspace.
Eigenvalue characterization of idempotent matrices:
If P is idempotent, its eigenvalues can only be 0 or 1. Here's why:
If Pv = λv for some eigenvector v ≠ 0, then:
P²v = P(Pv) = P(λv) = λPv = λ²v
But since P² = P:
P²v = Pv = λv
Therefore λ²v = λv, which gives (λ² - λ)v = 0.
Since v ≠ 0, we have λ² - λ = 0, so λ(λ - 1) = 0, meaning λ = 0 or λ = 1.
For the projection matrix P onto a rank-r subspace of ℝᵐ, the eigenvalue 1 appears with multiplicity r (its eigenvectors span the subspace) and the eigenvalue 0 appears with multiplicity m - r (its eigenvectors span the orthogonal complement).
```python
import numpy as np
from numpy.linalg import eig, matrix_rank

def verify_idempotent(P: np.ndarray, tol: float = 1e-10) -> dict:
    """
    Verify that P is idempotent (P² = P) and analyze its eigenstructure.
    """
    P_squared = P @ P

    # Check idempotency
    is_idempotent = np.allclose(P_squared, P, atol=tol)

    # Compute eigenvalues
    eigenvalues, eigenvectors = eig(P)
    eigenvalues_real = np.real(eigenvalues)  # Should be real for symmetric matrices

    # Count eigenvalues near 0 and 1
    count_zero = np.sum(np.abs(eigenvalues_real) < tol)
    count_one = np.sum(np.abs(eigenvalues_real - 1) < tol)

    # Rank equals trace for projection matrices
    rank_from_eigenvalues = int(round(np.sum(eigenvalues_real)))
    trace_P = np.trace(P)

    return {
        'is_idempotent': is_idempotent,
        'P_squared_minus_P_norm': np.linalg.norm(P_squared - P),
        'eigenvalues': np.sort(eigenvalues_real)[::-1],
        'count_eigenvalue_1': count_one,
        'count_eigenvalue_0': count_zero,
        'trace': trace_P,
        'rank': rank_from_eigenvalues
    }

# Example: Projection matrix for a 2D plane in ℝ³
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

P = A @ np.linalg.inv(A.T @ A) @ A.T

print("=== Idempotent Property Verification ===")
print(f"\nProjection matrix P:")
print(np.round(P, 4))

result = verify_idempotent(P)

print(f"\n=== Results ===")
print(f"Is idempotent (P² = P): {result['is_idempotent']}")
print(f"||P² - P|| = {result['P_squared_minus_P_norm']:.2e}")
print(f"\nEigenvalues: {np.round(result['eigenvalues'], 4)}")
print(f"Count of eigenvalue 1: {result['count_eigenvalue_1']}")
print(f"Count of eigenvalue 0: {result['count_eigenvalue_0']}")
print(f"\nTrace(P) = {result['trace']:.4f}")
print(f"Rank(P) from eigenvalues = {result['rank']}")
print(f"Dimension of Col(A) = {A.shape[1]}")

# Demonstrate idempotency with vectors
print("\n=== Demonstrating Idempotency ===")
b = np.array([1.0, 2.0, 3.0])
Pb = P @ b
PPb = P @ Pb

print(f"Original b: {b}")
print(f"First projection Pb: {np.round(Pb, 4)}")
print(f"Second projection P(Pb): {np.round(PPb, 4)}")
print(f"||Pb - P(Pb)|| = {np.linalg.norm(Pb - PPb):.2e}")
```

Orthogonal projection matrices are symmetric: Pᵀ = P. This symmetry is deeply connected to the geometric notion of orthogonality.
Proof that P = A(AᵀA)⁻¹Aᵀ is symmetric:
Pᵀ = [A(AᵀA)⁻¹Aᵀ]ᵀ = (Aᵀ)ᵀ[(AᵀA)⁻¹]ᵀAᵀ = A[(AᵀA)ᵀ]⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P

The key facts are that the inverse of a transpose equals the transpose of the inverse, and that AᵀA is symmetric, so (AᵀA)ᵀ = AᵀA.
Why symmetry matters:
- Orthogonal projection equivalence: A matrix P is an orthogonal projection matrix if and only if P² = P and Pᵀ = P
- Real eigenvalues: Symmetric matrices have real eigenvalues (for projections, these are 0 and 1)
- Orthogonal eigenvectors: The eigenvectors of P form an orthonormal basis—those with eigenvalue 1 span Col(A), those with eigenvalue 0 span Col(A)⊥
- Self-adjoint property: ⟨Pu, v⟩ = ⟨u, Pv⟩ for all u, v (projection respects the inner product structure)
Not all projection matrices are symmetric! An oblique projection projects onto a subspace along a direction that isn't perpendicular. Oblique projections satisfy P² = P but not Pᵀ = P. In machine learning and least squares, we almost always use orthogonal projections, which are symmetric.
Characterization theorem:
A matrix P is an orthogonal projection matrix if and only if:
- P² = P (P is idempotent), and
- Pᵀ = P (P is symmetric).
If P satisfies both conditions, it projects onto Col(P) = {v : Pv = v} orthogonally.
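As a quick numerical illustration of this characterization, here is a minimal sketch (the example matrix A and variable names are illustrative, not part of the page's earlier code): a vector already in Col(P) is a fixed point of P, while a generic vector is moved into the subspace.

```python
import numpy as np

# Build an orthogonal projection P onto Col(A) for an example full-column-rank A
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

x = np.array([2.0, -1.0, 4.0])   # arbitrary vector in ℝ³
v = P @ x                        # v lies in Col(P) = Col(A)

# A vector in the subspace is left fixed: Pv = v
print(np.allclose(P @ v, v))     # True
# A generic vector is generally moved: Px != x
print(np.allclose(P @ x, x))     # False (x has a component outside Col(A))
```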
Connection to spectral decomposition:
Since P is symmetric, it has a spectral decomposition:
P = QΛQᵀ
where Q is orthogonal and Λ is diagonal with entries 0 or 1. If r entries are 1:
P = q₁q₁ᵀ + q₂q₂ᵀ + ... + qᵣqᵣᵀ
This shows P as the sum of rank-1 projections onto orthogonal directions—exactly the projection onto the r-dimensional subspace spanned by q₁, ..., qᵣ.
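The following sketch makes this concrete (reusing the same small example matrix A as the earlier blocks; the variable names are illustrative): it extracts the eigenvectors of P with eigenvalue 1 and checks that the sum of the rank-1 projections qᵢqᵢᵀ rebuilds P.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# P is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(P)

# Keep only the eigenvectors whose eigenvalue is (numerically) 1
ones = eigvecs[:, np.isclose(eigvals, 1.0)]

# Sum of rank-1 projections q_i q_iᵀ over those eigenvectors
P_rebuilt = sum(np.outer(q, q) for q in ones.T)

print(np.allclose(P_rebuilt, P))   # True: P = Σ q_i q_iᵀ
```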
| Property | Formula | Interpretation |
|---|---|---|
| Idempotent | P² = P | Projecting twice equals projecting once |
| Symmetric | Pᵀ = P | The projection is self-adjoint |
| Positive semi-definite | xᵀPx ≥ 0 | Squared projection length is non-negative |
| Bounded eigenvalues | 0 ≤ λ ≤ 1 | Projection never increases a vector's norm |
| Trace equals rank | tr(P) = rank(P) | Sum of eigenvalues equals projection dimension |
If P projects onto a subspace S, then (I - P) projects onto the orthogonal complement S⊥. This is the complementary projection or residual maker matrix.
Key properties of (I - P): let Q = I - P denote the complementary projection matrix. Then:
- Q is itself an orthogonal projection: Q² = Q and Qᵀ = Q
- P and Q annihilate each other: PQ = QP = 0
- They sum to the identity: P + Q = I, so every vector splits as b = Pb + Qb
- The two pieces are orthogonal: Pb · (I - P)b = 0, giving ||b||² = ||Pb||² + ||(I - P)b||²
The residual maker interpretation:
In regression, (I - P) extracts the residual:
e = y - ŷ = y - Py = (I - P)y = My
where M = I - P is sometimes called the residual maker or annihilator matrix. It annihilates any component of y that lies in Col(X).
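A short check of the annihilator property, as a sketch on a small example design matrix (names are illustrative): M = I - P maps every column of X to zero, so any vector already in Col(X) has zero residual.

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # example design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(X.shape[0]) - P                           # residual maker I - P

# M annihilates Col(X): every column of X maps to the zero vector
print(np.allclose(M @ X, 0))                 # True

# Consequently, a response that lies exactly in Col(X) has zero residual
y_in_colspace = X @ np.array([2.0, -0.5])
print(np.allclose(M @ y_in_colspace, 0))     # True
```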
```python
import numpy as np
from numpy.linalg import inv

def analyze_complementary_projections(A: np.ndarray, b: np.ndarray):
    """
    Analyze the projection matrix P and its complement I - P.

    Demonstrates that:
    - P projects onto Col(A)
    - I - P projects onto Col(A)⊥
    - Together they decompose any vector
    """
    m = A.shape[0]
    I = np.eye(m)

    # Projection onto Col(A)
    P = A @ inv(A.T @ A) @ A.T

    # Complementary projection onto Col(A)⊥
    Q = I - P

    # Apply to vector b
    projection = P @ b  # Component in Col(A)
    residual = Q @ b    # Component in Col(A)⊥

    print("=== Complementary Projections ===")
    print(f"\nProjection matrix P (onto Col(A)):")
    print(np.round(P, 4))
    print(f"\nComplementary matrix Q = I - P (onto Col(A)⊥):")
    print(np.round(Q, 4))

    print(f"\n=== Applied to vector b = {b} ===")
    print(f"Projection Pb (in Col(A)): {np.round(projection, 4)}")
    print(f"Residual (I-P)b (in Col(A)⊥): {np.round(residual, 4)}")
    print(f"Sum Pb + (I-P)b: {np.round(projection + residual, 4)}")

    print(f"\n=== Verifying Properties ===")

    # Both are idempotent
    print(f"P² = P: {np.allclose(P @ P, P)}")
    print(f"Q² = Q: {np.allclose(Q @ Q, Q)}")

    # Both are symmetric
    print(f"Pᵀ = P: {np.allclose(P.T, P)}")
    print(f"Qᵀ = Q: {np.allclose(Q.T, Q)}")

    # Orthogonal to each other
    print(f"PQ = 0: {np.allclose(P @ Q, 0)}")
    print(f"QP = 0: {np.allclose(Q @ P, 0)}")

    # Sum to identity
    print(f"P + Q = I: {np.allclose(P + Q, I)}")

    # Projection and residual are orthogonal
    print(f"\n=== Orthogonality of Components ===")
    print(f"Pb · (I-P)b = {np.dot(projection, residual):.10f} (should be ≈ 0)")

    # Pythagorean theorem
    print(f"\n=== Pythagorean Theorem ===")
    print(f"||b||² = {np.linalg.norm(b)**2:.4f}")
    print(f"||Pb||² + ||(I-P)b||² = {np.linalg.norm(projection)**2:.4f} + {np.linalg.norm(residual)**2:.4f}")
    print(f"                      = {np.linalg.norm(projection)**2 + np.linalg.norm(residual)**2:.4f}")

    return P, Q, projection, residual

# Example
A = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0]
])

b = np.array([1.0, 2.0, 3.0])

P, Q, proj, res = analyze_complementary_projections(A, b)
```

If P has rank r (projects onto an r-dimensional subspace), then I - P has rank m - r (projects onto the (m-r)-dimensional orthogonal complement). Together: rank(P) + rank(I-P) = r + (m-r) = m = rank(I). This is consistent with tr(P) + tr(I-P) = tr(I) = m.
The trace of a projection matrix equals its rank, which equals the dimension of the subspace it projects onto:
tr(P) = rank(P) = dim(Col(A)) = n (when A is m×n with full column rank)
This elegant fact follows from the eigenvalue structure: P has n eigenvalues equal to 1 and (m-n) eigenvalues equal to 0, so tr(P) = sum of eigenvalues = n.
Leverage scores:
The diagonal entries of the projection matrix P are called leverage scores in statistics. For observation i:
hᵢᵢ = Pᵢᵢ, the i-th diagonal entry of the projection (hat) matrix
Leverage scores measure how influential each data point is on its own fitted value.
Properties of leverage scores:
- Bounded: 0 ≤ hᵢᵢ ≤ 1 for all i
- Sum to rank: Σᵢ hᵢᵢ = tr(P) = p, the number of columns (features) of the design matrix
- Self-influence: hᵢᵢ measures how much observation i influences its own prediction ŷᵢ
- High leverage points: Points with hᵢᵢ close to 1 have high leverage—they can strongly influence the fitted model
Why 0 ≤ hᵢᵢ ≤ 1: since P is idempotent and symmetric, hᵢᵢ = (P²)ᵢᵢ = Σⱼ PᵢⱼPⱼᵢ = Σⱼ hᵢⱼ² = hᵢᵢ² + Σⱼ≠ᵢ hᵢⱼ². The left side equals hᵢᵢ while the right side is at least hᵢᵢ², so hᵢᵢ² ≤ hᵢᵢ, which forces 0 ≤ hᵢᵢ ≤ 1.
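The identity hᵢᵢ = Σⱼ hᵢⱼ² used in this argument is easy to confirm numerically; the sketch below uses a small example design matrix (the specific matrix is an arbitrary choice).

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(5.0)])     # small example design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Because H = H² and H = Hᵀ, each diagonal entry equals the sum of squares of its row
row_sums_of_squares = np.sum(H**2, axis=1)
print(np.allclose(np.diag(H), row_sums_of_squares))   # True
print(np.all((np.diag(H) >= 0) & (np.diag(H) <= 1)))  # True: 0 <= h_ii <= 1
```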
Interpretation in regression:
In linear regression with ŷ = Hy, where H = X(XᵀX)⁻¹Xᵀ is the hat matrix:

ŷᵢ = Σⱼ hᵢⱼ yⱼ = hᵢᵢ yᵢ + Σⱼ≠ᵢ hᵢⱼ yⱼ
The leverage hᵢᵢ is the weight of yᵢ in predicting itself. High leverage means that point strongly affects the regression fit.
```python
import numpy as np
from numpy.linalg import inv

def compute_leverage_scores(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """
    Compute leverage scores for the design matrix X.

    Leverage score for observation i is the i-th diagonal of the hat matrix:
        h_ii = [X(XᵀX)⁻¹Xᵀ]_ii

    These measure how influential each observation is on its own fitted value.
    Returns both the leverage scores and the full hat matrix H.
    """
    # Compute hat matrix
    H = X @ inv(X.T @ X) @ X.T

    # Extract diagonal
    leverage = np.diag(H)

    return leverage, H

def analyze_leverage(X: np.ndarray, y: np.ndarray):
    """
    Comprehensive analysis of leverage scores in regression context.
    """
    n, p = X.shape
    leverage, H = compute_leverage_scores(X)

    # Fitted values
    y_hat = H @ y

    # Residuals
    residuals = y - y_hat

    print("=== Leverage Score Analysis ===")
    print(f"\nDesign matrix X: {n} observations, {p} features")
    print(f"\nLeverage scores (diagonal of H):")
    for i, h in enumerate(leverage):
        print(f"  Observation {i+1}: h_{i+1},{i+1} = {h:.4f}")

    print(f"\n=== Properties ===")
    print(f"Sum of leverage scores: {np.sum(leverage):.4f}")
    print(f"Number of features p: {p}")
    print(f"(These should be equal)")
    print(f"\nMin leverage: {np.min(leverage):.4f}")
    print(f"Max leverage: {np.max(leverage):.4f}")
    print(f"Mean leverage: {np.mean(leverage):.4f} (= p/n = {p/n:.4f})")

    # High leverage points (common rule: h_ii > 2p/n)
    threshold = 2 * p / n
    high_leverage = np.where(leverage > threshold)[0]
    print(f"\nHigh leverage threshold (2p/n): {threshold:.4f}")
    print(f"High leverage points: {high_leverage + 1 if len(high_leverage) > 0 else 'None'}")

    print(f"\n=== Influence on Fitted Values ===")
    print(f"The fitted value ŷ_i = Σⱼ H_ij * y_j")
    print(f"\nFor each observation, how much does y_i influence ŷ_i?")
    for i in range(n):
        self_contribution = H[i, i] * y[i]
        others_contribution = y_hat[i] - self_contribution
        print(f"  ŷ_{i+1} = {y_hat[i]:.3f} = {self_contribution:.3f} (self: {H[i,i]*100:.1f}%) + {others_contribution:.3f} (others)")

    return leverage, H

# Example: Simple linear regression
np.random.seed(42)
n = 10
X = np.column_stack([np.ones(n), np.arange(n)])  # Intercept + linear term
y = 2 + 0.5 * X[:, 1] + np.random.randn(n) * 0.3

# Add an outlier with high leverage (extreme x value)
X_with_outlier = np.vstack([X, [1, 20]])  # Point far from others
y_with_outlier = np.append(y, 12)

print("Original data:")
leverage, H = analyze_leverage(X, y)

print("\n" + "="*60)
print("\nWith high-leverage outlier (x=20):")
leverage_outlier, H_outlier = analyze_leverage(X_with_outlier, y_with_outlier)
```

High leverage doesn't always mean high influence. An observation can have high leverage (unusual x values) but still fit the model well. Influence also depends on the residual. Cook's distance combines leverage and residuals to measure actual influence on fitted values.
Given a matrix that might be a projection, how do we verify its properties? Here's a comprehensive checklist and implementation:
Complete verification criteria:
1. Idempotent: P² = P
2. Symmetric: Pᵀ = P
3. Eigenvalues are only 0 and 1
4. Trace equals rank: tr(P) = rank(P)
5. Positive semi-definite: xᵀPx ≥ 0 for every x
6. Norm contraction: ||Pv|| ≤ ||v|| for every v
```python
import numpy as np
from numpy.linalg import eig, norm, matrix_rank

def comprehensive_projection_verification(P: np.ndarray, tol: float = 1e-10) -> dict:
    """
    Comprehensive verification that P is an orthogonal projection matrix.

    Checks:
    1. Idempotent: P² = P
    2. Symmetric: Pᵀ = P
    3. Eigenvalues: only 0 and 1
    4. Trace = Rank
    5. Positive semi-definite
    6. Norm contraction (for random vectors)

    Returns:
        Dictionary with all verification results
    """
    results = {}
    m = P.shape[0]

    # 1. Idempotent: P² = P
    P_squared = P @ P
    results['idempotent'] = np.allclose(P_squared, P, atol=tol)
    results['idempotent_error'] = norm(P_squared - P)

    # 2. Symmetric: Pᵀ = P
    results['symmetric'] = np.allclose(P.T, P, atol=tol)
    results['symmetry_error'] = norm(P.T - P)

    # 3. Eigenvalue analysis
    eigenvalues, eigenvectors = eig(P)
    eigenvalues_real = np.real(eigenvalues)

    # Check all eigenvalues are 0 or 1
    is_zero_or_one = np.all(
        (np.abs(eigenvalues_real) < tol) | (np.abs(eigenvalues_real - 1) < tol)
    )
    results['eigenvalues_valid'] = is_zero_or_one
    results['eigenvalues'] = np.sort(eigenvalues_real)[::-1]

    count_one = np.sum(np.abs(eigenvalues_real - 1) < tol)
    count_zero = np.sum(np.abs(eigenvalues_real) < tol)
    results['eigenvalue_1_count'] = int(count_one)
    results['eigenvalue_0_count'] = int(count_zero)

    # 4. Trace = Rank
    trace_P = np.real(np.trace(P))
    rank_P = matrix_rank(P, tol=tol)
    results['trace'] = trace_P
    results['rank'] = rank_P
    results['trace_equals_rank'] = np.abs(trace_P - rank_P) < tol

    # 5. Positive semi-definite (for random vectors xᵀPx ≥ 0)
    test_vectors = np.random.randn(m, 100)
    quadratic_forms = np.array([v @ P @ v for v in test_vectors.T])
    results['positive_semidefinite'] = np.all(quadratic_forms >= -tol)
    results['min_quadratic_form'] = np.min(quadratic_forms)

    # 6. Norm contraction ||Pv|| ≤ ||v||
    norms_before = norm(test_vectors, axis=0)
    norms_after = norm(P @ test_vectors, axis=0)
    results['norm_contraction'] = np.all(norms_after <= norms_before + tol)

    # Overall verdict
    results['is_orthogonal_projection'] = (
        results['idempotent'] and
        results['symmetric'] and
        results['eigenvalues_valid']
    )

    return results

def print_verification(P: np.ndarray, name: str = "P"):
    """Print comprehensive verification results."""
    results = comprehensive_projection_verification(P)

    print(f"=== Verification of {name} as Orthogonal Projection Matrix ===")
    print(f"\nMatrix shape: {P.shape}")

    print(f"\n1. IDEMPOTENT (P² = P): {'✓ PASS' if results['idempotent'] else '✗ FAIL'}")
    print(f"   Error ||P² - P|| = {results['idempotent_error']:.2e}")

    print(f"\n2. SYMMETRIC (Pᵀ = P): {'✓ PASS' if results['symmetric'] else '✗ FAIL'}")
    print(f"   Error ||Pᵀ - P|| = {results['symmetry_error']:.2e}")

    print(f"\n3. EIGENVALUES (only 0 or 1): {'✓ PASS' if results['eigenvalues_valid'] else '✗ FAIL'}")
    print(f"   Eigenvalues: {np.round(results['eigenvalues'], 4)}")
    print(f"   Count of 1's: {results['eigenvalue_1_count']}, Count of 0's: {results['eigenvalue_0_count']}")

    print(f"\n4. TRACE = RANK: {'✓ PASS' if results['trace_equals_rank'] else '✗ FAIL'}")
    print(f"   Trace = {results['trace']:.4f}, Rank = {results['rank']}")

    print(f"\n5. POSITIVE SEMI-DEFINITE: {'✓ PASS' if results['positive_semidefinite'] else '✗ FAIL'}")
    print(f"   Min quadratic form: {results['min_quadratic_form']:.2e}")

    print(f"\n6. NORM CONTRACTION (||Pv|| ≤ ||v||): {'✓ PASS' if results['norm_contraction'] else '✗ FAIL'}")

    print(f"\n{'='*50}")
    verdict = "IS" if results['is_orthogonal_projection'] else "is NOT"
    print(f"VERDICT: {name} {verdict} an orthogonal projection matrix")

    return results

# Test with a valid projection matrix
A = np.array([[1, 0], [1, 1], [0, 1]])
P_valid = A @ np.linalg.inv(A.T @ A) @ A.T

print_verification(P_valid, "P (from A(AᵀA)⁻¹Aᵀ)")

print("\n" + "="*60 + "\n")

# Test with an invalid matrix (symmetric but not idempotent; eigenvalues are not just 0 and 1)
P_invalid = np.array([[0.5, 0.3, 0.0], [0.3, 0.5, 0.0], [0.0, 0.0, 1.0]])
print_verification(P_invalid, "P_invalid")
```

The projection matrix is central to linear regression. Given design matrix X (n × p) and response vector y (n × 1), the linear regression fitted values are:
ŷ = Xβ̂ = X(XᵀX)⁻¹Xᵀy = Hy
where H = X(XᵀX)⁻¹Xᵀ is the hat matrix (it puts the "hat" on y).
Geometric interpretation:
The least squares solution minimizes ||y - Xβ||², and the minimum is achieved when Xβ is the closest point in Col(X) to y—which is exactly the projection.
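One way to see this numerically is to compare the hat-matrix projection with the output of a generic least squares solver; the sketch below uses np.linalg.lstsq purely as an independent reference (the data and seed are arbitrary), and both routes give the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fitted values via the hat matrix (projection of y onto Col(X))
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat_projection = H @ y

# Fitted values via a standard least squares solve
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat_lstsq = X @ beta

print(np.allclose(y_hat_projection, y_hat_lstsq))   # True
```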
| Concept | Formula | Interpretation |
|---|---|---|
| Hat matrix | H = X(XᵀX)⁻¹Xᵀ | Projects y onto column space of X |
| Fitted values | ŷ = Hy | Prediction = projection of y |
| Residuals | e = (I-H)y | Error = orthogonal complement projection |
| Residual variance | (I-H)σ² | Covariance of residuals (when y ~ N(Xβ, σ²I)) |
| Degrees of freedom | n - tr(H) | = n - p for full rank model |
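The residual-covariance row can be checked by simulation, assuming the stated model y ~ N(Xβ, σ²I); the sketch below (with an arbitrary small design matrix, σ, and seed) draws many response vectors and compares the empirical covariance of e = (I - H)y with σ²(I - H).

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 8, 0.5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta_true = np.array([1.0, 0.3])

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                      # residual maker I - H

# Simulate many response vectors y ~ N(Xβ, σ²I) and form residuals e = My
eps = rng.normal(scale=sigma, size=(200_000, n))
residuals = (M @ (X @ beta_true + eps).T).T

empirical_cov = np.cov(residuals, rowvar=False)
theoretical_cov = sigma**2 * M         # Cov(e) = σ²(I - H)

print(np.max(np.abs(empirical_cov - theoretical_cov)))   # close to 0
```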
```python
import numpy as np
from numpy.linalg import inv

def regression_as_projection(X: np.ndarray, y: np.ndarray):
    """
    Demonstrate that linear regression is orthogonal projection.

    Shows:
    1. ŷ = Hy is the projection of y onto Col(X)
    2. Residuals e are orthogonal to Col(X)
    3. Pythagorean theorem: ||y||² = ||ŷ||² + ||e||²
    4. R² = 1 - SS_res/SS_tot (proportion of variance explained)
    """
    n, p = X.shape

    # Hat matrix
    H = X @ inv(X.T @ X) @ X.T

    # Fitted values (projection)
    y_hat = H @ y

    # Residuals
    e = y - y_hat

    # Coefficients (for interpretation)
    beta = inv(X.T @ X) @ X.T @ y

    print("=== Linear Regression as Orthogonal Projection ===")
    print(f"\nDesign matrix X: {n} observations, {p} features")
    print(f"Response y: {y}")
    print(f"\nCoefficients β̂: {beta}")
    print(f"Fitted values ŷ = Xβ̂ = Hy: {np.round(y_hat, 4)}")
    print(f"Residuals e = y - ŷ = (I-H)y: {np.round(e, 4)}")

    print(f"\n=== Verifying Orthogonality ===")
    print(f"Residuals orthogonal to each column of X:")
    for j in range(p):
        dot_product = np.dot(X[:, j], e)
        print(f"  X[:, {j}] · e = {dot_product:.10f}")

    print(f"\n=== Pythagorean Theorem ===")
    print(f"||y||² = {np.dot(y, y):.4f}")
    print(f"||ŷ||² = {np.dot(y_hat, y_hat):.4f}")
    print(f"||e||² = {np.dot(e, e):.4f}")
    print(f"||ŷ||² + ||e||² = {np.dot(y_hat, y_hat) + np.dot(e, e):.4f}")

    # R-squared
    y_centered = y - np.mean(y)
    SS_tot = np.dot(y_centered, y_centered)
    SS_res = np.dot(e, e)
    SS_reg = np.dot(y_hat - np.mean(y), y_hat - np.mean(y))
    R_squared = 1 - SS_res / SS_tot

    print(f"\n=== R-Squared Interpretation ===")
    print(f"R² = 1 - SS_res/SS_tot = {R_squared:.4f}")
    print(f"Proportion of variance explained by projection onto Col(X)")

    print(f"\n=== Degrees of Freedom ===")
    print(f"Model df = tr(H) = {np.trace(H):.1f} = p = {p}")
    print(f"Residual df = tr(I-H) = {np.trace(np.eye(n) - H):.1f} = n - p = {n - p}")

    return H, y_hat, e, beta, R_squared

# Example: Simple linear regression
np.random.seed(42)
n = 20
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])  # Intercept + linear term
y_true = 2 + 3 * x
y = y_true + np.random.randn(n) * 2   # Add noise

H, y_hat, e, beta, R2 = regression_as_projection(X, y)

print(f"\nTrue parameters: intercept=2, slope=3")
print(f"Estimated: intercept={beta[0]:.4f}, slope={beta[1]:.4f}")
```

We've developed a complete understanding of projection matrices—the linear operators that perform orthogonal projection. These matrices are fundamental tools in linear regression, principal component analysis, and throughout machine learning.
What's next:
In the next page, we'll derive the normal equations from multiple perspectives: as the gradient condition for least squares, as the orthogonality condition encoded algebraically, and as the equation satisfied by the projection. This provides the foundation for understanding how linear regression coefficients are computed.
You now understand projection matrices deeply—their algebraic properties, eigenstructure, and central role in linear regression. You can verify if a matrix is a projection, compute leverage scores, and interpret fitted values geometrically. Next, we formalize the normal equations.