Matrix operations become dramatically simpler when expressed in the right coordinate system. Diagonalization is the art of finding that coordinate system—a basis of eigenvectors where the matrix becomes diagonal.
Why does this matter? Consider computing A¹⁰⁰ for some matrix A. Direct multiplication requires 99 matrix-matrix products. But if A = VΛV⁻¹, then A¹⁰⁰ = VΛ¹⁰⁰V⁻¹. Since Λ is diagonal, Λ¹⁰⁰ just raises each diagonal entry to the 100th power—trivial!
Diagonalization underlies PCA (the spectral decomposition of covariance matrices), Markov chain analysis (finding steady states), differential equations (solving systems of ODEs), and countless optimization algorithms. It's not just a computational trick—it reveals the intrinsic structure of linear transformations.
By the end of this page, you will understand the conditions for diagonalizability, construct the diagonalization A = VΛV⁻¹, recognize when matrices are NOT diagonalizable (defective matrices), master the Spectral Theorem for symmetric matrices, and apply diagonalization to simplify matrix powers, exponentials, and functions.
A matrix A is diagonalizable if there exists an invertible matrix V and a diagonal matrix Λ such that:
$$\mathbf{A} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^{-1}$$
Equivalently:
$$\mathbf{V}^{-1} \mathbf{A} \mathbf{V} = \mathbf{\Lambda}$$
The columns of V are eigenvectors of A, and the diagonal entries of Λ are the corresponding eigenvalues:
$$\mathbf{V} = \begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}, \quad \mathbf{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$
For V to be invertible, its columns (eigenvectors) must be linearly independent. This is why we need n linearly independent eigenvectors to diagonalize an n×n matrix. If eigenvectors are dependent (or there aren't enough), diagonalization fails.
Verification:
We can verify A = VΛV⁻¹ by checking Av_i = λ_i v_i for each eigenvector:
$$\mathbf{A}\mathbf{V} = \mathbf{A}\begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix} = \begin{pmatrix} \lambda_1\mathbf{v}_1 & \cdots & \lambda_n\mathbf{v}_n \end{pmatrix} = \mathbf{V}\mathbf{\Lambda}$$
Multiplying both sides by V⁻¹ on the right: A = VΛV⁻¹.
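A few lines of NumPy make this check concrete (the 2×2 matrix here is just an illustrative choice):

```python
import numpy as np

# Illustrative 2x2 matrix (any diagonalizable matrix works)
A = np.array([[4., 1.],
              [2., 3.]])

eigenvalues, V = np.linalg.eig(A)   # columns of V are eigenvectors
Lambda = np.diag(eigenvalues)

# Column by column: A v_i = λ_i v_i
for i in range(2):
    print(np.allclose(A @ V[:, i], eigenvalues[i] * V[:, i]))

# Stacked form: AV = VΛ, hence A = VΛV⁻¹
print(np.allclose(A @ V, V @ Lambda))
print(np.allclose(A, V @ Lambda @ np.linalg.inv(V)))
```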
Geometric interpretation:
Diagonalization says: "In the coordinate system defined by eigenvectors, A acts as simple scaling along each axis." The transformation V converts from eigenvector coordinates to standard coordinates; Λ scales along eigenvector directions; V⁻¹ converts back.
```python
import numpy as np

def diagonalize(A):
    """
    Attempt to diagonalize matrix A.

    Returns:
        V: Matrix of eigenvectors (columns)
        Lambda: Diagonal matrix of eigenvalues
        V_inv: Inverse of V

    Raises ValueError if matrix is not diagonalizable.
    """
    n = A.shape[0]

    # Compute eigenvalues and eigenvectors
    eigenvalues, V = np.linalg.eig(A)

    # Check if V is invertible (eigenvectors are linearly independent)
    det_V = np.linalg.det(V)
    if np.abs(det_V) < 1e-10:
        raise ValueError("Matrix is not diagonalizable: "
                         "eigenvectors are not linearly independent")

    # Create diagonal matrix
    Lambda = np.diag(eigenvalues)

    # Compute inverse
    V_inv = np.linalg.inv(V)

    return V, Lambda, V_inv

def verify_diagonalization(A, V, Lambda, V_inv):
    """Verify that A = V @ Lambda @ V_inv."""
    A_reconstructed = V @ Lambda @ V_inv

    # Handle potential complex numbers from eigendecomposition
    if np.iscomplexobj(A_reconstructed) and not np.iscomplexobj(A):
        A_reconstructed = A_reconstructed.real

    error = np.linalg.norm(A - A_reconstructed)
    return error < 1e-10, error

# Example 1: Diagonalizable matrix
A = np.array([
    [4, 1],
    [2, 3]], dtype=float)

print("Example 1: Diagonalizable matrix")
print("=" * 50)
print(f"A =\n{A}\n")

V, Lambda, V_inv = diagonalize(A)

print(f"Eigenvalues (diagonal of Λ): {np.diag(Lambda)}")
print(f"V (eigenvectors as columns) =\n{V}\n")
print(f"Λ (diagonal matrix) =\n{Lambda}\n")

# Verify
is_correct, error = verify_diagonalization(A, V, Lambda, V_inv)
print(f"Verification: A = VΛV⁻¹? {is_correct} (error: {error:.2e})\n")

# Show the power of diagonalization: compute A^10
A_10_direct = np.linalg.matrix_power(A, 10)
Lambda_10 = np.diag(np.diag(Lambda)**10)
A_10_via_diag = V @ Lambda_10 @ V_inv

print("Computing A¹⁰:")
print(f"  Direct computation:  A¹⁰[0,0] = {A_10_direct[0,0]:.6f}")
print(f"  Via diagonalization: VΛ¹⁰V⁻¹[0,0] = {A_10_via_diag[0,0].real:.6f}")
print(f"  Match: {np.allclose(A_10_direct, A_10_via_diag.real)}")
```

Not all matrices are diagonalizable. Understanding when diagonalization is possible is crucial.
Theorem: Diagonalizability Conditions
An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.
This happens precisely when, for every eigenvalue, the geometric multiplicity (the number of independent eigenvectors) equals the algebraic multiplicity (the number of times it appears as a root of the characteristic polynomial). Distinct eigenvalues guarantee this; repeated eigenvalues may or may not:
| Matrix | Eigenvalues | Multiplicities (Alg/Geo) | Diagonalizable? |
|---|---|---|---|
| [[3,0],[0,5]] | λ₁=3, λ₂=5 | Each: 1/1 | Yes (distinct eigenvalues) |
| [[2,0],[0,2]] | λ=2 (double) | 2/2 | Yes (enough eigenvectors) |
| [[2,1],[0,2]] | λ=2 (double) | 2/1 ❌ | No (defective) |
| [[0,-1],[1,0]] | λ=±i (complex) | Each: 1/1 | Yes (over ℂ), No (over ℝ) |
| Symmetric | All real | Geo = Alg always | Always yes |
```python
import numpy as np
from collections import Counter

def analyze_diagonalizability(A, tol=1e-10):
    """
    Analyze whether a matrix is diagonalizable and why.
    Returns detailed information about multiplicities.
    """
    n = A.shape[0]
    print(f"Matrix A ({n}x{n}):")
    print(A)
    print()

    # Get eigenvalues
    eigenvalues = np.linalg.eigvals(A)

    # Round eigenvalues for grouping (handle numerical noise)
    def round_complex(z, decimals=8):
        return complex(round(z.real, decimals), round(z.imag, decimals))

    rounded_eigenvalues = [round_complex(e) for e in eigenvalues]

    # Count algebraic multiplicities
    algebraic_mult = Counter(rounded_eigenvalues)

    print("Eigenvalue Analysis:")
    print("-" * 50)

    is_diagonalizable = True
    for eigenvalue, alg_mult in algebraic_mult.items():
        # Geometric multiplicity = dim(null(A - λI)) = n - rank(A - λI)
        shifted = A - eigenvalue * np.eye(n)
        rank = np.linalg.matrix_rank(shifted, tol=tol)
        geo_mult = n - rank

        status = "✓" if geo_mult == alg_mult else "✗"
        print(f"λ = {eigenvalue}")
        print(f"  Algebraic multiplicity: {alg_mult}")
        print(f"  Geometric multiplicity: {geo_mult}")
        print(f"  Status: {status}")
        print()

        if geo_mult < alg_mult:
            is_diagonalizable = False

    print("-" * 50)
    if is_diagonalizable:
        print("RESULT: Matrix IS diagonalizable")
    else:
        print("RESULT: Matrix is NOT diagonalizable (defective)")

    return is_diagonalizable

# Example 1: Distinct eigenvalues - always diagonalizable
print("=" * 60)
print("Example 1: Distinct eigenvalues")
print("=" * 60)
A1 = np.array([
    [4, 1],
    [2, 3]], dtype=float)
analyze_diagonalizability(A1)

# Example 2: Repeated eigenvalue, diagonalizable (identity-like)
print("\n" + "=" * 60)
print("Example 2: Repeated eigenvalue, IS diagonalizable")
print("=" * 60)
A2 = np.array([
    [2, 0],
    [0, 2]], dtype=float)
analyze_diagonalizability(A2)

# Example 3: Defective matrix (NOT diagonalizable)
print("\n" + "=" * 60)
print("Example 3: Defective matrix (Jordan block)")
print("=" * 60)
A3 = np.array([
    [2, 1],
    [0, 2]], dtype=float)
analyze_diagonalizability(A3)

# Example 4: 3x3 with repeated eigenvalue
print("\n" + "=" * 60)
print("Example 4: 3x3 with repeated eigenvalue")
print("=" * 60)
A4 = np.array([
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 5]], dtype=float)
analyze_diagonalizability(A4)
```

Defective matrices (like the Jordan block [[2,1],[0,2]]) have repeated eigenvalues but not enough eigenvectors. They require the more general Jordan normal form instead of diagonalization. Fortunately, defective matrices are 'rare' in a precise mathematical sense—almost every matrix (for example, after an arbitrarily small random perturbation) is diagonalizable, and symmetric matrices always are.
The Spectral Theorem is one of the most important results in linear algebra for ML. It guarantees that symmetric matrices have especially nice diagonalizations.
Theorem (Spectral Theorem for Real Symmetric Matrices):
If A is a real symmetric matrix (A = Aᵀ), then:
- All eigenvalues of A are real.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- A has a full set of n orthonormal eigenvectors, even when eigenvalues repeat.

Consequently, A can be written as:
$$\mathbf{A} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^T$$
where Q is an orthogonal matrix (Qᵀ = Q⁻¹) whose columns are orthonormal eigenvectors.
Covariance matrices are symmetric (Σ = Σᵀ) and positive semi-definite. The Spectral Theorem guarantees we can always find orthonormal principal components with real, non-negative eigenvalues. This makes PCA well-defined and numerically stable. The orthogonality means principal components are uncorrelated—each captures independent variation.
```python
import numpy as np

def spectral_decomposition(A):
    """
    Compute the spectral decomposition of a symmetric matrix.

    For symmetric A: A = Q @ Λ @ Q.T
    where Q is orthogonal (Q.T = Q^{-1}) and Λ is diagonal.
    """
    # Check symmetry
    if not np.allclose(A, A.T):
        raise ValueError("Matrix must be symmetric for spectral decomposition")

    # Use eigh for symmetric matrices (guaranteed real eigenvalues,
    # orthogonal eigenvectors)
    eigenvalues, Q = np.linalg.eigh(A)

    # Sort by eigenvalue (descending, for PCA-like interpretation)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    Q = Q[:, idx]

    Lambda = np.diag(eigenvalues)
    return Q, Lambda, eigenvalues

def verify_spectral_properties(A, Q, Lambda):
    """Verify all properties guaranteed by the Spectral Theorem."""
    print("Verifying Spectral Theorem properties:")
    print("-" * 50)

    eigenvalues = np.diag(Lambda)

    # Property 1: All eigenvalues are real
    all_real = np.allclose(eigenvalues.imag, 0) if np.iscomplexobj(eigenvalues) else True
    print(f"1. All eigenvalues real: {all_real}")
    print(f"   Eigenvalues: {eigenvalues}")

    # Property 2: Q is orthogonal (Q.T @ Q = I)
    is_orthogonal = np.allclose(Q.T @ Q, np.eye(Q.shape[1]))
    print(f"2. Q is orthogonal (QᵀQ = I): {is_orthogonal}")

    # Property 3: Reconstruction A = QΛQᵀ
    A_reconstructed = Q @ Lambda @ Q.T
    reconstruction_correct = np.allclose(A, A_reconstructed)
    print(f"3. A = QΛQᵀ: {reconstruction_correct}")

    # Property 4: Qᵀ = Q⁻¹ (no explicit inverse needed)
    transpose_equals_inverse = np.allclose(Q.T, np.linalg.inv(Q))
    print(f"4. Qᵀ = Q⁻¹: {transpose_equals_inverse}")

    print("-" * 50)
    return all([all_real, is_orthogonal, reconstruction_correct,
                transpose_equals_inverse])

# Example: Covariance matrix (symmetric positive semi-definite)
print("Spectral Decomposition of a Covariance Matrix")
print("=" * 60)

# Generate random data and compute covariance
np.random.seed(42)
n_samples, n_features = 100, 4
X = np.random.randn(n_samples, n_features)

# Give the data different variances in different directions
X[:, 0] *= 5    # High variance
X[:, 1] *= 2    # Medium variance
X[:, 2] *= 1    # Unit variance
X[:, 3] *= 0.5  # Low variance

# Center the data
X_centered = X - X.mean(axis=0)

# Covariance matrix
Sigma = X_centered.T @ X_centered / (n_samples - 1)

print("Covariance matrix Σ:")
print(Sigma.round(4))
print()

# Spectral decomposition
Q, Lambda, eigenvalues = spectral_decomposition(Sigma)

print("Eigenvectors (columns of Q):")
print(Q.round(4))
print()
print("Eigenvalues (diagonal of Λ):")
print(eigenvalues.round(4))
print()

# Verify properties
all_verified = verify_spectral_properties(Sigma, Q, Lambda)
print(f"All Spectral Theorem properties verified: {all_verified}")

# Show orthogonality of eigenvectors explicitly
print("\n" + "=" * 60)
print("Eigenvector Orthogonality:")
print("=" * 60)
for i in range(len(eigenvalues)):
    for j in range(i + 1, len(eigenvalues)):
        dot_product = np.dot(Q[:, i], Q[:, j])
        print(f"v{i+1} · v{j+1} = {dot_product:.2e}")
```

Spectral decomposition vs eigendecomposition:
Both write A = VΛV⁻¹, but the spectral decomposition is special:
| Property | General Eigendecomposition | Spectral Decomposition |
|---|---|---|
| Matrix type | Any diagonalizable matrix | Symmetric matrices only |
| V⁻¹ | Must compute inverse | V⁻¹ = Vᵀ (just transpose!) |
| Eigenvectors | Linearly independent | Orthonormal |
| Eigenvalues | Can be complex | Always real |
| Numerical stability | Depends on condition | Highly stable |
For PCA, kernel methods, and graph Laplacians, we always work with symmetric matrices and use the spectral decomposition.
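As a short sketch of how this plays out in PCA (the data, the anisotropic scaling, and the choice of 2 components are illustrative assumptions, not part of a specific library API):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])  # anisotropic data
Xc = X - X.mean(axis=0)

# Spectral decomposition of the covariance: Σ = QΛQᵀ
Sigma = Xc.T @ Xc / (len(Xc) - 1)
eigenvalues, Q = np.linalg.eigh(Sigma)    # eigh: ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]     # re-sort descending
eigenvalues, Q = eigenvalues[order], Q[:, order]

# Principal components = orthonormal eigenvectors; project onto top 2
Z = Xc @ Q[:, :2]

explained = eigenvalues / eigenvalues.sum()
print("Explained variance ratio:", explained.round(3))

# Because Q is orthogonal, the projected features are uncorrelated
C = Z.T @ Z / (len(Z) - 1)
print("Projected covariance is diagonal:",
      np.allclose(C, np.diag(np.diag(C)), atol=1e-10))
```

Since Qᵀ = Q⁻¹, the projection and its inverse are both plain matrix products; no linear system needs to be solved.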
Diagonalization makes computing matrix powers and functions trivial. If A = VΛV⁻¹, then:
Matrix powers: $$\mathbf{A}^k = \mathbf{V} \mathbf{\Lambda}^k \mathbf{V}^{-1}$$
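The identity follows by telescoping—every interior V⁻¹V pair cancels:

$$\mathbf{A}^k = (\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})\cdots(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}) = \mathbf{V}\mathbf{\Lambda}(\mathbf{V}^{-1}\mathbf{V})\mathbf{\Lambda}(\mathbf{V}^{-1}\mathbf{V})\cdots\mathbf{\Lambda}\mathbf{V}^{-1} = \mathbf{V}\mathbf{\Lambda}^k\mathbf{V}^{-1}$$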
Since Λ is diagonal, Λᵏ just raises each eigenvalue to the kth power: $$\mathbf{\Lambda}^k = \text{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$$
Matrix exponential: $$e^{\mathbf{A}} = \mathbf{V} e^{\mathbf{\Lambda}} \mathbf{V}^{-1} = \mathbf{V} \text{diag}(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n}) \mathbf{V}^{-1}$$
General matrix functions: For any function f analytic at the eigenvalues: $$f(\mathbf{A}) = \mathbf{V} f(\mathbf{\Lambda}) \mathbf{V}^{-1} = \mathbf{V} \text{diag}(f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)) \mathbf{V}^{-1}$$
```python
import numpy as np
from scipy import linalg

def matrix_power_via_diagonalization(A, k):
    """Compute A^k = V @ Λ^k @ V^{-1} using eigendecomposition."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # Λ^k: raise each eigenvalue to power k
    Lambda_k = np.diag(eigenvalues ** k)
    result = V @ Lambda_k @ V_inv

    # Return real part if input was real
    if not np.iscomplexobj(A):
        result = result.real
    return result

def matrix_exp_via_diagonalization(A):
    """Compute e^A = V @ e^Λ @ V^{-1} using eigendecomposition."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # e^Λ: exponentiate each eigenvalue
    exp_Lambda = np.diag(np.exp(eigenvalues))
    result = V @ exp_Lambda @ V_inv

    if not np.iscomplexobj(A):
        result = result.real
    return result

def matrix_function(A, f):
    """Apply function f to matrix A: f(A) = V @ f(Λ) @ V^{-1}."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # Apply f to each eigenvalue
    f_Lambda = np.diag(f(eigenvalues))
    result = V @ f_Lambda @ V_inv

    if not np.iscomplexobj(A):
        result = result.real
    return result

# Example 1: Matrix powers
print("Matrix Powers via Diagonalization")
print("=" * 60)

A = np.array([
    [0.9, 0.2],
    [0.1, 0.8]], dtype=float)

print(f"A =\n{A}\n")

# Compute A^10 both ways
A_10_direct = np.linalg.matrix_power(A, 10)
A_10_diag = matrix_power_via_diagonalization(A, 10)

print(f"A¹⁰ (direct):\n{A_10_direct.round(6)}\n")
print(f"A¹⁰ (via eigendecomp):\n{A_10_diag.round(6)}\n")
print(f"Match: {np.allclose(A_10_direct, A_10_diag)}")

# Large power - this is where diagonalization shines
print("\n" + "-" * 40)
print("Large power: A¹⁰⁰⁰")
A_1000_diag = matrix_power_via_diagonalization(A, 1000)
print(f"A¹⁰⁰⁰ =\n{A_1000_diag.round(6)}")

eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"λ₁¹⁰⁰⁰ = {eigenvalues[0]**1000:.6f}")
print(f"λ₂¹⁰⁰⁰ = {eigenvalues[1]**1000:.6e}")
print("Note: Largest eigenvalue (1.0) dominates; the other decays to 0")

# Example 2: Matrix exponential
print("\n" + "=" * 60)
print("Matrix Exponential")
print("=" * 60)

B = np.array([
    [0, -1],
    [1, 0]], dtype=float)  # Rotation generator

print(f"B (rotation generator) =\n{B}\n")

# e^B should give rotation by 1 radian
exp_B_diag = matrix_exp_via_diagonalization(B)
exp_B_scipy = linalg.expm(B)

print(f"e^B (via eigendecomp):\n{exp_B_diag.round(6)}\n")
print(f"e^B (scipy.linalg.expm):\n{exp_B_scipy.round(6)}\n")

theta = 1.0
expected = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)]])
print(f"Expected (rotation matrix for θ=1 rad):\n{expected.round(6)}")
print(f"Match: {np.allclose(exp_B_diag, expected)}")

# Example 3: Matrix square root
print("\n" + "=" * 60)
print("Matrix Square Root (√A)")
print("=" * 60)

C = np.array([
    [4, 2],
    [2, 5]], dtype=float)  # Positive definite

sqrt_C = matrix_function(C, np.sqrt)
print(f"C =\n{C}\n")
print(f"√C =\n{sqrt_C.round(6)}\n")
print(f"(√C)² =\n{(sqrt_C @ sqrt_C).round(6)}")
print(f"Match C: {np.allclose(sqrt_C @ sqrt_C, C)}")
```

Computing Aᵏ directly takes O(k) matrix multiplications (or O(log k) with repeated squaring). Via diagonalization, we compute the eigendecomposition once (O(n³)); after that, Λᵏ costs only n scalar powers, no matter how large k is. For large k this is a massive speedup, and it is why Markov chain steady-state analysis (the limit of Aᵏ as k → ∞) uses eigendecomposition.
Diagonalization is essential for analyzing Markov chains—stochastic processes where the probability of the next state depends only on the current state.
A Markov chain is defined by a transition matrix P where P_ij = P(next state = j | current state = i). The rows sum to 1.
If π₀ is the initial state distribution (row vector), after k steps:
$$\pi_k = \pi_0 \mathbf{P}^k$$
Key questions:
1. Does πₖ converge to a steady-state distribution π∞ satisfying π∞P = π∞?
2. If so, how quickly?

Both questions are answered by eigenanalysis of P: the steady state is the left eigenvector for eigenvalue 1, and the convergence speed is governed by the second-largest eigenvalue magnitude |λ₂|.
```python
import numpy as np
import matplotlib.pyplot as plt

def analyze_markov_chain(P, name="Markov Chain"):
    """Analyze a Markov chain transition matrix using eigendecomposition."""
    print(f"{'='*60}")
    print(f"Analysis: {name}")
    print(f"{'='*60}")
    print("Transition matrix P:")
    print(P.round(4))

    # Verify it's a valid transition matrix
    row_sums = P.sum(axis=1)
    print(f"Row sums (should be 1): {row_sums}")

    # Eigendecomposition - eigenvectors of P^T give left eigenvectors of P
    eigenvalues, eigenvectors = np.linalg.eig(P.T)

    # Sort by eigenvalue magnitude
    idx = np.argsort(np.abs(eigenvalues))[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    print(f"Eigenvalues: {eigenvalues.round(6)}")

    # The steady state corresponds to eigenvalue 1
    steady_state_idx = np.argmin(np.abs(eigenvalues - 1))
    if np.abs(eigenvalues[steady_state_idx] - 1) < 1e-10:
        # Normalize to be a probability distribution
        steady_state = eigenvectors[:, steady_state_idx].real
        steady_state = steady_state / steady_state.sum()
        print("Steady state distribution (π∞):")
        print(f"  {steady_state.round(6)}")

        # Verify: π∞ P = π∞
        verification = steady_state @ P
        print(f"  Verify π∞P = π∞: {np.allclose(verification, steady_state)}")
    else:
        print("No steady state (no eigenvalue = 1)")
        steady_state = None

    # Convergence rate determined by second-largest eigenvalue
    if len(eigenvalues) > 1:
        mixing_rate = np.abs(eigenvalues[1])
        print(f"Second largest eigenvalue: {eigenvalues[1].round(6)}")
        print(f"Mixing rate (|λ₂|): {mixing_rate:.6f}")
        if mixing_rate < 1:
            mixing_time = int(-1 / np.log(mixing_rate + 1e-10))
            print(f"Approximate mixing time: {mixing_time} steps")
        else:
            print("Approximate mixing time: slow/infinite")

    return eigenvalues, eigenvectors, steady_state

def simulate_markov_chain(P, initial_state, num_steps):
    """Simulate how the state distribution evolves over time."""
    n_states = P.shape[0]
    distribution = np.zeros(n_states)
    distribution[initial_state] = 1.0

    history = [distribution.copy()]
    for _ in range(num_steps):
        distribution = distribution @ P
        history.append(distribution.copy())
    return np.array(history)

# Example 1: Weather model (Sunny/Rainy)
P_weather = np.array([
    [0.8, 0.2],  # Sunny -> [Sunny, Rainy]
    [0.4, 0.6]   # Rainy -> [Sunny, Rainy]
])
eigenvalues, _, steady_state = analyze_markov_chain(P_weather, "Weather Model")

# Simulate convergence
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

history = simulate_markov_chain(P_weather, 0, 20)  # Start sunny
ax = axes[0]
ax.plot(history[:, 0], 'o-', label='P(Sunny)', color='orange')
ax.plot(history[:, 1], 's-', label='P(Rainy)', color='blue')
ax.axhline(y=steady_state[0], color='orange', linestyle='--', alpha=0.5)
ax.axhline(y=steady_state[1], color='blue', linestyle='--', alpha=0.5)
ax.set_xlabel('Step')
ax.set_ylabel('Probability')
ax.set_title('Convergence to Steady State\n(Starting from Sunny)')
ax.legend()
ax.grid(True, alpha=0.3)

# Example 2: PageRank-style matrix
P_pagerank = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0]
])
eigenvalues_pr, _, steady_state_pr = analyze_markov_chain(P_pagerank, "Simple Web Graph")

# Plot eigenvalues in the complex plane
ax = axes[1]
theta = np.linspace(0, 2 * np.pi, 100)
ax.plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.3, label='Unit circle')
ax.scatter(eigenvalues_pr.real, eigenvalues_pr.imag, s=100, c='red', zorder=5)
for i, ev in enumerate(eigenvalues_pr):
    ax.annotate(f'λ{i+1}={ev:.2f}', (ev.real + 0.1, ev.imag + 0.1))
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
ax.set_xlabel('Real')
ax.set_ylabel('Imaginary')
ax.set_title('Eigenvalues in Complex Plane\n(λ=1 gives steady state)')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('markov_chain_analysis.png', dpi=150)
plt.show()
```

Google's original PageRank is a Markov chain on web pages. The steady state (the eigenvector for λ=1) gives page importance scores. The 'damping factor' ensures convergence by making the transition matrix primitive, so every eigenvalue except 1 has |λ| < 1.
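The damping trick can be sketched in a few lines. This is an illustrative construction, not code from above: the value d = 0.85 and the uniform-teleportation model are standard PageRank assumptions, and `google_matrix` is a hypothetical helper name.

```python
import numpy as np

def google_matrix(P, d=0.85):
    """Build the damped PageRank transition matrix.

    With probability d follow a link (row-stochastic P); with
    probability 1-d jump to a uniformly random page. The result is
    primitive, so eigenvalue 1 is unique and iteration converges.
    """
    n = P.shape[0]
    return d * P + (1 - d) / n * np.ones((n, n))

# Same toy web graph as above
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0]])

G = google_matrix(P)

# Steady state = left eigenvector of G for eigenvalue 1
eigenvalues, vecs = np.linalg.eig(G.T)
k = np.argmin(np.abs(eigenvalues - 1))
rank = vecs[:, k].real
rank = rank / rank.sum()
print("PageRank scores:", rank.round(4))
print("All other |λ| < 1:", np.all(np.abs(np.delete(eigenvalues, k)) < 1))
```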
Symmetric Positive Definite (SPD) matrices are the best-behaved matrices in linear algebra. They appear everywhere in ML: covariance matrices, kernel matrices, Hessians of convex functions.
Definition: A symmetric matrix A is positive definite if xᵀAx > 0 for all non-zero x ∈ ℝⁿ.
Characterization via eigenvalues: A symmetric matrix A is positive definite if and only if all eigenvalues are strictly positive (λᵢ > 0 for all i).
Positive semi-definite: λᵢ ≥ 0 for all i.
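A small sketch connecting the two characterizations (the sample matrix and the random spot-check of the quadratic form are illustrative, not a proof):

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 5.]])   # symmetric, illustrative

# Characterization 1: all eigenvalues strictly positive
eigenvalues = np.linalg.eigvalsh(A)
print("Eigenvalues:", eigenvalues)
print("Positive definite (eigen test):", np.all(eigenvalues > 0))

# Characterization 2: xᵀAx > 0 for all nonzero x (sampled at random)
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
quad_forms = np.einsum('ij,jk,ik->i', xs, A, xs)   # xᵀAx for each row
print("All sampled xᵀAx > 0:", np.all(quad_forms > 0))
```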
```python
import numpy as np
from scipy import linalg
from scipy.spatial.distance import cdist

def analyze_spd_matrix(A, name="Matrix"):
    """Analyze properties of a symmetric positive (semi-)definite matrix."""
    print(f"Analyzing: {name}")
    print("=" * 60)
    print(A)
    print()

    # Check symmetry
    is_symmetric = np.allclose(A, A.T)
    print(f"Symmetric: {is_symmetric}")

    # Eigenanalysis
    eigenvalues, Q = np.linalg.eigh(A)
    print(f"Eigenvalues: {eigenvalues.round(6)}")

    # Positive definiteness
    is_pd = np.all(eigenvalues > 0)
    is_psd = np.all(eigenvalues >= 0)
    print(f"Positive definite: {is_pd}")
    print(f"Positive semi-definite: {is_psd}")

    # Condition number
    if is_psd and eigenvalues.min() > 0:
        condition = eigenvalues.max() / eigenvalues.min()
        print(f"Condition number: {condition:.4f}")

    if is_pd:
        # Cholesky decomposition
        L = linalg.cholesky(A, lower=True)
        print("Cholesky factor L (A = LLᵀ):")
        print(L.round(4))
        print(f"LLᵀ ≈ A: {np.allclose(L @ L.T, A)}")

        # Matrix square root via eigendecomposition
        sqrt_A = Q @ np.diag(np.sqrt(eigenvalues)) @ Q.T
        print("Matrix square root √A:")
        print(sqrt_A.round(4))
        print(f"(√A)² ≈ A: {np.allclose(sqrt_A @ sqrt_A, A)}")

        # Inverse has eigenvalues 1/λ
        A_inv = np.linalg.inv(A)
        inv_eigenvalues = np.linalg.eigvalsh(A_inv)
        print(f"Inverse A⁻¹ eigenvalues: {inv_eigenvalues.round(6)}")
        print(f"These are 1/λ: {(1 / eigenvalues[::-1]).round(6)}")

    return eigenvalues, Q

# Example 1: SPD matrix from data covariance
np.random.seed(42)
X = np.random.randn(100, 3)
X[:, 0] *= 3  # Different variances
X[:, 1] *= 2
cov = X.T @ X / (100 - 1)

eigenvalues, Q = analyze_spd_matrix(cov, "Covariance Matrix")

# Example 2: Kernel matrix (always PSD)
points = np.random.randn(5, 2)
# RBF kernel: K_ij = exp(-||x_i - x_j||² / (2σ²))
sigma = 1.0
distances = cdist(points, points, 'sqeuclidean')
K = np.exp(-distances / (2 * sigma**2))

print("\n" + "=" * 60)
analyze_spd_matrix(K, "RBF Kernel Matrix")

# Example 3: Nearly singular (ill-conditioned) SPD
A_ill = np.array([
    [1.0, 0.99],
    [0.99, 1.0]])
analyze_spd_matrix(A_ill, "Near-Singular SPD Matrix")
```

Not all matrices are diagonalizable. Defective matrices have repeated eigenvalues with insufficient eigenvectors. What can we do?
Jordan Normal Form:
Every square matrix A (over ℂ) can be written as:
$$\mathbf{A} = \mathbf{P} \mathbf{J} \mathbf{P}^{-1}$$
where J is in Jordan normal form—block diagonal with Jordan blocks:
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots \\ 0 & \lambda & 1 & \cdots \\ \vdots & & \ddots & \ddots \\ 0 & \cdots & 0 & \lambda \end{pmatrix}$$
For diagonalizable matrices, all Jordan blocks are 1×1 (just eigenvalues on diagonal).
In ML practice, defective matrices are rare. Symmetric matrices (covariance, kernel matrices) are always diagonalizable. Random matrices are almost surely diagonalizable. Jordan form matters more for theoretical analysis than practical computation—but it's good to know the theory is complete.
```python
import numpy as np
from scipy import linalg

# The canonical defective matrix: a Jordan block
J = np.array([
    [2, 1],
    [0, 2]], dtype=float)

print("Defective Matrix (Jordan block):")
print(J)
print()

# Eigenvalues
eigenvalues = np.linalg.eigvals(J)
print(f"Eigenvalues: {eigenvalues}")
print("Note: λ=2 has algebraic multiplicity 2")

# Try to find eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(J)
print("Eigenvectors (columns):")
print(eigenvectors)
print()

# Check if eigenvectors are linearly independent
det = np.linalg.det(eigenvectors)
print(f"Determinant of eigenvector matrix: {det:.2e}")
print("Since det ≈ 0, eigenvectors are NOT linearly independent!")
print()

# What goes wrong with A^k?
print("Powers of J:")
for k in [1, 2, 3, 5, 10]:
    J_k = np.linalg.matrix_power(J, k)
    print(f"J^{k} =\n{J_k}\n")

print("Notice: J^k = [[2^k, k*2^(k-1)], [0, 2^k]]")
print("The off-diagonal grows with k (not captured by eigendecomposition)")

# Schur decomposition always works
print("\n" + "=" * 60)
print("Schur Decomposition (Always Works)")
print("=" * 60)

# For any matrix, A = Q T Q^H where T is upper triangular
T, Q = linalg.schur(J)
print("Schur form T (upper triangular):")
print(T)
print("Q (unitary):")
print(Q)
print(f"Verify A = Q T Qᴴ: {np.allclose(J, Q @ T @ Q.T.conj())}")
```

We've explored diagonalization—the process of expressing matrices in their simplest form. Let's consolidate:
What's Next:
We've built the complete theoretical foundation of eigenanalysis. The final page of this module explores real-world ML applications—how eigenvalues and eigenvectors power PCA for dimensionality reduction, PageRank for web search, spectral clustering for community detection, and more. We'll see eigenanalysis in action!
You now understand matrix diagonalization—when it's possible, how it works, and why it's powerful. You've mastered the Spectral Theorem for symmetric matrices and seen applications to matrix functions and Markov chains. Next, we'll apply these concepts to PCA, PageRank, and spectral clustering.