We've defined eigenvalues mathematically and interpreted them geometrically. But how do we actually compute them? This is where theory meets practice—and where naive approaches encounter computational walls.
The characteristic polynomial det(A - λI) = 0 gives us eigenvalues in theory. But for matrices larger than 4×4, there's no algebraic formula for polynomial roots (the Abel-Ruffini theorem). And even when formulas exist, they're numerically unstable.
Industry-strength eigenvalue computation relies on iterative numerical algorithms—methods that refine approximations until they converge. Understanding these algorithms demystifies how NumPy, LAPACK, and other libraries actually compute eigenvalues, and helps you choose the right tool for your ML problem.
By the end of this page, you will:

- understand the characteristic polynomial method (and its limitations),
- master the power iteration algorithm for finding the dominant eigenvalue,
- learn the QR algorithm—the workhorse of modern eigenvalue computation, and
- know when to use specialized algorithms for symmetric, sparse, or large matrices.
We start with the textbook approach: form the characteristic polynomial and find its roots.
Procedure:

1. Form the matrix A - λI.
2. Compute the characteristic polynomial det(A - λI).
3. Solve det(A - λI) = 0 for λ.
For a 2×2 matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
$$\det(A - \lambda I) = (a-\lambda)(d-\lambda) - bc = \lambda^2 - (a+d)\lambda + (ad-bc)$$
Using the quadratic formula:
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad-bc)}}{2} = \frac{\text{tr}(A) \pm \sqrt{\text{tr}(A)^2 - 4\det(A)}}{2}$$
```python
import numpy as np

def eigenvalues_2x2(A):
    """
    Compute eigenvalues of a 2x2 matrix using the characteristic polynomial.

    For A = [[a, b], [c, d]]:
        λ = (trace ± sqrt(trace² - 4*det)) / 2
    """
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]

    trace = a + d
    det = a * d - b * c
    discriminant = trace**2 - 4 * det

    if discriminant >= 0:
        sqrt_disc = np.sqrt(discriminant)
        λ1 = (trace + sqrt_disc) / 2
        λ2 = (trace - sqrt_disc) / 2
        return λ1, λ2, "Real"
    else:
        sqrt_disc = np.sqrt(-discriminant)
        λ1 = complex(trace/2, sqrt_disc/2)
        λ2 = complex(trace/2, -sqrt_disc/2)
        return λ1, λ2, "Complex"

def eigenvalues_3x3(A):
    """
    Compute eigenvalues of a 3x3 matrix via its characteristic polynomial.

    The characteristic polynomial is λ³ - tr(A)λ² + c1·λ - det(A) = 0,
    where c1 is the sum of the 2x2 principal minors.
    This is already getting complicated!
    """
    c2 = np.trace(A)  # Sum of eigenvalues = trace

    # c1 = sum of 2x2 principal minors
    c1 = (A[0,0]*A[1,1] - A[0,1]*A[1,0]
          + A[0,0]*A[2,2] - A[0,2]*A[2,0]
          + A[1,1]*A[2,2] - A[1,2]*A[2,1])
    c0 = -np.linalg.det(A)  # Negative determinant

    # Solve λ³ - c2·λ² + c1·λ + c0 = 0 numerically (c0 = -det(A))
    coeffs = [1, -c2, c1, c0]
    roots = np.roots(coeffs)
    return roots

# Example
A = np.array([[4, 1],
              [2, 3]])
λ1, λ2, eigentype = eigenvalues_2x2(A)
print("2x2 Matrix:")
print(A)
print(f"Eigenvalues ({eigentype}): λ₁ = {λ1}, λ₂ = {λ2}")
print(f"Verification with NumPy: {np.linalg.eigvals(A)}")
print()

# 3x3 example
B = np.array([[4, 1, 0],
              [2, 3, 1],
              [0, 1, 2]])
eigenvalues_3x3_result = eigenvalues_3x3(B)
print("3x3 Matrix:")
print(B)
print(f"Eigenvalues: {eigenvalues_3x3_result}")
print(f"Verification with NumPy: {np.linalg.eigvals(B)}")
```

The characteristic polynomial method has fatal flaws for large matrices: (1) computing the determinant symbolically is exponentially expensive, (2) for n ≥ 5 there is no algebraic formula for polynomial roots (Abel-Ruffini theorem), and (3) even for smaller n, root-finding is numerically unstable—small errors in the coefficients cause large errors in the roots.
The numerical instability problem:
Consider the polynomial (x - 1)(x - 2)...(x - 20) = 0, which has roots 1, 2, ..., 20. If we expand this and perturb the coefficients slightly, the computed roots can be wildly wrong. This phenomenon, known as ill-conditioning, makes the characteristic polynomial approach impractical.
Wilkinson's polynomial is the classic example: perturbing its x¹⁹ coefficient by just 2⁻²³ (about 10⁻⁷) moves several roots by order 1 and sends some of them into the complex plane.
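A quick sketch makes this concrete. The exact magnitudes depend on the root-finder, so treat the printed numbers as illustrative:

```python
import numpy as np

# Wilkinson's polynomial: (x-1)(x-2)...(x-20), with roots 1, 2, ..., 20
coeffs = np.poly(np.arange(1, 21))

# Perturb the x^19 coefficient by a tiny amount (Wilkinson's 2^-23)
perturbed = coeffs.copy()
perturbed[1] += 2.0**-23  # ~1.2e-7

roots = np.roots(perturbed)

# Some roots move by order 1, and several become complex
max_shift = np.max(np.abs(np.sort(roots.real) - np.arange(1, 21)))
print(f"Largest shift in real parts: {max_shift:.3f}")
print(f"Largest imaginary part:      {np.max(np.abs(roots.imag)):.3f}")
```

A perturbation of order 10⁻⁷ in a single coefficient produces root errors of order 1—seven orders of magnitude amplification.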
For this reason, production eigenvalue algorithms never form the characteristic polynomial explicitly. Instead, they use iterative matrix transformations.
The power iteration (or power method) is the simplest iterative algorithm for eigenvalue computation. It finds the dominant eigenvalue—the one with largest absolute value—and its corresponding eigenvector.
Algorithm:

1. Start with a random vector v₀, normalized to unit length.
2. Repeatedly apply A and renormalize: vₖ₊₁ = Avₖ / ||Avₖ||.
3. Estimate the eigenvalue with the Rayleigh quotient λ ≈ vₖᵀAvₖ.
4. Stop when the estimate changes by less than a tolerance.
Why it works:
Express v₀ in the eigenvector basis: v₀ = c₁e₁ + c₂e₂ + ... + cₙeₙ
After k iterations:
Aᵏv₀ = c₁λ₁ᵏe₁ + c₂λ₂ᵏe₂ + ... + cₙλₙᵏeₙ
If |λ₁| > |λ₂| ≥ ... ≥ |λₙ|, then λ₁ᵏ dominates as k → ∞:
Aᵏv₀ ≈ c₁λ₁ᵏe₁
So Aᵏv₀ tends toward the direction of e₁.
```python
import numpy as np
import matplotlib.pyplot as plt

def power_iteration(A, num_iterations=100, tol=1e-10):
    """
    Power iteration to find the dominant eigenvalue and eigenvector.

    Parameters:
        A: Square matrix
        num_iterations: Maximum number of iterations
        tol: Convergence tolerance

    Returns:
        eigenvalue: Dominant eigenvalue
        eigenvector: Corresponding eigenvector (normalized)
        history: List of eigenvalue estimates at each iteration
    """
    n = A.shape[0]

    # Start with a random vector
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    eigenvalue_history = []

    for i in range(num_iterations):
        # Multiply by A
        Av = A @ v

        # Estimate eigenvalue via the Rayleigh quotient
        eigenvalue = np.dot(v, Av)  # v^T A v / v^T v, but v is normalized
        eigenvalue_history.append(eigenvalue)

        # Normalize
        Av_norm = np.linalg.norm(Av)
        if Av_norm < 1e-15:
            raise ValueError("Matrix may be singular or v is in null space")
        v_new = Av / Av_norm

        # Check convergence
        if i > 0 and abs(eigenvalue_history[-1] - eigenvalue_history[-2]) < tol:
            print(f"Converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v, eigenvalue_history

# Example
A = np.array([[4, 1, 0],
              [1, 3, 1],
              [0, 1, 2]], dtype=float)

print("Matrix A:")
print(A)
print()

# Power iteration
dominant_eigenvalue, dominant_eigenvector, history = power_iteration(A)

print("Power iteration result:")
print(f"  Dominant eigenvalue: {dominant_eigenvalue:.10f}")
print(f"  Dominant eigenvector: {dominant_eigenvector}")
print()

# Compare with NumPy
eigenvalues, eigenvectors = np.linalg.eig(A)
idx = np.argmax(np.abs(eigenvalues))
print("NumPy result:")
print(f"  All eigenvalues: {eigenvalues}")
print(f"  Dominant eigenvalue: {eigenvalues[idx]:.10f}")
print(f"  Dominant eigenvector: {eigenvectors[:, idx]}")
print()

# Verify Av = λv
Av = A @ dominant_eigenvector
λv = dominant_eigenvalue * dominant_eigenvector
print(f"Verification (should be ~0): ||Av - λv|| = {np.linalg.norm(Av - λv):.2e}")

# Plot convergence
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(history, 'b-o', markersize=3)
plt.axhline(y=eigenvalues[idx], color='r', linestyle='--', label='True value')
plt.xlabel('Iteration')
plt.ylabel('Eigenvalue Estimate')
plt.title('Power Iteration Convergence')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
errors = np.abs(np.array(history) - eigenvalues[idx])
plt.semilogy(errors, 'g-o', markersize=3)
plt.xlabel('Iteration')
plt.ylabel('|Error| (log scale)')
plt.title('Error Convergence (Linear on Log Scale)')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('power_iteration_convergence.png', dpi=150)
plt.show()
```

Convergence rate:
Power iteration converges at rate |λ₂/λ₁| per iteration, where λ₁ is the dominant eigenvalue and λ₂ is the second-largest. If |λ₂/λ₁| is close to 1, convergence is slow. If there's a clear spectral gap (|λ₁| >> |λ₂|), convergence is fast.
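To see the rate in action, here is a small experiment. The diagonal test matrices are made-up examples chosen so the ratio |λ₂/λ₁| is known exactly; the precise iteration counts will vary with the starting vector:

```python
import numpy as np

def power_iter_count(A, tol=1e-10, max_iter=10000):
    """Run power iteration; return iterations until the estimate stabilizes."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    prev = np.inf
    for i in range(max_iter):
        Av = A @ v
        lam = v @ Av  # Rayleigh quotient (v is normalized)
        if abs(lam - prev) < tol:
            return i + 1
        prev = lam
        v = Av / np.linalg.norm(Av)
    return max_iter

# Large spectral gap: |λ2/λ1| = 0.1  -> fast convergence
fast = power_iter_count(np.diag([10.0, 1.0, 0.5]))
# Small spectral gap: |λ2/λ1| = 0.95 -> slow convergence
slow = power_iter_count(np.diag([10.0, 9.5, 0.5]))

print(f"gap ratio 0.10: {fast} iterations")
print(f"gap ratio 0.95: {slow} iterations")
```

The second matrix needs dramatically more iterations, matching the |λ₂/λ₁|ᵏ error decay.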
Limitations:

- Finds only the dominant eigenvalue and eigenvector, not the full spectrum.
- Converges slowly when |λ₂/λ₁| is close to 1 (small spectral gap).
- Fails if the starting vector has no component along e₁ (c₁ = 0), and struggles when the dominant eigenvalue is complex or when two eigenvalues tie in magnitude.
Google's original PageRank algorithm is essentially power iteration on the web's link matrix. The dominant eigenvector gives page importance scores. Power iteration is ideal here because (1) we only need the dominant eigenvector, (2) the matrix is sparse (most pages link to few others), and (3) we can stream through data without storing the full matrix.
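A toy version of that idea can be sketched in a few lines. This is an illustrative sketch, not Google's actual implementation; the 4-page link graph and the damping factor 0.85 are made-up example values:

```python
import numpy as np

# Toy link matrix: entry (i, j) is the probability of following a link
# from page j to page i; each column sums to 1. Pages 0-3 are hypothetical.
L = np.array([
    [0,   0,   1, 0.5],
    [1/3, 0,   0, 0  ],
    [1/3, 0.5, 0, 0.5],
    [1/3, 0.5, 0, 0  ],
])

# Damped "Google matrix": follow links with prob d, jump anywhere with 1-d
d = 0.85
n = L.shape[0]
G = d * L + (1 - d) / n * np.ones((n, n))

# Power iteration: the dominant eigenvector of G gives the importance scores
r = np.full(n, 1 / n)
for _ in range(100):
    r = G @ r
    r /= r.sum()  # keep r a probability distribution

print("PageRank scores:", r.round(4))
```

Because G is column-stochastic, its dominant eigenvalue is 1 and the iteration converges to the stationary distribution of the random-surfer chain.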
Power iteration finds the largest eigenvalue. What if we want the smallest? Or an eigenvalue near a specific value?
Inverse iteration:
Apply power iteration to A⁻¹. The eigenvalues of A⁻¹ are 1/λᵢ. The largest eigenvalue of A⁻¹ is 1/λₘᵢₙ, so inverse iteration finds the smallest eigenvalue of A.
Algorithm: Solve Avₖ₊₁ = vₖ at each step (instead of multiplying by A⁻¹, which is expensive).
Shift-invert iteration:
To find the eigenvalue closest to a target σ, apply inverse iteration to (A - σI)⁻¹.
The eigenvalues of (A - σI)⁻¹ are 1/(λᵢ - σ). The largest of these corresponds to the λᵢ closest to σ.
This is extremely powerful: if you have a good estimate σ of an eigenvalue, shift-invert converges very fast.
```python
import numpy as np
from scipy import linalg

def inverse_iteration(A, num_iterations=100, tol=1e-10):
    """
    Inverse iteration to find the smallest eigenvalue.
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    # LU decomposition for efficient solving
    lu, piv = linalg.lu_factor(A)

    for i in range(num_iterations):
        # Solve A @ v_new = v (instead of computing v_new = A^(-1) @ v)
        v_new = linalg.lu_solve((lu, piv), v)

        # Normalize
        v_new = v_new / np.linalg.norm(v_new)

        # Estimate eigenvalue via Rayleigh quotient
        eigenvalue = np.dot(v_new, A @ v_new)

        # Converged if v stops changing (up to sign)
        if np.linalg.norm(v_new - v) < tol or np.linalg.norm(v_new + v) < tol:
            print(f"Inverse iteration converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v

def shift_invert_iteration(A, sigma, num_iterations=100, tol=1e-10):
    """
    Shift-invert iteration to find the eigenvalue closest to sigma.
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    # Shifted matrix: (A - σI)
    A_shifted = A - sigma * np.eye(n)

    # LU decomposition for efficient solving
    lu, piv = linalg.lu_factor(A_shifted)

    for i in range(num_iterations):
        # Solve (A - σI) @ v_new = v
        v_new = linalg.lu_solve((lu, piv), v)

        # Normalize
        v_new = v_new / np.linalg.norm(v_new)

        # Estimate eigenvalue via Rayleigh quotient on the ORIGINAL matrix
        eigenvalue = np.dot(v_new, A @ v_new)

        if np.linalg.norm(v_new - v) < tol or np.linalg.norm(v_new + v) < tol:
            print(f"Shift-invert converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v

# Example
A = np.array([[5, 1, 0],
              [1, 3, 1],
              [0, 1, 1]], dtype=float)

print("Matrix A:")
print(A)
print()

# Get all eigenvalues for reference
all_eigenvalues = np.linalg.eigvals(A)
print(f"All eigenvalues: {sorted(all_eigenvalues)}")
print()

# Inverse iteration: find the smallest eigenvalue
smallest_λ, smallest_v = inverse_iteration(A)
print(f"Smallest eigenvalue (inverse iteration): {smallest_λ:.10f}")
print(f"True smallest: {min(all_eigenvalues):.10f}")
print()

# Shift-invert: find the eigenvalue closest to 3.0
target = 3.0
closest_λ, closest_v = shift_invert_iteration(A, target)
print(f"Eigenvalue closest to {target} (shift-invert): {closest_λ:.10f}")

# Find the actual closest
actual_closest = all_eigenvalues[np.argmin(np.abs(all_eigenvalues - target))]
print(f"True closest to {target}: {actual_closest:.10f}")
```

Inverse iteration requires solving a linear system at each step: for dense matrices, an O(n³) factorization up front and O(n²) per solve; for sparse matrices, roughly O(nnz) per solve with good preconditioners. This is costlier per step than power iteration, but the fast convergence often compensates. For sparse matrices from graphs (as in spectral clustering), sparse solvers make this very efficient.
The QR algorithm is the workhorse of eigenvalue computation. It finds all eigenvalues of a matrix and is the foundation of numpy.linalg.eig and LAPACK's eigenvalue routines.
Basic QR iteration:

1. Set A₀ = A.
2. Factor Aₖ = QₖRₖ (QR decomposition).
3. Form the next iterate Aₖ₊₁ = RₖQₖ.
4. Repeat until Aₖ is (nearly) upper triangular, then read the eigenvalues off the diagonal.
Why it works:
$$A_{k+1} = R_k Q_k = Q_k^{-1}(Q_k R_k)Q_k = Q_k^{-1} A_k Q_k = Q_k^\top A_k Q_k$$
So Aₖ₊₁ is similar to Aₖ (same eigenvalues). The iterates converge to Schur form, where eigenvalues appear on the diagonal.
```python
import numpy as np

def basic_qr_algorithm(A, num_iterations=100, tol=1e-10):
    """
    Basic QR algorithm for eigenvalue computation.

    Returns the matrix converged toward upper triangular form,
    whose diagonal contains the eigenvalues.
    """
    Ak = A.copy().astype(float)
    n = A.shape[0]

    for i in range(num_iterations):
        # QR decomposition
        Q, R = np.linalg.qr(Ak)

        # Form next iterate: A_{k+1} = R @ Q
        Ak_new = R @ Q

        # Check convergence (below-diagonal elements should go to 0)
        off_diagonal_norm = np.linalg.norm(np.tril(Ak_new, -1))
        if off_diagonal_norm < tol:
            print(f"QR algorithm converged after {i+1} iterations")
            break

        Ak = Ak_new

    # Eigenvalues are on the diagonal
    eigenvalues = np.diag(Ak)
    return eigenvalues, Ak

def qr_algorithm_with_shift(A, num_iterations=1000, tol=1e-10):
    """
    QR algorithm with Wilkinson shift for faster convergence.

    The shift dramatically accelerates convergence toward the
    bottom-right eigenvalue, which is then deflated away.
    """
    Ak = A.copy().astype(float)
    n = A.shape[0]
    eigenvalues = []

    for m in range(n, 1, -1):
        for i in range(num_iterations):
            # Wilkinson shift: the eigenvalue of the bottom 2x2 block
            # closer to A[m-1, m-1]
            a = Ak[m-2, m-2]
            b = Ak[m-2, m-1]
            c = Ak[m-1, m-2]
            d = Ak[m-1, m-1]
            trace = a + d
            det = a * d - b * c
            discriminant = trace**2 / 4 - det

            if discriminant >= 0:
                sqrt_disc = np.sqrt(discriminant)
                λ1 = trace/2 + sqrt_disc
                λ2 = trace/2 - sqrt_disc
                # Choose the shift closer to d
                shift = λ1 if abs(λ1 - d) < abs(λ2 - d) else λ2
            else:
                shift = d  # Fallback

            # Shifted QR step on the active submatrix
            Am_shifted = Ak[:m, :m] - shift * np.eye(m)
            Q, R = np.linalg.qr(Am_shifted)
            Ak[:m, :m] = R @ Q + shift * np.eye(m)

            # Deflate once the subdiagonal entry is negligible
            if abs(Ak[m-1, m-2]) < tol:
                eigenvalues.append(Ak[m-1, m-1])
                break
        else:
            # Did not fully deflate; accept the current estimate
            eigenvalues.append(Ak[m-1, m-1])

    eigenvalues.append(Ak[0, 0])
    return np.array(eigenvalues)

# Example
A = np.array([[4, 1, 0, 0],
              [1, 3, 1, 0],
              [0, 1, 2, 1],
              [0, 0, 1, 1]], dtype=float)

print("Matrix A:")
print(A)
print()

# Basic QR algorithm
eigenvalues_basic, A_final = basic_qr_algorithm(A)
print("Basic QR algorithm:")
print(f"  Eigenvalues: {sorted(eigenvalues_basic)}")
print("  Final matrix (should be upper triangular):")
print(A_final.round(6))
print()

# QR with shift
eigenvalues_shifted = qr_algorithm_with_shift(A)
print("QR with shift:")
print(f"  Eigenvalues: {sorted(eigenvalues_shifted)}")
print()

# NumPy comparison
eigenvalues_numpy = np.linalg.eigvals(A)
print("NumPy eigenvalues:")
print(f"  {sorted(eigenvalues_numpy)}")
```

The production QR algorithm includes: (1) reduction to Hessenberg form first (upper triangular plus one subdiagonal), which cuts the cost per iteration from O(n³) to O(n²); (2) shifts to accelerate convergence; (3) implicit QR steps for numerical stability; (4) deflation when subdiagonal elements become negligible. With these enhancements the overall algorithm runs in O(n³).
Symmetric matrices (A = Aᵀ) are ubiquitous in ML—covariance matrices, kernels, graph Laplacians. Their special structure enables faster, more stable algorithms.
Key advantages:

- All eigenvalues are real, and the eigenvectors can be chosen orthonormal (the spectral theorem).
- Reduction to tridiagonal form (rather than Hessenberg) makes each iteration much cheaper.
- No complex arithmetic is needed, and the algorithms are more stable and accurate.
| Algorithm | Use Case | Complexity | Notes |
|---|---|---|---|
| Symmetric QR | All eigenvalues, moderate n | O(n³) | Reliable, always works |
| Divide and Conquer | All eigenvalues, large n | O(n³) but faster in practice | Excellent for large dense matrices |
| Jacobi Method | High accuracy needed | O(n³) but slow | Best accuracy, parallelizable |
| MRRR (Multiple Relatively Robust Representations) | All eigenpairs efficiently | O(n²) for the tridiagonal stage | Modern method, used by LAPACK ('evr' driver) |
| Lanczos | Extreme eigenvalues, sparse matrices | O(k·nnz) | k = number of eigenvalues wanted |
```python
import numpy as np
from scipy import linalg
import time

def compare_symmetric_solvers(n=500):
    """
    Compare different approaches for symmetric eigenvalue problems.
    """
    # Create a symmetric positive semi-definite matrix
    np.random.seed(42)
    X = np.random.randn(n, n)
    A = X @ X.T  # Guaranteed symmetric positive semi-definite

    print(f"Testing symmetric eigensolvers on {n}x{n} matrix")
    print("-" * 60)

    # Method 1: numpy.linalg.eig (general purpose)
    start = time.perf_counter()
    eigenvalues_eig, eigenvectors_eig = np.linalg.eig(A)
    time_eig = time.perf_counter() - start
    print(f"np.linalg.eig:              {time_eig:.4f}s")

    # Method 2: numpy.linalg.eigh (for symmetric/Hermitian)
    start = time.perf_counter()
    eigenvalues_eigh, eigenvectors_eigh = np.linalg.eigh(A)
    time_eigh = time.perf_counter() - start
    print(f"np.linalg.eigh:             {time_eigh:.4f}s (specialized for symmetric)")

    # Method 3: scipy.linalg.eigh with divide-and-conquer
    start = time.perf_counter()
    eigenvalues_dc, eigenvectors_dc = linalg.eigh(A, driver='evd')
    time_dc = time.perf_counter() - start
    print(f"scipy evd (divide&conquer): {time_dc:.4f}s")

    # Method 4: scipy.linalg.eigh with MRRR
    start = time.perf_counter()
    eigenvalues_mrrr, eigenvectors_mrrr = linalg.eigh(A, driver='evr')
    time_mrrr = time.perf_counter() - start
    print(f"scipy evr (MRRR):           {time_mrrr:.4f}s")

    # Method 5: Only some eigenvalues (e.g., top 10)
    k = 10
    start = time.perf_counter()
    eigenvalues_partial, eigenvectors_partial = linalg.eigh(
        A, subset_by_index=[n-k, n-1]
    )
    time_partial = time.perf_counter() - start
    print(f"scipy (only top {k}):        {time_partial:.4f}s")

    print("-" * 60)
    print(f"Speedup of eigh over eig: {time_eig/time_eigh:.2f}x")

    # Verify results are consistent
    eigenvalues_sorted = np.sort(eigenvalues_eigh)[::-1]
    print(f"Top 5 eigenvalues: {eigenvalues_sorted[:5]}")
    print(f"All methods agree: {np.allclose(np.sort(eigenvalues_eigh), np.sort(eigenvalues_dc))}")

    return {
        'eig': time_eig,
        'eigh': time_eigh,
        'divide_conquer': time_dc,
        'mrrr': time_mrrr,
        'partial': time_partial
    }

# Run comparison
times = compare_symmetric_solvers(500)

# Demonstrate scipy.sparse.linalg.eigsh for sparse matrices
print("\n" + "="*60)
print("For SPARSE symmetric matrices (e.g., graph Laplacians):")
print("="*60)

from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import eigsh

# Create a sparse symmetric matrix
n_sparse = 5000
density = 0.01  # 1% nonzero
Sparse = sparse_random(n_sparse, n_sparse, density=density, format='csr')
Sparse = Sparse @ Sparse.T  # Make symmetric

# Find only the top k eigenvalues
k = 10
start = time.perf_counter()
eigenvalues_sparse, eigenvectors_sparse = eigsh(Sparse, k=k, which='LM')
time_sparse = time.perf_counter() - start

print(f"scipy.sparse.linalg.eigsh for {n_sparse}x{n_sparse} matrix:")
print(f"  Density: {Sparse.nnz / n_sparse**2 * 100:.4f}%")
print(f"  Time for top {k} eigenvalues: {time_sparse:.4f}s")
print(f"  Top eigenvalues: {eigenvalues_sparse[::-1]}")
```

In Python, always use np.linalg.eigh() or scipy.linalg.eigh() for symmetric/Hermitian matrices instead of eig(). It's faster (exploits structure), more stable (real eigenvalues guaranteed), and returns eigenvalues in sorted order. This is critical for PCA and spectral methods.
In ML, we frequently encounter sparse matrices—matrices with mostly zero entries. Graph adjacency matrices, document-term matrices, and many kernel matrices are sparse. For these, dense algorithms are wasteful and often infeasible.
The Lanczos algorithm (for symmetric matrices) and Arnoldi algorithm (for general matrices) extend power iteration to find multiple eigenvalues while exploiting sparsity.
Key idea:
Build an orthonormal basis for the Krylov subspace:
$$\mathcal{K}_k(A, v) = \text{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}$$
In this subspace, A is represented by a small tridiagonal (symmetric) or upper Hessenberg (general) matrix. Finding eigenvalues of this small matrix is cheap.
```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh, eigs
import matplotlib.pyplot as plt

def simple_lanczos(A, k, v0=None):
    """
    Simplified Lanczos algorithm for symmetric matrices.

    Builds a tridiagonal matrix T whose eigenvalues approximate A's.

    Parameters:
        A: Symmetric matrix (can be sparse)
        k: Number of Lanczos iterations (= size of Krylov subspace)
        v0: Starting vector

    Returns:
        T: Tridiagonal matrix
        Q: Orthonormal Lanczos vectors
    """
    n = A.shape[0]

    # The @ operator works for both dense and sparse matrices
    matvec = lambda x: A @ x

    # Initialize
    if v0 is None:
        v0 = np.random.randn(n)
    v = v0 / np.linalg.norm(v0)

    # Storage
    Q = np.zeros((n, k))
    alpha = np.zeros(k)    # Diagonal of T
    beta = np.zeros(k-1)   # Off-diagonal of T

    Q[:, 0] = v

    # First iteration
    w = matvec(v)
    alpha[0] = np.dot(w, v)
    w = w - alpha[0] * v

    for j in range(1, k):
        beta[j-1] = np.linalg.norm(w)
        if beta[j-1] < 1e-12:
            print(f"Lanczos breakdown at iteration {j}")
            k = j
            break

        v = w / beta[j-1]
        Q[:, j] = v

        w = matvec(v) - beta[j-1] * Q[:, j-1]
        alpha[j] = np.dot(w, v)
        w = w - alpha[j] * v

        # Reorthogonalization (important for numerical stability)
        for i in range(j+1):
            w = w - np.dot(w, Q[:, i]) * Q[:, i]

    # Build the tridiagonal matrix
    T = np.diag(alpha[:k]) + np.diag(beta[:k-1], 1) + np.diag(beta[:k-1], -1)
    return T, Q[:, :k]

# Example: large sparse matrix
n = 1000
density = 0.01
np.random.seed(42)

# Create a sparse symmetric matrix (e.g., graph Laplacian structure)
data = sparse.random(n, n, density=density, format='csr')
A_sparse = data + data.T  # Make symmetric

print(f"Sparse matrix: {n}x{n}, {A_sparse.nnz} nonzeros ({100*A_sparse.nnz/n**2:.2f}%)")
print()

# Run Lanczos
k = 50  # Number of Lanczos iterations
T, Q = simple_lanczos(A_sparse, k)

# Eigenvalues of T approximate the extreme eigenvalues of A
eigenvalues_T = np.linalg.eigvalsh(T)

# Compare with scipy's eigsh
eigenvalues_actual, _ = eigsh(A_sparse, k=10, which='BE')  # Both extremes

print("Lanczos approximation (from tridiagonal T):")
print(f"  Largest:  {max(eigenvalues_T):.6f}")
print(f"  Smallest: {min(eigenvalues_T):.6f}")
print()
print("scipy.eigsh result:")
print(f"  Largest:  {max(eigenvalues_actual):.6f}")
print(f"  Smallest: {min(eigenvalues_actual):.6f}")

# Visualize: how well Lanczos eigenvalues match the true spectrum
all_true_eigenvalues = np.linalg.eigvalsh(A_sparse.toarray())

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: eigenvalue distribution
ax = axes[0]
ax.hist(all_true_eigenvalues, bins=50, alpha=0.7, label='True eigenvalues')
ax.hist(eigenvalues_T, bins=30, alpha=0.7, label=f'Lanczos (k={k})')
ax.set_xlabel('Eigenvalue')
ax.set_ylabel('Count')
ax.set_title('Eigenvalue Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

# Right: Lanczos captures the extremes
ax = axes[1]
ax.plot(sorted(all_true_eigenvalues), 'b-', label='True (sorted)', alpha=0.5)
ax.scatter([0, n-1], [min(all_true_eigenvalues), max(all_true_eigenvalues)],
           c='blue', s=100, zorder=5)
ax.axhline(y=min(eigenvalues_T), color='red', linestyle='--',
           label=f'Lanczos min: {min(eigenvalues_T):.3f}')
ax.axhline(y=max(eigenvalues_T), color='red', linestyle='--',
           label=f'Lanczos max: {max(eigenvalues_T):.3f}')
ax.set_xlabel('Index (sorted)')
ax.set_ylabel('Eigenvalue')
ax.set_title('Lanczos Captures Extreme Eigenvalues')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('lanczos_demo.png', dpi=150)
plt.show()
```

Use Lanczos/Arnoldi (scipy.sparse.linalg.eigsh/eigs) when: (1) the matrix is sparse or only available as a matrix-vector product, (2) you only need a few eigenvalues rather than all of them, (3) the matrix is too large to store densely. These conditions are common in spectral clustering, graph analysis, and kernel methods.
When computing eigenvalues in ML applications, several practical issues arise:
```python
import numpy as np

def safe_pca_eigenvalues(X, n_components=None):
    """
    Compute eigenvalues for PCA with practical safeguards.

    Parameters:
        X: Data matrix (n_samples, n_features), assumed centered
        n_components: Number of components to return (default: all)

    Returns:
        eigenvalues: Sorted in descending order
        eigenvectors: Corresponding eigenvectors
        explained_variance_ratio: Fraction of variance per component
    """
    n_samples, n_features = X.shape

    # Compute covariance matrix (use n_samples - 1 for unbiased estimate)
    cov = X.T @ X / (n_samples - 1)

    # Use eigh for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # eigh returns ascending order; reverse to descending
    eigenvalues = eigenvalues[::-1]
    eigenvectors = eigenvectors[:, ::-1]

    # SAFEGUARD 1: Clip negative eigenvalues (numerical artifact)
    eigenvalues = np.maximum(eigenvalues, 0)

    # SAFEGUARD 2: Make eigenvector signs consistent
    # Convention: largest absolute component is positive
    for i in range(eigenvectors.shape[1]):
        max_abs_idx = np.argmax(np.abs(eigenvectors[:, i]))
        if eigenvectors[max_abs_idx, i] < 0:
            eigenvectors[:, i] *= -1

    # Compute explained variance ratio
    total_variance = eigenvalues.sum()
    if total_variance > 0:
        explained_variance_ratio = eigenvalues / total_variance
    else:
        explained_variance_ratio = np.zeros_like(eigenvalues)

    # Select top n_components
    if n_components is not None:
        eigenvalues = eigenvalues[:n_components]
        eigenvectors = eigenvectors[:, :n_components]
        explained_variance_ratio = explained_variance_ratio[:n_components]

    return eigenvalues, eigenvectors, explained_variance_ratio

# Example usage
np.random.seed(42)

# Generate some data with known structure
n_samples = 1000
n_features = 10

# True low-rank structure: 3 important components
latent = np.random.randn(n_samples, 3)
projection = np.random.randn(3, n_features)
X = latent @ projection + 0.5 * np.random.randn(n_samples, n_features)

# Center the data
X_centered = X - X.mean(axis=0)

# Compute PCA eigenstructure
eigenvalues, eigenvectors, explained_ratio = safe_pca_eigenvalues(X_centered)

print("PCA Eigenvalue Analysis")
print("=" * 50)
print(f"Top eigenvalues: {eigenvalues[:5].round(4)}")
print("Explained variance ratio:")
for i, ratio in enumerate(explained_ratio[:5]):
    cumulative = explained_ratio[:i+1].sum()
    print(f"  PC{i+1}: {ratio*100:.2f}% (cumulative: {cumulative*100:.2f}%)")

print(f"Note: First 3 components explain {explained_ratio[:3].sum()*100:.2f}%")
print("(Matches our 3-dimensional latent structure)")

# Demonstrate the condition number issue
print("\n" + "=" * 50)
print("Condition Number Analysis")
print("=" * 50)

cov = X_centered.T @ X_centered / (n_samples - 1)
condition_number = max(eigenvalues) / max(eigenvalues[-1], 1e-10)
print(f"Condition number: {condition_number:.2e}")

if condition_number > 1e10:
    print("WARNING: Matrix is ill-conditioned. Consider regularization.")
```

We've explored the computational side of eigenvalue problems. Let's consolidate:
| Situation | Algorithm | Python Function |
|---|---|---|
| Dense, general matrix | QR algorithm | np.linalg.eig(A) |
| Dense, symmetric matrix | Symmetric QR / Divide-and-Conquer | np.linalg.eigh(A) |
| Sparse, symmetric, few eigenvalues | Lanczos | scipy.sparse.linalg.eigsh(A, k) |
| Sparse, general, few eigenvalues | Arnoldi | scipy.sparse.linalg.eigs(A, k) |
| Only dominant eigenvalue | Power iteration | Custom implementation |
| Eigenvalue near target σ | Shift-invert | scipy.sparse.linalg.eigsh(A, k, sigma=σ) |
What's Next:
Now that we can compute eigenvalues, the next page explores diagonalization—writing A = VΛV⁻¹. We'll learn when diagonalization is possible, how to check for it, and why it dramatically simplifies matrix operations. Diagonalization is the key to understanding matrix exponentials, powers, and the spectral theorem.
You now understand the practical algorithms for computing eigenvalues—from power iteration to QR to Lanczos. You know which algorithm to use for different matrix types and sizes, and how to handle numerical issues. Next, we'll explore diagonalization and its profound implications.