We've defined eigenvalues mathematically and interpreted them geometrically. But how do we actually compute them? This is where theory meets practice—and where naive approaches encounter computational walls.
The characteristic polynomial det(A - λI) = 0 gives us eigenvalues in theory. But for matrices larger than 4×4, there's no algebraic formula for polynomial roots (the Abel-Ruffini theorem). And even when formulas exist, they're numerically unstable.
Industry-strength eigenvalue computation relies on iterative numerical algorithms—methods that refine approximations until they converge. Understanding these algorithms demystifies how NumPy, LAPACK, and other libraries actually compute eigenvalues, and helps you choose the right tool for your ML problem.
By the end of this page, you will:

- understand the characteristic polynomial method (and its limitations),
- master the power iteration algorithm for finding the dominant eigenvalue,
- learn the QR algorithm—the workhorse of modern eigenvalue computation, and
- know when to use specialized algorithms for symmetric, sparse, or large matrices.
We start with the textbook approach: form the characteristic polynomial and find its roots.
Procedure:

1. Form the matrix A - λI.
2. Compute the characteristic polynomial det(A - λI).
3. Solve det(A - λI) = 0 for λ.
For a 2×2 matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
$$\det(A - \lambda I) = (a-\lambda)(d-\lambda) - bc = \lambda^2 - (a+d)\lambda + (ad-bc)$$
Using the quadratic formula:
$$\lambda = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad-bc)}}{2} = \frac{\text{tr}(A) \pm \sqrt{\text{tr}(A)^2 - 4\det(A)}}{2}$$
```python
import numpy as np

def eigenvalues_2x2(A):
    """
    Compute eigenvalues of a 2x2 matrix using the characteristic polynomial.

    For A = [[a, b], [c, d]]:
        λ = (trace ± sqrt(trace² - 4*det)) / 2
    """
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]

    trace = a + d
    det = a * d - b * c
    discriminant = trace**2 - 4 * det

    if discriminant >= 0:
        sqrt_disc = np.sqrt(discriminant)
        λ1 = (trace + sqrt_disc) / 2
        λ2 = (trace - sqrt_disc) / 2
        return λ1, λ2, "Real"
    else:
        sqrt_disc = np.sqrt(-discriminant)
        λ1 = complex(trace/2, sqrt_disc/2)
        λ2 = complex(trace/2, -sqrt_disc/2)
        return λ1, λ2, "Complex"

def eigenvalues_3x3(A):
    """
    Compute eigenvalues of a 3x3 matrix via its characteristic polynomial.

    The characteristic polynomial is λ³ - tr(A)λ² + c1·λ - det(A) = 0,
    where c1 is the sum of the 2x2 principal minors.
    This is already getting complicated!
    """
    c2 = np.trace(A)  # Sum of eigenvalues = trace

    # c1 = sum of 2x2 principal minors
    c1 = (A[0,0]*A[1,1] - A[0,1]*A[1,0]
          + A[0,0]*A[2,2] - A[0,2]*A[2,0]
          + A[1,1]*A[2,2] - A[1,2]*A[2,1])
    c0 = -np.linalg.det(A)  # Negative determinant

    # Solve λ³ - c2·λ² + c1·λ + c0 = 0 numerically (c0 = -det(A))
    coeffs = [1, -c2, c1, c0]
    roots = np.roots(coeffs)
    return roots

# Example
A = np.array([[4, 1],
              [2, 3]])
λ1, λ2, eigentype = eigenvalues_2x2(A)
print("2x2 Matrix:")
print(A)
print(f"Eigenvalues ({eigentype}): λ₁ = {λ1}, λ₂ = {λ2}")
print(f"Verification with NumPy: {np.linalg.eigvals(A)}")
print()

# 3x3 example
B = np.array([[4, 1, 0],
              [2, 3, 1],
              [0, 1, 2]])
eigenvalues_3x3_result = eigenvalues_3x3(B)
print("3x3 Matrix:")
print(B)
print(f"Eigenvalues: {eigenvalues_3x3_result}")
print(f"Verification with NumPy: {np.linalg.eigvals(B)}")
```

The characteristic polynomial method has fatal flaws for large matrices: (1) computing the determinant symbolically is exponentially expensive, (2) for n ≥ 5 there is no algebraic formula for polynomial roots (Abel-Ruffini theorem), and (3) even for smaller n, root-finding is numerically unstable—small errors in the coefficients cause large errors in the roots.
The numerical instability problem:
Consider the polynomial (x - 1)(x - 2)...(x - 20) = 0, which has roots 1, 2, ..., 20. If we expand this and perturb the coefficients slightly, the computed roots can be wildly wrong. This phenomenon, known as ill-conditioning, makes the characteristic polynomial approach impractical.
Wilkinson's polynomial is the classic example: perturbing its x¹⁹ coefficient by just 2⁻²³ (about 10⁻⁷) moves several roots by order 1 and sends some of them into the complex plane.
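A quick sketch makes this concrete. The exact magnitudes depend on the root-finder, so treat the printed numbers as illustrative:

```python
import numpy as np

# Wilkinson's polynomial: (x-1)(x-2)...(x-20), with roots 1, 2, ..., 20
coeffs = np.poly(np.arange(1, 21))

# Perturb the x^19 coefficient by a tiny amount (Wilkinson's 2^-23)
perturbed = coeffs.copy()
perturbed[1] += 2.0**-23  # ~1.2e-7

roots = np.roots(perturbed)

# Some roots move by order 1, and several become complex
max_shift = np.max(np.abs(np.sort(roots.real) - np.arange(1, 21)))
print(f"Largest shift in real parts: {max_shift:.3f}")
print(f"Largest imaginary part:      {np.max(np.abs(roots.imag)):.3f}")
```

A perturbation of order 10⁻⁷ in a single coefficient produces root errors of order 1—seven orders of magnitude amplification.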
For this reason, production eigenvalue algorithms never form the characteristic polynomial explicitly. Instead, they use iterative matrix transformations.
The power iteration (or power method) is the simplest iterative algorithm for eigenvalue computation. It finds the dominant eigenvalue—the one with largest absolute value—and its corresponding eigenvector.
Algorithm:

1. Start with a random vector v₀, normalized to unit length.
2. Repeatedly apply A and renormalize: vₖ₊₁ = Avₖ / ||Avₖ||.
3. Estimate the eigenvalue with the Rayleigh quotient λ ≈ vₖᵀAvₖ.
4. Stop when the estimate changes by less than a tolerance.
Why it works:
Express v₀ in the eigenvector basis: v₀ = c₁e₁ + c₂e₂ + ... + cₙeₙ
After k iterations:
Aᵏv₀ = c₁λ₁ᵏe₁ + c₂λ₂ᵏe₂ + ... + cₙλₙᵏeₙ
If |λ₁| > |λ₂| ≥ ... ≥ |λₙ|, then λ₁ᵏ dominates as k → ∞:
Aᵏv₀ ≈ c₁λ₁ᵏe₁
So Aᵏv₀ tends toward the direction of e₁.
```python
import numpy as np
import matplotlib.pyplot as plt

def power_iteration(A, num_iterations=100, tol=1e-10):
    """
    Power iteration to find the dominant eigenvalue and eigenvector.

    Parameters:
        A: Square matrix
        num_iterations: Maximum number of iterations
        tol: Convergence tolerance

    Returns:
        eigenvalue: Dominant eigenvalue
        eigenvector: Corresponding eigenvector (normalized)
        history: List of eigenvalue estimates at each iteration
    """
    n = A.shape[0]

    # Start with a random vector
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    eigenvalue_history = []

    for i in range(num_iterations):
        # Multiply by A
        Av = A @ v

        # Estimate eigenvalue via the Rayleigh quotient
        eigenvalue = np.dot(v, Av)  # v^T A v / v^T v, but v is normalized
        eigenvalue_history.append(eigenvalue)

        # Normalize
        Av_norm = np.linalg.norm(Av)
        if Av_norm < 1e-15:
            raise ValueError("Matrix may be singular or v is in null space")
        v_new = Av / Av_norm

        # Check convergence
        if i > 0 and abs(eigenvalue_history[-1] - eigenvalue_history[-2]) < tol:
            print(f"Converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v, eigenvalue_history

# Example
A = np.array([[4, 1, 0],
              [1, 3, 1],
              [0, 1, 2]], dtype=float)

print("Matrix A:")
print(A)
print()

# Power iteration
dominant_eigenvalue, dominant_eigenvector, history = power_iteration(A)

print("Power iteration result:")
print(f"  Dominant eigenvalue: {dominant_eigenvalue:.10f}")
print(f"  Dominant eigenvector: {dominant_eigenvector}")
print()

# Compare with NumPy
eigenvalues, eigenvectors = np.linalg.eig(A)
idx = np.argmax(np.abs(eigenvalues))
print("NumPy result:")
print(f"  All eigenvalues: {eigenvalues}")
print(f"  Dominant eigenvalue: {eigenvalues[idx]:.10f}")
print(f"  Dominant eigenvector: {eigenvectors[:, idx]}")
print()

# Verify Av = λv
Av = A @ dominant_eigenvector
λv = dominant_eigenvalue * dominant_eigenvector
print(f"Verification (should be ~0): ||Av - λv|| = {np.linalg.norm(Av - λv):.2e}")

# Plot convergence
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(history, 'b-o', markersize=3)
plt.axhline(y=eigenvalues[idx], color='r', linestyle='--', label='True value')
plt.xlabel('Iteration')
plt.ylabel('Eigenvalue Estimate')
plt.title('Power Iteration Convergence')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
errors = np.abs(np.array(history) - eigenvalues[idx])
plt.semilogy(errors, 'g-o', markersize=3)
plt.xlabel('Iteration')
plt.ylabel('|Error| (log scale)')
plt.title('Error Convergence (Linear on Log Scale)')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('power_iteration_convergence.png', dpi=150)
plt.show()
```

Convergence rate:
Power iteration converges at rate |λ₂/λ₁| per iteration, where λ₁ is the dominant eigenvalue and λ₂ is the second-largest. If |λ₂/λ₁| is close to 1, convergence is slow. If there's a clear spectral gap (|λ₁| >> |λ₂|), convergence is fast.
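To see the rate in action, here is a small experiment. The diagonal test matrices are made-up examples chosen so the ratio |λ₂/λ₁| is known exactly; the precise iteration counts will vary with the starting vector:

```python
import numpy as np

def power_iter_count(A, tol=1e-10, max_iter=10000):
    """Run power iteration; return iterations until the estimate stabilizes."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    prev = np.inf
    for i in range(max_iter):
        Av = A @ v
        lam = v @ Av  # Rayleigh quotient (v is normalized)
        if abs(lam - prev) < tol:
            return i + 1
        prev = lam
        v = Av / np.linalg.norm(Av)
    return max_iter

# Large spectral gap: |λ2/λ1| = 0.1  -> fast convergence
fast = power_iter_count(np.diag([10.0, 1.0, 0.5]))
# Small spectral gap: |λ2/λ1| = 0.95 -> slow convergence
slow = power_iter_count(np.diag([10.0, 9.5, 0.5]))

print(f"gap ratio 0.10: {fast} iterations")
print(f"gap ratio 0.95: {slow} iterations")
```

The second matrix needs dramatically more iterations, matching the |λ₂/λ₁|ᵏ error decay.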
Limitations:

- Finds only the dominant eigenvalue and eigenvector, not the full spectrum.
- Converges slowly when |λ₂/λ₁| is close to 1 (small spectral gap).
- Fails if the starting vector has no component along e₁ (c₁ = 0), and struggles when the dominant eigenvalue is complex or when two eigenvalues tie in magnitude.
Google's original PageRank algorithm is essentially power iteration on the web's link matrix. The dominant eigenvector gives page importance scores. Power iteration is ideal here because (1) we only need the dominant eigenvector, (2) the matrix is sparse (most pages link to few others), and (3) we can stream through data without storing the full matrix.
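A toy version of that idea can be sketched in a few lines. This is an illustrative sketch, not Google's actual implementation; the 4-page link graph and the damping factor 0.85 are made-up example values:

```python
import numpy as np

# Toy link matrix: entry (i, j) is the probability of following a link
# from page j to page i; each column sums to 1. Pages 0-3 are hypothetical.
L = np.array([
    [0,   0,   1, 0.5],
    [1/3, 0,   0, 0  ],
    [1/3, 0.5, 0, 0.5],
    [1/3, 0.5, 0, 0  ],
])

# Damped "Google matrix": follow links with prob d, jump anywhere with 1-d
d = 0.85
n = L.shape[0]
G = d * L + (1 - d) / n * np.ones((n, n))

# Power iteration: the dominant eigenvector of G gives the importance scores
r = np.full(n, 1 / n)
for _ in range(100):
    r = G @ r
    r /= r.sum()  # keep r a probability distribution

print("PageRank scores:", r.round(4))
```

Because G is column-stochastic, its dominant eigenvalue is 1 and the iteration converges to the stationary distribution of the random-surfer chain.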
Power iteration finds the largest eigenvalue. What if we want the smallest? Or an eigenvalue near a specific value?
Inverse iteration:
Apply power iteration to A⁻¹. The eigenvalues of A⁻¹ are 1/λᵢ. The largest eigenvalue of A⁻¹ is 1/λₘᵢₙ, so inverse iteration finds the smallest eigenvalue of A.
Algorithm: Solve Avₖ₊₁ = vₖ at each step (instead of multiplying by A⁻¹, which is expensive).
Shift-invert iteration:
To find the eigenvalue closest to a target σ, apply inverse iteration to (A - σI)⁻¹.
The eigenvalues of (A - σI)⁻¹ are 1/(λᵢ - σ). The largest of these corresponds to the λᵢ closest to σ.
This is extremely powerful: if you have a good estimate σ of an eigenvalue, shift-invert converges very fast.
```python
import numpy as np
from scipy import linalg

def inverse_iteration(A, num_iterations=100, tol=1e-10):
    """
    Inverse iteration to find the smallest eigenvalue.
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    # LU decomposition for efficient solving
    lu, piv = linalg.lu_factor(A)

    for i in range(num_iterations):
        # Solve A @ v_new = v (instead of computing v_new = A^(-1) @ v)
        v_new = linalg.lu_solve((lu, piv), v)

        # Normalize
        v_new = v_new / np.linalg.norm(v_new)

        # Estimate eigenvalue via Rayleigh quotient
        eigenvalue = np.dot(v_new, A @ v_new)

        # Converged if v stops changing (up to sign)
        if np.linalg.norm(v_new - v) < tol or np.linalg.norm(v_new + v) < tol:
            print(f"Inverse iteration converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v

def shift_invert_iteration(A, sigma, num_iterations=100, tol=1e-10):
    """
    Shift-invert iteration to find the eigenvalue closest to sigma.
    """
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    # Shifted matrix: (A - σI)
    A_shifted = A - sigma * np.eye(n)

    # LU decomposition for efficient solving
    lu, piv = linalg.lu_factor(A_shifted)

    for i in range(num_iterations):
        # Solve (A - σI) @ v_new = v
        v_new = linalg.lu_solve((lu, piv), v)

        # Normalize
        v_new = v_new / np.linalg.norm(v_new)

        # Estimate eigenvalue via Rayleigh quotient on the ORIGINAL matrix
        eigenvalue = np.dot(v_new, A @ v_new)

        if np.linalg.norm(v_new - v) < tol or np.linalg.norm(v_new + v) < tol:
            print(f"Shift-invert converged after {i+1} iterations")
            break

        v = v_new

    return eigenvalue, v

# Example
A = np.array([[5, 1, 0],
              [1, 3, 1],
              [0, 1, 1]], dtype=float)

print("Matrix A:")
print(A)
print()

# Get all eigenvalues for reference
all_eigenvalues = np.linalg.eigvals(A)
print(f"All eigenvalues: {sorted(all_eigenvalues)}")
print()

# Inverse iteration: find the smallest eigenvalue
smallest_λ, smallest_v = inverse_iteration(A)
print(f"Smallest eigenvalue (inverse iteration): {smallest_λ:.10f}")
print(f"True smallest: {min(all_eigenvalues):.10f}")
print()

# Shift-invert: find the eigenvalue closest to 3.0
target = 3.0
closest_λ, closest_v = shift_invert_iteration(A, target)
print(f"Eigenvalue closest to {target} (shift-invert): {closest_λ:.10f}")

# Find the actual closest
actual_closest = all_eigenvalues[np.argmin(np.abs(all_eigenvalues - target))]
print(f"True closest to {target}: {actual_closest:.10f}")
```

Inverse iteration requires solving a linear system at each step: for dense matrices, an O(n³) factorization up front and O(n²) per solve; for sparse matrices, roughly O(nnz) per solve with good preconditioners. This is costlier per step than power iteration, but the fast convergence often compensates. For sparse matrices from graphs (as in spectral clustering), sparse solvers make this very efficient.
The QR algorithm is the workhorse of eigenvalue computation. It finds all eigenvalues of a matrix and is the foundation of numpy.linalg.eig and LAPACK's eigenvalue routines.
Basic QR iteration:

1. Set A₀ = A.
2. Factor Aₖ = QₖRₖ (QR decomposition).
3. Form the next iterate Aₖ₊₁ = RₖQₖ.
4. Repeat until Aₖ is (nearly) upper triangular, then read the eigenvalues off the diagonal.
Why it works:
$$A_{k+1} = R_k Q_k = Q_k^{-1}(Q_k R_k)Q_k = Q_k^{-1} A_k Q_k = Q_k^\top A_k Q_k$$
So Aₖ₊₁ is similar to Aₖ (same eigenvalues). The iterates converge to Schur form, where eigenvalues appear on the diagonal.
```python
import numpy as np

def basic_qr_algorithm(A, num_iterations=100, tol=1e-10):
    """
    Basic QR algorithm for eigenvalue computation.

    Returns the matrix converged toward upper triangular form,
    whose diagonal contains the eigenvalues.
    """
    Ak = A.copy().astype(float)
    n = A.shape[0]

    for i in range(num_iterations):
        # QR decomposition
        Q, R = np.linalg.qr(Ak)

        # Form next iterate: A_{k+1} = R @ Q
        Ak_new = R @ Q

        # Check convergence (below-diagonal elements should go to 0)
        off_diagonal_norm = np.linalg.norm(np.tril(Ak_new, -1))
        if off_diagonal_norm < tol:
            print(f"QR algorithm converged after {i+1} iterations")
            break

        Ak = Ak_new

    # Eigenvalues are on the diagonal
    eigenvalues = np.diag(Ak)
    return eigenvalues, Ak

def qr_algorithm_with_shift(A, num_iterations=1000, tol=1e-10):
    """
    QR algorithm with Wilkinson shift for faster convergence.

    The shift dramatically accelerates convergence toward the
    bottom-right eigenvalue, which is then deflated away.
    """
    Ak = A.copy().astype(float)
    n = A.shape[0]
    eigenvalues = []

    for m in range(n, 1, -1):
        for i in range(num_iterations):
            # Wilkinson shift: the eigenvalue of the bottom 2x2 block
            # closer to A[m-1, m-1]
            a = Ak[m-2, m-2]
            b = Ak[m-2, m-1]
            c = Ak[m-1, m-2]
            d = Ak[m-1, m-1]
            trace = a + d
            det = a * d - b * c
            discriminant = trace**2 / 4 - det

            if discriminant >= 0:
                sqrt_disc = np.sqrt(discriminant)
                λ1 = trace/2 + sqrt_disc
                λ2 = trace/2 - sqrt_disc
                # Choose the shift closer to d
                shift = λ1 if abs(λ1 - d) < abs(λ2 - d) else λ2
            else:
                shift = d  # Fallback

            # Shifted QR step on the active submatrix
            Am_shifted = Ak[:m, :m] - shift * np.eye(m)
            Q, R = np.linalg.qr(Am_shifted)
            Ak[:m, :m] = R @ Q + shift * np.eye(m)

            # Deflate once the subdiagonal entry is negligible
            if abs(Ak[m-1, m-2]) < tol:
                eigenvalues.append(Ak[m-1, m-1])
                break
        else:
            # Did not fully deflate; accept the current estimate
            eigenvalues.append(Ak[m-1, m-1])

    eigenvalues.append(Ak[0, 0])
    return np.array(eigenvalues)

# Example
A = np.array([[4, 1, 0, 0],
              [1, 3, 1, 0],
              [0, 1, 2, 1],
              [0, 0, 1, 1]], dtype=float)

print("Matrix A:")
print(A)
print()

# Basic QR algorithm
eigenvalues_basic, A_final = basic_qr_algorithm(A)
print("Basic QR algorithm:")
print(f"  Eigenvalues: {sorted(eigenvalues_basic)}")
print("  Final matrix (should be upper triangular):")
print(A_final.round(6))
print()

# QR with shift
eigenvalues_shifted = qr_algorithm_with_shift(A)
print("QR with shift:")
print(f"  Eigenvalues: {sorted(eigenvalues_shifted)}")
print()

# NumPy comparison
eigenvalues_numpy = np.linalg.eigvals(A)
print("NumPy eigenvalues:")
print(f"  {sorted(eigenvalues_numpy)}")
```

The production QR algorithm includes: (1) reduction to Hessenberg form first (upper triangular plus one subdiagonal), which cuts the cost per iteration from O(n³) to O(n²); (2) shifts to accelerate convergence; (3) implicit QR steps for numerical stability; (4) deflation when subdiagonal elements become negligible. With these enhancements the overall algorithm runs in O(n³).
Symmetric matrices (A = Aᵀ) are ubiquitous in ML—covariance matrices, kernels, graph Laplacians. Their special structure enables faster, more stable algorithms.
Key advantages:

- All eigenvalues are real, and the eigenvectors can be chosen orthonormal (the spectral theorem).
- Reduction to tridiagonal form (rather than Hessenberg) makes each iteration much cheaper.
- No complex arithmetic is needed, and the algorithms are more stable and accurate.
| Algorithm | Use Case | Complexity | Notes |
|---|---|---|---|
| Symmetric QR | All eigenvalues, moderate n | O(n³) | Reliable, always works |
| Divide and Conquer | All eigenvalues, large n | O(n³) but faster in practice | Excellent for large dense matrices |
| Jacobi Method | High accuracy needed | O(n³) but slow | Best accuracy, parallelizable |
| MRRR (Multiple Relatively Robust Representations) | All eigenpairs efficiently | O(n²) for the tridiagonal stage | Modern method, used by LAPACK ('evr' driver) |
| Lanczos | Extreme eigenvalues, sparse matrices | O(k·nnz) | k = number of eigenvalues wanted |
```python
import numpy as np
from scipy import linalg
import time

def compare_symmetric_solvers(n=500):
    """
    Compare different approaches for symmetric eigenvalue problems.
    """
    # Create a symmetric positive semi-definite matrix
    np.random.seed(42)
    X = np.random.randn(n, n)
    A = X @ X.T  # Guaranteed symmetric positive semi-definite

    print(f"Testing symmetric eigensolvers on {n}x{n} matrix")
    print("-" * 60)

    # Method 1: numpy.linalg.eig (general purpose)
    start = time.perf_counter()
    eigenvalues_eig, eigenvectors_eig = np.linalg.eig(A)
    time_eig = time.perf_counter() - start
    print(f"np.linalg.eig:              {time_eig:.4f}s")

    # Method 2: numpy.linalg.eigh (for symmetric/Hermitian)
    start = time.perf_counter()
    eigenvalues_eigh, eigenvectors_eigh = np.linalg.eigh(A)
    time_eigh = time.perf_counter() - start
    print(f"np.linalg.eigh:             {time_eigh:.4f}s (specialized for symmetric)")

    # Method 3: scipy.linalg.eigh with divide-and-conquer
    start = time.perf_counter()
    eigenvalues_dc, eigenvectors_dc = linalg.eigh(A, driver='evd')
    time_dc = time.perf_counter() - start
    print(f"scipy evd (divide&conquer): {time_dc:.4f}s")

    # Method 4: scipy.linalg.eigh with MRRR
    start = time.perf_counter()
    eigenvalues_mrrr, eigenvectors_mrrr = linalg.eigh(A, driver='evr')
    time_mrrr = time.perf_counter() - start
    print(f"scipy evr (MRRR):           {time_mrrr:.4f}s")

    # Method 5: Only some eigenvalues (e.g., top 10)
    k = 10
    start = time.perf_counter()
    eigenvalues_partial, eigenvectors_partial = linalg.eigh(
        A, subset_by_index=[n-k, n-1]
    )
    time_partial = time.perf_counter() - start
    print(f"scipy (only top {k}):        {time_partial:.4f}s")

    print("-" * 60)
    print(f"Speedup of eigh over eig: {time_eig/time_eigh:.2f}x")

    # Verify results are consistent
    eigenvalues_sorted = np.sort(eigenvalues_eigh)[::-1]
    print(f"Top 5 eigenvalues: {eigenvalues_sorted[:5]}")
    print(f"All methods agree: {np.allclose(np.sort(eigenvalues_eigh), np.sort(eigenvalues_dc))}")

    return {
        'eig': time_eig,
        'eigh': time_eigh,
        'divide_conquer': time_dc,
        'mrrr': time_mrrr,
        'partial': time_partial
    }

# Run comparison
times = compare_symmetric_solvers(500)

# Demonstrate scipy.sparse.linalg.eigsh for sparse matrices
print("\n" + "="*60)
print("For SPARSE symmetric matrices (e.g., graph Laplacians):")
print("="*60)

from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import eigsh

# Create a sparse symmetric matrix
n_sparse = 5000
density = 0.01  # 1% nonzero
Sparse = sparse_random(n_sparse, n_sparse, density=density, format='csr')
Sparse = Sparse @ Sparse.T  # Make symmetric

# Find only the top k eigenvalues
k = 10
start = time.perf_counter()
eigenvalues_sparse, eigenvectors_sparse = eigsh(Sparse, k=k, which='LM')
time_sparse = time.perf_counter() - start

print(f"scipy.sparse.linalg.eigsh for {n_sparse}x{n_sparse} matrix:")
print(f"  Density: {Sparse.nnz / n_sparse**2 * 100:.4f}%")
print(f"  Time for top {k} eigenvalues: {time_sparse:.4f}s")
print(f"  Top eigenvalues: {eigenvalues_sparse[::-1]}")
```

In Python, always use np.linalg.eigh() or scipy.linalg.eigh() for symmetric/Hermitian matrices instead of eig(). It's faster (exploits structure), more stable (real eigenvalues guaranteed), and returns eigenvalues in sorted order. This is critical for PCA and spectral methods.
In ML, we frequently encounter sparse matrices—matrices with mostly zero entries. Graph adjacency matrices, document-term matrices, and many kernel matrices are sparse. For these, dense algorithms are wasteful and often infeasible.
The Lanczos algorithm (for symmetric matrices) and Arnoldi algorithm (for general matrices) extend power iteration to find multiple eigenvalues while exploiting sparsity.
Key idea:
Build an orthonormal basis for the Krylov subspace:
$$\mathcal{K}_k(A, v) = \text{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}$$
In this subspace, A is represented by a small tridiagonal (symmetric) or upper Hessenberg (general) matrix. Finding eigenvalues of this small matrix is cheap.
```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh, eigs
import matplotlib.pyplot as plt

def simple_lanczos(A, k, v0=None):
    """
    Simplified Lanczos algorithm for symmetric matrices.

    Builds a tridiagonal matrix T whose eigenvalues approximate A's.

    Parameters:
        A: Symmetric matrix (can be sparse)
        k: Number of Lanczos iterations (= size of Krylov subspace)
        v0: Starting vector

    Returns:
        T: Tridiagonal matrix
        Q: Orthonormal Lanczos vectors
    """
    n = A.shape[0]

    # The @ operator works for both dense and sparse matrices
    matvec = lambda x: A @ x

    # Initialize
    if v0 is None:
        v0 = np.random.randn(n)
    v = v0 / np.linalg.norm(v0)

    # Storage
    Q = np.zeros((n, k))
    alpha = np.zeros(k)    # Diagonal of T
    beta = np.zeros(k-1)   # Off-diagonal of T

    Q[:, 0] = v

    # First iteration
    w = matvec(v)
    alpha[0] = np.dot(w, v)
    w = w - alpha[0] * v

    for j in range(1, k):
        beta[j-1] = np.linalg.norm(w)
        if beta[j-1] < 1e-12:
            print(f"Lanczos breakdown at iteration {j}")
            k = j
            break

        v = w / beta[j-1]
        Q[:, j] = v

        w = matvec(v) - beta[j-1] * Q[:, j-1]
        alpha[j] = np.dot(w, v)
        w = w - alpha[j] * v

        # Reorthogonalization (important for numerical stability)
        for i in range(j+1):
            w = w - np.dot(w, Q[:, i]) * Q[:, i]

    # Build the tridiagonal matrix
    T = np.diag(alpha[:k]) + np.diag(beta[:k-1], 1) + np.diag(beta[:k-1], -1)
    return T, Q[:, :k]

# Example: large sparse matrix
n = 1000
density = 0.01
np.random.seed(42)

# Create a sparse symmetric matrix (e.g., graph Laplacian structure)
data = sparse.random(n, n, density=density, format='csr')
A_sparse = data + data.T  # Make symmetric

print(f"Sparse matrix: {n}x{n}, {A_sparse.nnz} nonzeros ({100*A_sparse.nnz/n**2:.2f}%)")
print()

# Run Lanczos
k = 50  # Number of Lanczos iterations
T, Q = simple_lanczos(A_sparse, k)

# Eigenvalues of T approximate the extreme eigenvalues of A
eigenvalues_T = np.linalg.eigvalsh(T)

# Compare with scipy's eigsh
eigenvalues_actual, _ = eigsh(A_sparse, k=10, which='BE')  # Both extremes

print("Lanczos approximation (from tridiagonal T):")
print(f"  Largest:  {max(eigenvalues_T):.6f}")
print(f"  Smallest: {min(eigenvalues_T):.6f}")
print()
print("scipy.eigsh result:")
print(f"  Largest:  {max(eigenvalues_actual):.6f}")
print(f"  Smallest: {min(eigenvalues_actual):.6f}")

# Visualize: how well Lanczos eigenvalues match the true spectrum
all_true_eigenvalues = np.linalg.eigvalsh(A_sparse.toarray())

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: eigenvalue distribution
ax = axes[0]
ax.hist(all_true_eigenvalues, bins=50, alpha=0.7, label='True eigenvalues')
ax.hist(eigenvalues_T, bins=30, alpha=0.7, label=f'Lanczos (k={k})')
ax.set_xlabel('Eigenvalue')
ax.set_ylabel('Count')
ax.set_title('Eigenvalue Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

# Right: Lanczos captures the extremes
ax = axes[1]
ax.plot(sorted(all_true_eigenvalues), 'b-', label='True (sorted)', alpha=0.5)
ax.scatter([0, n-1], [min(all_true_eigenvalues), max(all_true_eigenvalues)],
           c='blue', s=100, zorder=5)
ax.axhline(y=min(eigenvalues_T), color='red', linestyle='--',
           label=f'Lanczos min: {min(eigenvalues_T):.3f}')
ax.axhline(y=max(eigenvalues_T), color='red', linestyle='--',
           label=f'Lanczos max: {max(eigenvalues_T):.3f}')
ax.set_xlabel('Index (sorted)')
ax.set_ylabel('Eigenvalue')
ax.set_title('Lanczos Captures Extreme Eigenvalues')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('lanczos_demo.png', dpi=150)
plt.show()
```

Use Lanczos/Arnoldi (scipy.sparse.linalg.eigsh/eigs) when: (1) the matrix is sparse or only available as a matrix-vector product, (2) you only need a few eigenvalues rather than all of them, (3) the matrix is too large to store densely. These conditions are common in spectral clustering, graph analysis, and kernel methods.
When computing eigenvalues in ML applications, several practical issues arise:
```python
import numpy as np

def safe_pca_eigenvalues(X, n_components=None):
    """
    Compute eigenvalues for PCA with practical safeguards.

    Parameters:
        X: Data matrix (n_samples, n_features), assumed centered
        n_components: Number of components to return (default: all)

    Returns:
        eigenvalues: Sorted in descending order
        eigenvectors: Corresponding eigenvectors
        explained_variance_ratio: Fraction of variance per component
    """
    n_samples, n_features = X.shape

    # Compute covariance matrix (use n_samples - 1 for unbiased estimate)
    cov = X.T @ X / (n_samples - 1)

    # Use eigh for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # eigh returns ascending order; reverse to descending
    eigenvalues = eigenvalues[::-1]
    eigenvectors = eigenvectors[:, ::-1]

    # SAFEGUARD 1: Clip negative eigenvalues (numerical artifact)
    eigenvalues = np.maximum(eigenvalues, 0)

    # SAFEGUARD 2: Make eigenvector signs consistent
    # Convention: largest absolute component is positive
    for i in range(eigenvectors.shape[1]):
        max_abs_idx = np.argmax(np.abs(eigenvectors[:, i]))
        if eigenvectors[max_abs_idx, i] < 0:
            eigenvectors[:, i] *= -1

    # Compute explained variance ratio
    total_variance = eigenvalues.sum()
    if total_variance > 0:
        explained_variance_ratio = eigenvalues / total_variance
    else:
        explained_variance_ratio = np.zeros_like(eigenvalues)

    # Select top n_components
    if n_components is not None:
        eigenvalues = eigenvalues[:n_components]
        eigenvectors = eigenvectors[:, :n_components]
        explained_variance_ratio = explained_variance_ratio[:n_components]

    return eigenvalues, eigenvectors, explained_variance_ratio

# Example usage
np.random.seed(42)

# Generate some data with known structure
n_samples = 1000
n_features = 10

# True low-rank structure: 3 important components
latent = np.random.randn(n_samples, 3)
projection = np.random.randn(3, n_features)
X = latent @ projection + 0.5 * np.random.randn(n_samples, n_features)

# Center the data
X_centered = X - X.mean(axis=0)

# Compute PCA eigenstructure
eigenvalues, eigenvectors, explained_ratio = safe_pca_eigenvalues(X_centered)

print("PCA Eigenvalue Analysis")
print("=" * 50)
print(f"Top eigenvalues: {eigenvalues[:5].round(4)}")
print("Explained variance ratio:")
for i, ratio in enumerate(explained_ratio[:5]):
    cumulative = explained_ratio[:i+1].sum()
    print(f"  PC{i+1}: {ratio*100:.2f}% (cumulative: {cumulative*100:.2f}%)")

print(f"Note: First 3 components explain {explained_ratio[:3].sum()*100:.2f}%")
print("(Matches our 3-dimensional latent structure)")

# Demonstrate the condition number issue
print("\n" + "=" * 50)
print("Condition Number Analysis")
print("=" * 50)

cov = X_centered.T @ X_centered / (n_samples - 1)
condition_number = max(eigenvalues) / max(eigenvalues[-1], 1e-10)
print(f"Condition number: {condition_number:.2e}")

if condition_number > 1e10:
    print("WARNING: Matrix is ill-conditioned. Consider regularization.")
```

We've explored the computational side of eigenvalue problems. Let's consolidate:
| Situation | Algorithm | Python Function |
|---|---|---|
| Dense, general matrix | QR algorithm | np.linalg.eig(A) |
| Dense, symmetric matrix | Symmetric QR / Divide-and-Conquer | np.linalg.eigh(A) |
| Sparse, symmetric, few eigenvalues | Lanczos | scipy.sparse.linalg.eigsh(A, k) |
| Sparse, general, few eigenvalues | Arnoldi | scipy.sparse.linalg.eigs(A, k) |
| Only dominant eigenvalue | Power iteration | Custom implementation |
| Eigenvalue near target σ | Shift-invert | scipy.sparse.linalg.eigsh(A, k, sigma=σ) |
What's Next:
Now that we can compute eigenvalues, the next page explores diagonalization—writing A = VΛV⁻¹. We'll learn when diagonalization is possible, how to check for it, and why it dramatically simplifies matrix operations. Diagonalization is the key to understanding matrix exponentials, powers, and the spectral theorem.
You now understand the practical algorithms for computing eigenvalues—from power iteration to QR to Lanczos. You know which algorithm to use for different matrix types and sizes, and how to handle numerical issues. Next, we'll explore diagonalization and its profound implications.