When a linear transformation acts on a vector, it typically changes both the direction and magnitude. But for certain special vectors—the eigenvectors—the transformation only scales, leaving the direction unchanged (or exactly reversed). These invariant directions are the key to understanding what a matrix does.
Eigendecomposition factorizes a square matrix into its eigenvectors and eigenvalues, revealing the fundamental structure of the linear transformation. Unlike SVD which works for any matrix, eigendecomposition requires square matrices and may involve complex numbers—but in exchange, it provides unique insights into dynamics, stability, and long-term behavior.
From PageRank's random walk analysis to quantum mechanics' measurement operators, from differential equation solutions to Markov chain steady states, eigendecomposition is the analytical tool that translates matrix algebra into interpretable, actionable understanding.
By the end of this page, you will understand: (1) The definition of eigenvalues, eigenvectors, and eigenspaces, (2) When eigendecomposition exists and when it doesn't, (3) Special properties for symmetric and positive definite matrices, (4) How to compute eigendecomposition, (5) The spectral theorem and diagonalization, and (6) Key applications in ML including PCA, spectral clustering, and Markov chains.
The core definition:
For a square matrix A ∈ ℝⁿˣⁿ (or ℂⁿˣⁿ), a scalar λ is an eigenvalue of A if there exists a non-zero vector v such that:
$$Av = \lambda v$$
The vector v is called an eigenvector corresponding to eigenvalue λ.
Interpretation: Multiplying v by A has the same effect as multiplying by the scalar λ. The direction of v is preserved (or exactly reversed if λ < 0); only the magnitude changes.
The eigenspace:
For each eigenvalue λ, the set of all eigenvectors (plus the zero vector) forms a subspace called the eigenspace: $$E_\lambda = \{v : Av = \lambda v\} = \text{Null}(A - \lambda I)$$
The dimension of Eλ is called the geometric multiplicity of λ.
Finding eigenvalues:
Rearranging Av = λv: $$(A - \lambda I)v = 0$$
For a non-zero solution v to exist, (A - λI) must be singular, which means: $$\det(A - \lambda I) = 0$$
This polynomial equation in λ is called the characteristic polynomial. For an n×n matrix, it's a degree-n polynomial with exactly n roots (counting multiplicity, possibly complex).
The algebraic multiplicity of λ is its multiplicity as a root of the characteristic polynomial. The geometric multiplicity is the dimension of the eigenspace. We always have: geometric ≤ algebraic. When they're unequal, the matrix is called defective and cannot be diagonalized.
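To make the distinction concrete, here is a minimal sketch (assuming SciPy is available for `scipy.linalg.null_space`) that computes both multiplicities for a small defective matrix:

```python
import numpy as np
from scipy.linalg import null_space

# A defective matrix: eigenvalue 1 repeated, but only one independent eigenvector
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")            # [1. 1.] -> algebraic multiplicity 2

lam = 1.0
# Eigenspace E_1 = Null(A - 1*I); its dimension is the geometric multiplicity
E = null_space(A - lam * np.eye(2))
print(f"Basis of eigenspace E_1:\n{E}")
print(f"Geometric multiplicity: {E.shape[1]}")  # 1 < 2, so A is defective
```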
Key properties of eigenvalues:
- The sum of the eigenvalues equals the trace: Σᵢ λᵢ = tr(A).
- The product of the eigenvalues equals the determinant: Πᵢ λᵢ = det(A).
- A is singular if and only if 0 is an eigenvalue.
- The eigenvalues of a triangular matrix are its diagonal entries.
Complex eigenvalues:
For real matrices, complex eigenvalues always come in conjugate pairs: if λ = a + bi is an eigenvalue, so is λ̄ = a - bi. The corresponding eigenvectors are also complex conjugates.
```python
import numpy as np

# Example: a 3x3 matrix with real eigenvalues
A = np.array([
    [4, -2, 0],
    [1, 1, 0],
    [0, 0, 3]
])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors (columns):")
print(eigenvectors)

# Verify the eigenvalue equation: Av = λv
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lam_v = lam * v
    print(f"\nλ_{i+1} = {lam:.4f}")
    print(f"Av = {Av}")
    print(f"λv = {lam_v}")
    print(f"Match: {np.allclose(Av, lam_v)}")

# Verify trace and determinant properties
print(f"\nSum of eigenvalues: {np.sum(eigenvalues):.4f}")
print(f"Trace of A: {np.trace(A):.4f}")
print(f"Product of eigenvalues: {np.prod(eigenvalues):.4f}")
print(f"Determinant of A: {np.linalg.det(A):.4f}")

# Example with complex eigenvalues (rotation matrix)
theta = np.pi / 4  # 45 degrees
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)]
])

eig_R, _ = np.linalg.eig(R)
print(f"\nRotation matrix (45°) eigenvalues: {eig_R}")
print("Note: Complex eigenvalues indicate rotation without stretching")
```

If we can find n linearly independent eigenvectors for an n×n matrix A, we can express A in a remarkably simple form.
The eigendecomposition:
Let v₁, v₂, ..., vₙ be linearly independent eigenvectors with eigenvalues λ₁, λ₂, ..., λₙ. Form:
- Q = [v₁ | v₂ | ... | vₙ], the matrix whose columns are the eigenvectors
- Λ = diag(λ₁, λ₂, ..., λₙ), the diagonal matrix of eigenvalues
Then: $$A = Q \Lambda Q^{-1}$$
Why this works:
Since Avᵢ = λᵢvᵢ for each eigenvector: $$AQ = A[v_1 | v_2 | ... | v_n] = [\lambda_1 v_1 | \lambda_2 v_2 | ... | \lambda_n v_n] = Q\Lambda$$
Multiplying both sides by Q⁻¹: A = QΛQ⁻¹
When is diagonalization possible?
A matrix is diagonalizable if and only if:
- it has n linearly independent eigenvectors, or equivalently
- the geometric multiplicity equals the algebraic multiplicity for every eigenvalue.

Having n distinct eigenvalues is a sufficient (but not necessary) condition.
Matrices that are NOT diagonalizable (defective matrices) have at least one eigenvalue whose geometric multiplicity is strictly less than its algebraic multiplicity.
The classic example: $$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
This has eigenvalue λ = 1 with algebraic multiplicity 2, but geometric multiplicity 1 (only one independent eigenvector). It cannot be diagonalized over ℝ or ℂ.
Once you have A = QΛQ⁻¹, computing matrix powers becomes trivial: Aᵏ = QΛᵏQ⁻¹. Since Λᵏ just raises each diagonal entry to the k-th power, the power step costs only O(n) operations, compared with the repeated O(n³) matrix multiplications of the naive approach. This is essential for analyzing dynamical systems and solving recurrences.
Applications of A = QΛQ⁻¹:
1. Matrix powers: $$A^k = Q\Lambda^k Q^{-1}$$ where Λᵏ = diag(λ₁ᵏ, λ₂ᵏ, ..., λₙᵏ)
2. Matrix exponential: $$e^A = Qe^\Lambda Q^{-1}$$ where exp(Λ) = diag(e^(λ₁), e^(λ₂), ..., e^(λₙ))
This is crucial for solving systems of linear ODEs: dx/dt = Ax.
3. Matrix functions in general: For any function f that can be applied to scalars: $$f(A) = Qf(\Lambda)Q^{-1} = Q \cdot \text{diag}(f(\lambda_1), ..., f(\lambda_n)) \cdot Q^{-1}$$ (a short sketch follows this list).
4. Understanding long-term behavior: If |λᵢ| < 1 for all eigenvalues, then Aᵏ → 0 as k → ∞. If |λᵢ| ≤ 1, and any eigenvalues on the unit circle are semisimple (geometric multiplicity equal to algebraic), the powers Aᵏ stay bounded and the system is marginally stable.
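As a quick illustration of these formulas, here is a minimal sketch (assuming SciPy is available for the reference implementations `scipy.linalg.expm` and `scipy.linalg.sqrtm`) that applies scalar functions to a symmetric matrix through its eigendecomposition:

```python
import numpy as np
from scipy.linalg import expm, sqrtm

# A small symmetric (and positive definite) matrix, so A = Q Λ Q^T
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigenvalues, Q = np.linalg.eigh(A)

def matrix_function(f):
    """Apply a scalar function f to A via f(A) = Q diag(f(λ_i)) Q^T."""
    return Q @ np.diag(f(eigenvalues)) @ Q.T

# Matrix exponential: compare against SciPy's expm
print(np.allclose(matrix_function(np.exp), expm(A)))     # True

# Matrix square root (valid here because A is positive definite)
print(np.allclose(matrix_function(np.sqrt), sqrtm(A)))   # True

# Matrix power A^5 via eigenvalues vs. repeated multiplication
print(np.allclose(matrix_function(lambda lam: lam**5),
                  np.linalg.matrix_power(A, 5)))          # True
```

The same pattern works for any scalar function, as long as it is defined on the spectrum of A.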
```python
import numpy as np

def verify_diagonalization(A):
    """
    Verify eigendecomposition A = Q Λ Q^(-1)
    """
    n = A.shape[0]
    eigenvalues, Q = np.linalg.eig(A)

    # Create diagonal matrix
    Lambda = np.diag(eigenvalues)

    # Reconstruct A
    Q_inv = np.linalg.inv(Q)
    A_reconstructed = Q @ Lambda @ Q_inv

    print("Original A:")
    print(A)
    print(f"\nEigenvalues: {eigenvalues}")
    print("\nQ (eigenvectors):")
    print(Q)
    print("\nΛ (eigenvalue diagonal):")
    print(Lambda)
    print(f"\nReconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")

    return eigenvalues, Q

# Diagonalizable example
A1 = np.array([
    [4, 1],
    [2, 3]
])
print("=== Diagonalizable Matrix ===")
verify_diagonalization(A1)

# Computing matrix powers efficiently
print("\n=== Matrix Powers via Eigendecomposition ===")
eigenvalues, Q = np.linalg.eig(A1)
Lambda = np.diag(eigenvalues)
Q_inv = np.linalg.inv(Q)

k = 100  # Compute A^100

# Method 1: Direct (inefficient for large k)
A_power_direct = np.linalg.matrix_power(A1, k)

# Method 2: Via eigendecomposition (efficient)
Lambda_k = np.diag(eigenvalues ** k)
A_power_eigen = Q @ Lambda_k @ Q_inv

print(f"A^{k} via direct computation:")
print(A_power_direct)
print(f"\nA^{k} via eigendecomposition:")
print(np.real(A_power_eigen))  # Real part (imaginary should be ~0)
print(f"\nDifference: {np.linalg.norm(A_power_direct - A_power_eigen):.2e}")

# Non-diagonalizable (defective) example
print("\n=== Non-Diagonalizable Matrix ===")
A_defective = np.array([
    [1, 1],
    [0, 1]
])
eigenvalues_d, Q_d = np.linalg.eig(A_defective)
print("Matrix:")
print(A_defective)
print(f"Eigenvalues: {eigenvalues_d}")
print("Q (note: columns are NOT independent):")
print(Q_d)
print(f"Condition number of Q: {np.linalg.cond(Q_d):.2e}")
print("(Very high condition number indicates near-singularity)")
```

Symmetric matrices (A = Aᵀ) have exceptionally nice properties that make eigendecomposition particularly powerful in machine learning.
The Spectral Theorem:
For any real symmetric matrix A ∈ ℝⁿˣⁿ with A = Aᵀ:
- All eigenvalues are real.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- There exists a full orthonormal basis of eigenvectors, even when eigenvalues repeat.
This means every symmetric matrix can be written as: $$A = Q\Lambda Q^T$$
where Q is orthogonal (QᵀQ = QQᵀ = I) and Λ is diagonal with real entries.
Why this matters for ML:
Many matrices in ML are symmetric by construction:
- Covariance matrices (XᵀX of centered data)
- Kernel / Gram matrices
- Hessians of smooth loss functions
- Graph Laplacians and adjacency matrices of undirected graphs
For all of these, eigendecomposition is numerically stable, produces real results, and yields orthogonal eigenvectors for easy interpretation.
When Q is orthogonal: Q⁻¹ = Qᵀ. This means we never need to compute a matrix inverse—just transpose! The decomposition A = QΛQᵀ involves only well-conditioned operations, making it numerically robust.
Positive definite matrices:
A symmetric matrix A is positive definite if xᵀAx > 0 for all non-zero x. Equivalently, all of its eigenvalues satisfy λᵢ > 0.
A is positive semi-definite (PSD) if xᵀAx ≥ 0, which means λᵢ ≥ 0.
Properties of PSD matrices:
- All eigenvalues are non-negative, so A = QΛQᵀ with Λ ≥ 0.
- A has a unique PSD square root A^(1/2) = QΛ^(1/2)Qᵀ.
- A can be factored as A = BᵀB (e.g., via Cholesky when A is positive definite).
- All diagonal entries are non-negative.
The Rayleigh quotient:
For symmetric A, the Rayleigh quotient is: $$R(x) = \frac{x^T A x}{x^T x}$$
This function is minimized by the eigenvector of the smallest eigenvalue and maximized by the eigenvector of the largest eigenvalue, with the extreme values being those eigenvalues themselves: $$\lambda_{min} = \min_{x \neq 0} R(x), \quad \lambda_{max} = \max_{x \neq 0} R(x)$$
This is the foundation of power iteration and many optimization algorithms.
```python
import numpy as np

def analyze_symmetric_matrix(A):
    """
    Analyze properties of a symmetric matrix using spectral theorem.
    """
    # Verify symmetry
    assert np.allclose(A, A.T), "Matrix is not symmetric!"

    # Use eigh for symmetric matrices (more stable, guaranteed real)
    eigenvalues, Q = np.linalg.eigh(A)

    # eigh returns in ascending order, often we want descending
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    Q = Q[:, idx]

    print("Symmetric Matrix Analysis")
    print("=" * 50)
    print("Matrix A:")
    print(A)
    print(f"\nEigenvalues: {eigenvalues}")
    print(f"All real: {np.all(np.isreal(eigenvalues))}")

    # Check positive definiteness
    if np.all(eigenvalues > 0):
        print("Definiteness: Positive Definite")
    elif np.all(eigenvalues >= 0):
        print("Definiteness: Positive Semi-Definite")
    elif np.all(eigenvalues < 0):
        print("Definiteness: Negative Definite")
    else:
        print("Definiteness: Indefinite")

    # Verify orthogonality of eigenvectors
    print(f"\nQ (eigenvectors):")
    print(Q)
    print(f"\nQ^T Q (should be identity):")
    print(np.round(Q.T @ Q, 10))

    # Verify reconstruction
    Lambda = np.diag(eigenvalues)
    A_reconstructed = Q @ Lambda @ Q.T
    print(f"\nReconstruction error ||A - QΛQ^T||: {np.linalg.norm(A - A_reconstructed):.2e}")

    return eigenvalues, Q

# Example 1: Positive definite matrix (covariance)
np.random.seed(42)
X = np.random.randn(5, 3)
Cov = X.T @ X  # Always PSD
print("=== Positive Definite Covariance Matrix ===")
analyze_symmetric_matrix(Cov)

# Example 2: Graph Laplacian
print("\n=== Graph Laplacian ===")
# Adjacency matrix for a simple graph
Adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0]
])
Degree = np.diag(Adj.sum(axis=1))
Laplacian = Degree - Adj
analyze_symmetric_matrix(Laplacian)

# Rayleigh quotient demonstration
print("\n=== Rayleigh Quotient ===")
A = np.array([
    [4, 1, 1],
    [1, 3, 0],
    [1, 0, 2]
])
eigenvalues, Q = np.linalg.eigh(A)

for i, (lam, v) in enumerate(zip(eigenvalues, Q.T)):
    rayleigh = (v @ A @ v) / (v @ v)
    print(f"Eigenvector {i+1}: eigenvalue = {lam:.4f}, Rayleigh quotient = {rayleigh:.4f}")
```

Computing eigenvalues and eigenvectors is a fundamental numerical problem with well-developed algorithms.
The challenge:
While the characteristic polynomial det(A - λI) = 0 defines the eigenvalues, computing them by finding polynomial roots is numerically ill-conditioned, and by the Abel–Ruffini theorem there is no general closed-form solution for degree ≥ 5. Modern algorithms therefore avoid forming the characteristic polynomial explicitly.
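The following sketch (illustrative only; the exact error magnitudes depend on the random seed) hints at why: recovering the eigenvalues of a 20×20 matrix by rooting its characteristic polynomial is far less accurate than calling an eigensolver directly.

```python
import numpy as np

np.random.seed(0)

# Symmetric matrix with known eigenvalues 1, 2, ..., 20
true_eigs = np.arange(1.0, 21.0)
V, _ = np.linalg.qr(np.random.randn(20, 20))   # random orthogonal matrix
A = V @ np.diag(true_eigs) @ V.T

# Route 1: dedicated eigensolver
eigs_direct = np.sort(np.linalg.eigvalsh(A))

# Route 2: characteristic polynomial coefficients, then polynomial root-finding
coeffs = np.poly(A)                 # coefficients of det(A - λI), a degree-20 polynomial
eigs_poly = np.sort(np.roots(coeffs).real)

print(f"Max error via eigvalsh: {np.max(np.abs(eigs_direct - true_eigs)):.2e}")
print(f"Max error via char. polynomial roots: {np.max(np.abs(eigs_poly - true_eigs)):.2e}")
```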
Core algorithms:
1. Power Iteration (for largest eigenvalue): Repeatedly multiply a vector by A and normalize; the iterate converges to the dominant eigenvector, at a rate governed by the ratio |λ₂/λ₁|.
2. Inverse Iteration (for eigenvalue near μ): Apply power iteration to (A − μI)⁻¹; this converges to the eigenvector whose eigenvalue is closest to the shift μ.
3. QR Algorithm (for all eigenvalues): Iterate Aₖ = QₖRₖ, Aₖ₊₁ = RₖQₖ; the sequence converges to an upper (quasi-)triangular form whose diagonal holds the eigenvalues. With a Hessenberg reduction and shifts, this is the workhorse behind np.linalg.eig; a minimal unshifted version is sketched after this list.
4. Symmetric QR / Jacobi: Specialized variants for symmetric matrices: tridiagonalize first and then apply QR iterations, or use Jacobi rotations to systematically zero out off-diagonal entries. Both preserve symmetry and return orthogonal eigenvectors.
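Here is a minimal, unshifted QR-iteration sketch for a symmetric matrix (production implementations add tridiagonal reduction and shifts, which this toy version omits):

```python
import numpy as np

def qr_algorithm(A, num_iterations=500):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k where A_k = Q_k R_k.

    For a symmetric matrix with distinct eigenvalue magnitudes, A_k
    converges toward a diagonal matrix holding the eigenvalues.
    """
    Ak = A.copy().astype(float)
    for _ in range(num_iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q          # similar to A_k, so eigenvalues are preserved
    return np.sort(np.diag(Ak))

np.random.seed(1)
B = np.random.randn(5, 5)
A = B + B.T                  # symmetric test matrix

print("QR iteration:      ", qr_algorithm(A).round(6))
print("np.linalg.eigvalsh:", np.sort(np.linalg.eigvalsh(A)).round(6))
```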
NumPy provides np.linalg.eig (general matrices) and np.linalg.eigh (symmetric/Hermitian). The 'h' version is faster, more stable, and guarantees real eigenvalues—always use it for symmetric matrices! SciPy adds even more specialized routines for banded, sparse, and tridiagonal matrices.
Computational complexity:
| Algorithm | Time | Notes |
|---|---|---|
| Full eigendecomposition (dense) | O(n³) | Standard for small-medium |
| QR algorithm | O(n³) | Dominant for general matrices |
| Power iteration (1 eigenpair) | O(n²) per iteration | Simple, parallelizable |
| Lanczos (k eigenpairs, symmetric) | O(kn²) | For large sparse matrices |
| Arnoldi (k eigenpairs, general) | O(kn²) | For large sparse matrices |
For large sparse matrices:
When n is large but the matrix is sparse (few non-zero entries), iterative methods like Lanczos and Arnoldi compute only the dominant eigenpairs without forming dense intermediate matrices. These methods are essential for large-scale applications such as PCA on high-dimensional data, spectral methods on massive graphs, and PageRank-style computations over the web graph.
```python
import numpy as np
from scipy.sparse.linalg import eigsh

def power_iteration(A, num_iterations=100, tol=1e-10):
    """
    Power iteration to find the largest eigenvalue/eigenvector.

    Returns:
        eigenvalue: Largest eigenvalue (by magnitude)
        eigenvector: Corresponding unit eigenvector
        history: List of eigenvalue estimates per iteration
    """
    n = A.shape[0]

    # Random initialization
    x = np.random.randn(n)
    x = x / np.linalg.norm(x)

    history = []

    for i in range(num_iterations):
        # Power step
        y = A @ x

        # Estimate eigenvalue via Rayleigh quotient
        eigenvalue = x @ y  # Since x is unit norm
        history.append(eigenvalue)

        # Normalize
        x_new = y / np.linalg.norm(y)

        # Check convergence
        if np.linalg.norm(x_new - x) < tol:
            print(f"Converged in {i+1} iterations")
            return eigenvalue, x_new, history

        x = x_new

    print(f"Reached max iterations ({num_iterations})")
    return eigenvalue, x, history

# Test on a symmetric matrix
np.random.seed(42)
n = 100
A = np.random.randn(n, n)
A = A @ A.T  # Make symmetric positive semidefinite

# Power iteration
lam_pow, v_pow, history = power_iteration(A)

# Compare with numpy
eigenvalues_np, eigenvectors_np = np.linalg.eigh(A)
lam_max = eigenvalues_np[-1]  # Largest (eigh returns ascending)

print(f"\nPower iteration eigenvalue: {lam_pow:.6f}")
print(f"NumPy largest eigenvalue: {lam_max:.6f}")
print(f"Relative error: {abs(lam_pow - lam_max) / lam_max:.2e}")

# For sparse matrices, use scipy.sparse.linalg.eigsh
from scipy.sparse import random as sparse_random

print("\n=== Sparse Eigenvalue Computation ===")
n_large = 1000
k_eigenpairs = 5
A_sparse = sparse_random(n_large, n_large, density=0.01, format='csr')
A_sparse = A_sparse @ A_sparse.T  # Symmetric

# Get top k eigenvalues
eigenvalues_sparse, eigenvectors_sparse = eigsh(A_sparse, k=k_eigenpairs, which='LM')
print(f"Matrix size: {n_large} x {n_large}")
print(f"Top {k_eigenpairs} eigenvalues: {eigenvalues_sparse[::-1]}")
```

Beyond algebraic manipulation, eigendecomposition reveals deep geometric structure.
Eigenvectors as invariant directions:
When A acts on space, most vectors get rotated. Eigenvectors are the special directions that only get scaled:
- λ > 1: the direction is stretched
- 0 < λ < 1: the direction is shrunk
- λ < 0: the direction is flipped (and scaled by |λ|)
- λ = 0: the direction is collapsed onto the origin (A is singular)
For symmetric matrices, eigenvectors are mutually orthogonal, creating a natural coordinate system aligned with the matrix's action.
The ellipsoid picture for symmetric A:
Consider the quadratic form q(x) = xᵀAx for positive definite A.
The level set {x : xᵀAx = 1} is an ellipsoid:
- its principal axes point along the eigenvectors of A
- the semi-axis length along eigenvector vᵢ is 1/√λᵢ (large eigenvalue ⇒ short axis)
This is fundamental to understanding:
- the shape of Gaussian distributions (covariance ellipsoids)
- level sets of quadratic loss functions and the behavior of gradient descent
- Mahalanobis distance and whitening transforms
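A minimal sketch of this picture for a 2×2 positive definite matrix (the specific matrix is just an example) checks that the point at distance 1/√λᵢ along each eigenvector lies exactly on the level set xᵀAx = 1:

```python
import numpy as np

# Example positive definite matrix
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
eigenvalues, Q = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, Q.T):
    semi_axis = 1.0 / np.sqrt(lam)       # semi-axis length along eigenvector v
    x = semi_axis * v                    # point on the ellipse along that axis
    print(f"λ = {lam:.4f}, semi-axis = {semi_axis:.4f}, x^T A x = {x @ A @ x:.6f}")
    # x^T A x = (1/λ) v^T A v = (1/λ) λ = 1, so every printed value is 1
```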
Condition number:
The ratio κ(A) = |λₘₐₓ|/|λₘᵢₙ| (condition number) measures how "stretched" the ellipsoid is:
- κ ≈ 1: nearly spherical, well-conditioned
- κ ≫ 1: a long, thin ellipsoid, ill-conditioned
Ill-conditioned matrices cause numerical instability and slow convergence in iterative methods.
The condition number matters whenever you solve Ax = b or optimize quadratics. For κ = 10⁶, solving the linear system can lose about 6 digits of precision. Preconditioning (transforming to reduce κ) is a critical technique for large-scale ML optimization.
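To make the precision loss concrete, the following sketch (using the notoriously ill-conditioned Hilbert matrix as an example) solves Ax = b with a known solution and reports how much accuracy survives:

```python
import numpy as np
from scipy.linalg import hilbert

for n in (5, 10, 12):
    A = hilbert(n)                        # Hilbert matrices are famously ill-conditioned
    kappa = np.linalg.cond(A)
    x_true = np.ones(n)
    b = A @ x_true
    x_solved = np.linalg.solve(A, b)
    err = np.max(np.abs(x_solved - x_true))
    print(f"n={n:2d}  κ(A) ≈ {kappa:.2e}  max solution error ≈ {err:.2e}")

# Rough rule of thumb: with κ ≈ 10^k, expect to lose about k of the ~16 double-precision digits.
```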
Eigenvalues and matrix dynamics:
For discrete dynamical systems xₖ₊₁ = Axₖ, expanding x₀ in the eigenbasis shows that the component along each eigenvector vᵢ evolves as λᵢᵏ: it decays if |λᵢ| < 1 and grows if |λᵢ| > 1.
The dominant eigenvalue (largest |λ|) determines long-term behavior: $$x_k \approx \lambda_1^k (c_1 v_1) \text{ as } k \to \infty$$
where v₁ is the dominant eigenvector and c₁ is determined by initial conditions.
Complex eigenvalues indicate oscillation:
If λ = re^(iθ) with θ ≠ 0, π:
- r = |λ| controls whether the oscillation grows (r > 1) or decays (r < 1)
- θ is the rotation angle per step, so the oscillation period is roughly 2π/θ steps
This is why complex eigenvalues come in conjugate pairs for real matrices—they represent oscillatory modes.
```python
import numpy as np
import matplotlib.pyplot as plt

def visualize_matrix_dynamics(A, x0, num_steps=50):
    """
    Visualize trajectory of discrete dynamical system x_{k+1} = Ax_k
    """
    eigenvalues, _ = np.linalg.eig(A)

    # Simulate trajectory
    trajectory = [x0]
    x = x0.copy()
    for _ in range(num_steps):
        x = A @ x
        trajectory.append(x.copy())
    trajectory = np.array(trajectory)

    # Stability analysis
    max_eig = np.max(np.abs(eigenvalues))
    if max_eig < 1:
        stability = "Stable (converges to 0)"
    elif max_eig == 1:
        stability = "Marginally stable"
    else:
        stability = "Unstable (diverges)"

    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Plot trajectory
    axes[0].plot(trajectory[:, 0], trajectory[:, 1], 'b.-', alpha=0.7)
    axes[0].plot(x0[0], x0[1], 'go', markersize=10, label='Start')
    axes[0].plot(trajectory[-1, 0], trajectory[-1, 1], 'r*', markersize=15, label='End')
    axes[0].axhline(y=0, color='k', linewidth=0.5)
    axes[0].axvline(x=0, color='k', linewidth=0.5)
    axes[0].set_xlabel('x₁')
    axes[0].set_ylabel('x₂')
    axes[0].set_title(f'Trajectory: {stability}')
    axes[0].legend()
    axes[0].axis('equal')

    # Plot eigenvalues in complex plane
    theta = np.linspace(0, 2*np.pi, 100)
    axes[1].plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.3, label='Unit circle')
    axes[1].scatter(eigenvalues.real, eigenvalues.imag, s=100, c='red', zorder=5)
    for i, lam in enumerate(eigenvalues):
        axes[1].annotate(f'λ{i+1}={lam:.2f}', (lam.real, lam.imag),
                         textcoords="offset points", xytext=(10, 5))
    axes[1].axhline(y=0, color='k', linewidth=0.5)
    axes[1].axvline(x=0, color='k', linewidth=0.5)
    axes[1].set_xlim(-2, 2)
    axes[1].set_ylim(-2, 2)
    axes[1].set_aspect('equal')
    axes[1].set_xlabel('Real')
    axes[1].set_ylabel('Imaginary')
    axes[1].set_title('Eigenvalues in Complex Plane')
    axes[1].legend()

    plt.tight_layout()
    plt.show()

    print(f"Eigenvalues: {eigenvalues}")
    print(f"Max |eigenvalue|: {max_eig:.4f}")

# Example 1: Stable (contraction)
A_stable = 0.9 * np.array([[np.cos(0.3), -np.sin(0.3)],
                           [np.sin(0.3), np.cos(0.3)]])
x0 = np.array([2, 1])
print("=== Stable System (Rotation + Contraction) ===")
visualize_matrix_dynamics(A_stable, x0)

# Example 2: Unstable (expansion)
A_unstable = 1.1 * np.eye(2) + 0.1 * np.array([[0, 1], [-1, 0]])
print("\n=== Unstable System (Spiral outward) ===")
visualize_matrix_dynamics(A_unstable, x0, num_steps=30)

# Example 3: Saddle (one stable, one unstable direction)
A_saddle = np.array([[1.2, 0], [0, 0.8]])
print("\n=== Saddle Point (Mixed stability) ===")
visualize_matrix_dynamics(A_saddle, x0)
```

Eigendecomposition is not just mathematical elegance—it's the computational engine behind many ML methods.
1. Principal Component Analysis (PCA):
PCA finds orthogonal directions of maximum variance. Given a centered data matrix X ∈ ℝⁿˣᵈ:
- Form the covariance matrix C = XᵀX / (n − 1).
- Eigendecompose C = QΛQᵀ and sort eigenvalues in decreasing order.
- The top-k eigenvectors are the principal components; projecting X onto them gives the reduced representation.
- Each eigenvalue λᵢ is the variance captured along its component.
Eigendecomposition of the symmetric PSD matrix C gives ordered directions of decreasing importance—exactly what PCA needs.
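A minimal from-scratch sketch of this recipe (compared against scikit-learn's PCA, which is assumed to be installed) looks like this:

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(42)
X = np.random.randn(200, 5) @ np.random.randn(5, 5)   # correlated synthetic data
Xc = X - X.mean(axis=0)                                # center the data

# Eigendecomposition of the covariance matrix
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigenvalues, Q = np.linalg.eigh(C)
idx = np.argsort(eigenvalues)[::-1]                    # sort descending
eigenvalues, Q = eigenvalues[idx], Q[:, idx]

k = 2
Z_ours = Xc @ Q[:, :k]                                 # project onto top-k components

# Reference: scikit-learn
pca = PCA(n_components=k)
Z_sklearn = pca.fit_transform(X)

print("Explained variance (ours):   ", eigenvalues[:k].round(4))
print("Explained variance (sklearn):", pca.explained_variance_.round(4))
# Components are defined only up to sign, so compare absolute values
print("Projections match up to sign:",
      np.allclose(np.abs(Z_ours), np.abs(Z_sklearn), atol=1e-8))
```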
2. Spectral Clustering:
To cluster data based on graph structure:
- Build a similarity graph with weight matrix W (e.g., an RBF kernel between points).
- Form the normalized Laplacian L = I − D^(−1/2) W D^(−1/2), where D is the degree matrix.
- Compute the k eigenvectors of L with the smallest eigenvalues and stack them as columns of Z.
- Run k-means on the rows of Z.
The eigenvectors of L encode cluster structure—points in the same cluster have similar eigenvector coordinates.
3. Kernel PCA:
For nonlinear dimensionality reduction:
- Compute the kernel matrix K with Kᵢⱼ = k(xᵢ, xⱼ) for a chosen kernel k.
- Center K in feature space.
- Eigendecompose the centered K and keep the top eigenvectors.
- The projections onto these (scaled) eigenvectors give the nonlinear components.
The eigenvectors of K correspond to principal components in the kernel feature space.
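A compact sketch of this recipe (the RBF bandwidth gamma=2 and the comparison against sklearn.decomposition.KernelPCA are illustrative choices; small sign flips between implementations are expected):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
gamma, k = 2.0, 2

# Step 1: RBF kernel matrix K_ij = exp(-gamma ||x_i - x_j||^2)
K = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))

# Step 2: double-center K (centering in feature space)
n = K.shape[0]
one_n = np.ones((n, n)) / n
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Step 3: top-k eigenpairs of the centered kernel
eigenvalues, eigenvectors = np.linalg.eigh(K_c)
eigenvalues, eigenvectors = eigenvalues[::-1][:k], eigenvectors[:, ::-1][:, :k]

# Step 4: projections of the training points onto the kernel components
Z_ours = eigenvectors * np.sqrt(eigenvalues)

# Reference: scikit-learn (components are defined up to sign)
Z_sk = KernelPCA(n_components=k, kernel='rbf', gamma=gamma).fit_transform(X)
print("Match up to sign:", np.allclose(np.abs(Z_ours), np.abs(Z_sk), atol=1e-6))
```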
Google's PageRank algorithm finds the dominant eigenvector of a modified web graph transition matrix. The eigenvector with eigenvalue 1 gives the stationary distribution of a random surfer—web pages with higher eigenvector components are more 'important'. This made eigendecomposition central to web search.
4. Markov Chains and Stationary Distributions:
For a transition matrix P (rows sum to 1):
- λ = 1 is always an eigenvalue, and all eigenvalues satisfy |λ| ≤ 1.
- A stationary distribution π satisfies πP = π, i.e., π is a left eigenvector of P (equivalently, an eigenvector of Pᵀ) with eigenvalue 1.
- For an irreducible, aperiodic chain the stationary distribution is unique and the chain converges to it from any start; the second-largest |λ| controls the mixing rate.
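A minimal sketch (the 3-state transition matrix is an arbitrary example) extracts the stationary distribution from the eigenvector of Pᵀ with eigenvalue 1 and checks it against long-run behavior:

```python
import numpy as np

# Example 3-state transition matrix (rows sum to 1)
P = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])

# Stationary distribution: eigenvector of P^T for eigenvalue 1, normalized to sum to 1
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, idx])
pi = pi / pi.sum()

print("Stationary distribution:", pi.round(4))
print("Check pi P = pi:", np.allclose(pi @ P, pi))

# Long-run check: every row of P^k converges to pi
P_100 = np.linalg.matrix_power(P, 100)
print("Rows of P^100:\n", P_100.round(4))
```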
5. Linear Dynamical Systems:
The system dx/dt = Ax has solution x(t) = exp(At)x₀.
Eigendecomposition gives: $$x(t) = Qe^{\Lambda t}Q^{-1}x_0 = \sum_i c_i e^{\lambda_i t} v_i$$ where the coefficients cᵢ come from expanding x₀ in the eigenbasis. The real parts of the eigenvalues determine growth or decay of each mode; the imaginary parts produce oscillation.
6. Hessian Analysis for Optimization:
At a critical point of f(x), the Hessian H = ∇²f determines the local geometry:
- all eigenvalues positive ⇒ local minimum
- all eigenvalues negative ⇒ local maximum
- mixed signs ⇒ saddle point
- eigenvalues near zero ⇒ flat directions (common in deep-learning loss landscapes)
The smallest eigenvalue direction indicates where the function curves least—important for understanding loss surface geometry.
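A small sketch (using hand-coded Hessians of two illustrative two-variable functions) classifies critical points from the eigenvalue signs:

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the eigenvalues of its Hessian."""
    eigenvalues = np.linalg.eigvalsh(H)
    if np.all(eigenvalues > 0):
        kind = "local minimum"
    elif np.all(eigenvalues < 0):
        kind = "local maximum"
    elif np.any(eigenvalues > 0) and np.any(eigenvalues < 0):
        kind = "saddle point"
    else:
        kind = "degenerate (zero eigenvalue, second-order test inconclusive)"
    print(f"Hessian eigenvalues: {eigenvalues.round(4)} -> {kind}")

# f(x, y) = 3x^2 + 2xy + 2y^2 has Hessian [[6, 2], [2, 4]] at its critical point (0, 0)
classify_critical_point(np.array([[6.0, 2.0], [2.0, 4.0]]))

# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] at (0, 0): a saddle
classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]]))
```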
```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
import matplotlib.pyplot as plt

def spectral_clustering_from_scratch(X, n_clusters, sigma=0.5):
    """
    Implement spectral clustering using eigendecomposition.

    Steps:
    1. Build similarity graph (RBF kernel)
    2. Compute normalized Laplacian
    3. Find bottom k eigenvectors
    4. Cluster in eigenvector space
    """
    n = X.shape[0]

    # Step 1: Compute similarity matrix (RBF kernel)
    from scipy.spatial.distance import cdist
    distances = cdist(X, X, 'sqeuclidean')
    W = np.exp(-distances / (2 * sigma**2))
    np.fill_diagonal(W, 0)  # No self-loops

    # Step 2: Compute normalized Laplacian L = I - D^(-1/2) W D^(-1/2)
    D = np.diag(W.sum(axis=1))
    D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-10))
    L_norm = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

    # Step 3: Eigendecomposition (get smallest eigenvalues)
    eigenvalues, eigenvectors = np.linalg.eigh(L_norm)

    # Take the k smallest eigenvectors
    # For the normalized Laplacian, the smallest eigenvalue is 0
    Z = eigenvectors[:, :n_clusters]

    # Normalize rows
    Z_normalized = Z / np.linalg.norm(Z, axis=1, keepdims=True)

    # Step 4: K-means clustering in eigenvector space
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels = kmeans.fit_predict(Z_normalized)

    return labels, eigenvalues, Z

# Generate non-linearly separable data
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# Our implementation
labels_scratch, eigenvalues, Z = spectral_clustering_from_scratch(X, n_clusters=2)

# Sklearn implementation for comparison
spectral = SpectralClustering(n_clusters=2, affinity='rbf', gamma=2, random_state=42)
labels_sklearn = spectral.fit_predict(X)

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Original data with true labels
axes[0, 0].scatter(X[:, 0], X[:, 1], c=y_true, cmap='coolwarm', alpha=0.7)
axes[0, 0].set_title('Original Data (True Labels)')

# Our spectral clustering result
axes[0, 1].scatter(X[:, 0], X[:, 1], c=labels_scratch, cmap='coolwarm', alpha=0.7)
axes[0, 1].set_title('Spectral Clustering (Our Implementation)')

# Eigenvector embedding
axes[1, 0].scatter(Z[:, 0], Z[:, 1], c=labels_scratch, cmap='coolwarm', alpha=0.7)
axes[1, 0].set_xlabel('Eigenvector 1')
axes[1, 0].set_ylabel('Eigenvector 2')
axes[1, 0].set_title('Eigenvector Space (Linearly Separable!)')

# Eigenvalue spectrum
axes[1, 1].bar(range(len(eigenvalues[:20])), eigenvalues[:20])
axes[1, 1].set_xlabel('Index')
axes[1, 1].set_ylabel('Eigenvalue')
axes[1, 1].set_title('Eigenvalue Spectrum (Eigengap at k=2)')
axes[1, 1].axhline(y=eigenvalues[2], color='r', linestyle='--', label='Gap after λ₂')

plt.tight_layout()
plt.show()

# Account for possible label permutation between the two clusters
accuracy = max(np.mean(labels_scratch == y_true), np.mean(labels_scratch != y_true))
print(f"Clustering accuracy: {accuracy:.2%}")
print(f"First few eigenvalues: {eigenvalues[:5].round(4)}")
```

We've explored eigendecomposition comprehensively—from definitions through spectral theory to practical ML applications. Let's consolidate the essential insights.
What's next:
We'll explore QR Decomposition—a fundamental factorization that underlies eigenvalue algorithms, least squares solvers, and numerical stability techniques. While less glamorous than SVD or eigendecomposition, QR is the computational workhorse that makes many algorithms practical.
You now understand eigendecomposition in depth—when it exists, how to compute it, what it reveals geometrically, and how it powers core ML algorithms. Combined with SVD, you have the two most important matrix decompositions for machine learning.