Matrix operations become dramatically simpler when expressed in the right coordinate system. Diagonalization is the art of finding that coordinate system—a basis of eigenvectors where the matrix becomes diagonal.
Why does this matter? Consider computing A¹⁰⁰ for some matrix A. Direct multiplication requires 99 matrix-matrix products. But if A = VΛV⁻¹, then A¹⁰⁰ = VΛ¹⁰⁰V⁻¹. Since Λ is diagonal, Λ¹⁰⁰ just raises each diagonal entry to the 100th power—trivial!
Diagonalization underlies PCA (the spectral decomposition of covariance matrices), Markov chain analysis (finding steady states), differential equations (solving systems of ODEs), and countless optimization algorithms. It's not just a computational trick—it reveals the intrinsic structure of linear transformations.
By the end of this page, you will understand the conditions for diagonalizability, construct the diagonalization A = VΛV⁻¹, recognize when matrices are NOT diagonalizable (defective matrices), master the Spectral Theorem for symmetric matrices, and apply diagonalization to simplify matrix powers, exponentials, and functions.
A matrix A is diagonalizable if there exists an invertible matrix V and a diagonal matrix Λ such that:
$$\mathbf{A} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^{-1}$$
Equivalently:
$$\mathbf{V}^{-1} \mathbf{A} \mathbf{V} = \mathbf{\Lambda}$$
The columns of V are eigenvectors of A, and the diagonal entries of Λ are the corresponding eigenvalues:
$$\mathbf{V} = \begin{pmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\ | & | & & | \end{pmatrix}, \quad \mathbf{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$
For V to be invertible, its columns (eigenvectors) must be linearly independent. This is why we need n linearly independent eigenvectors to diagonalize an n×n matrix. If eigenvectors are dependent (or there aren't enough), diagonalization fails.
Verification:
We can verify A = VΛV⁻¹ by checking Av_i = λ_i v_i for each eigenvector:
$$\mathbf{A}\mathbf{V} = \mathbf{A}\begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix} = \begin{pmatrix} \lambda_1\mathbf{v}_1 & \cdots & \lambda_n\mathbf{v}_n \end{pmatrix} = \mathbf{V}\mathbf{\Lambda}$$
Multiplying both sides by V⁻¹ on the right: A = VΛV⁻¹.
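A few lines of NumPy make this check concrete (the 2×2 matrix here is just an illustrative choice):

```python
import numpy as np

# Illustrative 2x2 matrix (any diagonalizable matrix works)
A = np.array([[4., 1.],
              [2., 3.]])

eigenvalues, V = np.linalg.eig(A)   # columns of V are eigenvectors
Lambda = np.diag(eigenvalues)

# Column by column: A v_i = λ_i v_i
for i in range(2):
    print(np.allclose(A @ V[:, i], eigenvalues[i] * V[:, i]))

# Stacked form: AV = VΛ, hence A = VΛV⁻¹
print(np.allclose(A @ V, V @ Lambda))
print(np.allclose(A, V @ Lambda @ np.linalg.inv(V)))
```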
Geometric interpretation:
Diagonalization says: "In the coordinate system defined by eigenvectors, A acts as simple scaling along each axis." The transformation V converts from eigenvector coordinates to standard coordinates; Λ scales along eigenvector directions; V⁻¹ converts back.
```python
import numpy as np

def diagonalize(A):
    """
    Attempt to diagonalize matrix A.

    Returns:
        V: Matrix of eigenvectors (columns)
        Lambda: Diagonal matrix of eigenvalues
        V_inv: Inverse of V

    Raises ValueError if matrix is not diagonalizable.
    """
    n = A.shape[0]

    # Compute eigenvalues and eigenvectors
    eigenvalues, V = np.linalg.eig(A)

    # Check if V is invertible (eigenvectors are linearly independent)
    det_V = np.linalg.det(V)
    if np.abs(det_V) < 1e-10:
        raise ValueError("Matrix is not diagonalizable: "
                         "eigenvectors are not linearly independent")

    # Create diagonal matrix
    Lambda = np.diag(eigenvalues)

    # Compute inverse
    V_inv = np.linalg.inv(V)

    return V, Lambda, V_inv

def verify_diagonalization(A, V, Lambda, V_inv):
    """Verify that A = V @ Lambda @ V_inv."""
    A_reconstructed = V @ Lambda @ V_inv

    # Handle potential complex numbers from eigendecomposition
    if np.iscomplexobj(A_reconstructed) and not np.iscomplexobj(A):
        A_reconstructed = A_reconstructed.real

    error = np.linalg.norm(A - A_reconstructed)
    return error < 1e-10, error

# Example 1: Diagonalizable matrix
A = np.array([
    [4, 1],
    [2, 3]], dtype=float)

print("Example 1: Diagonalizable matrix")
print("=" * 50)
print(f"A =\n{A}\n")

V, Lambda, V_inv = diagonalize(A)

print(f"Eigenvalues (diagonal of Λ): {np.diag(Lambda)}")
print(f"V (eigenvectors as columns) =\n{V}\n")
print(f"Λ (diagonal matrix) =\n{Lambda}\n")

# Verify
is_correct, error = verify_diagonalization(A, V, Lambda, V_inv)
print(f"Verification: A = VΛV⁻¹? {is_correct} (error: {error:.2e})\n")

# Show the power of diagonalization: compute A^10
A_10_direct = np.linalg.matrix_power(A, 10)
Lambda_10 = np.diag(np.diag(Lambda)**10)
A_10_via_diag = V @ Lambda_10 @ V_inv

print("Computing A¹⁰:")
print(f"  Direct computation:  A¹⁰[0,0] = {A_10_direct[0,0]:.6f}")
print(f"  Via diagonalization: VΛ¹⁰V⁻¹[0,0] = {A_10_via_diag[0,0].real:.6f}")
print(f"  Match: {np.allclose(A_10_direct, A_10_via_diag.real)}")
```

Not all matrices are diagonalizable. Understanding when diagonalization is possible is crucial.
Theorem: Diagonalizability Conditions
An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.
This happens precisely when, for every eigenvalue, the geometric multiplicity (the number of independent eigenvectors) equals the algebraic multiplicity (the number of times it appears as a root of the characteristic polynomial). Distinct eigenvalues guarantee this; repeated eigenvalues may or may not:
| Matrix | Eigenvalues | Multiplicities (Alg/Geo) | Diagonalizable? |
|---|---|---|---|
| [[3,0],[0,5]] | λ₁=3, λ₂=5 | Each: 1/1 | Yes (distinct eigenvalues) |
| [[2,0],[0,2]] | λ=2 (double) | 2/2 | Yes (enough eigenvectors) |
| [[2,1],[0,2]] | λ=2 (double) | 2/1 ❌ | No (defective) |
| [[0,-1],[1,0]] | λ=±i (complex) | Each: 1/1 | Yes (over ℂ), No (over ℝ) |
| Symmetric | All real | Geo = Alg always | Always yes |
```python
import numpy as np
from collections import Counter

def analyze_diagonalizability(A, tol=1e-10):
    """
    Analyze whether a matrix is diagonalizable and why.
    Returns detailed information about multiplicities.
    """
    n = A.shape[0]
    print(f"Matrix A ({n}x{n}):")
    print(A)
    print()

    # Get eigenvalues
    eigenvalues = np.linalg.eigvals(A)

    # Round eigenvalues for grouping (handle numerical noise)
    def round_complex(z, decimals=8):
        return complex(round(z.real, decimals), round(z.imag, decimals))

    rounded_eigenvalues = [round_complex(e) for e in eigenvalues]

    # Count algebraic multiplicities
    algebraic_mult = Counter(rounded_eigenvalues)

    print("Eigenvalue Analysis:")
    print("-" * 50)

    is_diagonalizable = True
    for eigenvalue, alg_mult in algebraic_mult.items():
        # Geometric multiplicity = dim(null(A - λI)) = n - rank(A - λI)
        shifted = A - eigenvalue * np.eye(n)
        rank = np.linalg.matrix_rank(shifted, tol=tol)
        geo_mult = n - rank

        status = "✓" if geo_mult == alg_mult else "✗"
        print(f"λ = {eigenvalue}")
        print(f"  Algebraic multiplicity: {alg_mult}")
        print(f"  Geometric multiplicity: {geo_mult}")
        print(f"  Status: {status}")
        print()

        if geo_mult < alg_mult:
            is_diagonalizable = False

    print("-" * 50)
    if is_diagonalizable:
        print("RESULT: Matrix IS diagonalizable")
    else:
        print("RESULT: Matrix is NOT diagonalizable (defective)")

    return is_diagonalizable

# Example 1: Distinct eigenvalues - always diagonalizable
print("=" * 60)
print("Example 1: Distinct eigenvalues")
print("=" * 60)
A1 = np.array([
    [4, 1],
    [2, 3]], dtype=float)
analyze_diagonalizability(A1)

# Example 2: Repeated eigenvalue, diagonalizable (identity-like)
print("\n" + "=" * 60)
print("Example 2: Repeated eigenvalue, IS diagonalizable")
print("=" * 60)
A2 = np.array([
    [2, 0],
    [0, 2]], dtype=float)
analyze_diagonalizability(A2)

# Example 3: Defective matrix (NOT diagonalizable)
print("\n" + "=" * 60)
print("Example 3: Defective matrix (Jordan block)")
print("=" * 60)
A3 = np.array([
    [2, 1],
    [0, 2]], dtype=float)
analyze_diagonalizability(A3)

# Example 4: 3x3 with repeated eigenvalue
print("\n" + "=" * 60)
print("Example 4: 3x3 with repeated eigenvalue")
print("=" * 60)
A4 = np.array([
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 5]], dtype=float)
analyze_diagonalizability(A4)
```

Defective matrices (like the Jordan block [[2,1],[0,2]]) have repeated eigenvalues but not enough eigenvectors. They require the more general Jordan normal form instead of diagonalization. Fortunately, defective matrices are 'rare' in a precise mathematical sense—almost every matrix (for example, after an arbitrarily small random perturbation) is diagonalizable, and symmetric matrices always are.
The Spectral Theorem is one of the most important results in linear algebra for ML. It guarantees that symmetric matrices have especially nice diagonalizations.
Theorem (Spectral Theorem for Real Symmetric Matrices):
If A is a real symmetric matrix (A = Aᵀ), then:
- All eigenvalues of A are real.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- A has a full set of n orthonormal eigenvectors, even when eigenvalues repeat.

Consequently, A can be written as:
$$\mathbf{A} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^T$$
where Q is an orthogonal matrix (Qᵀ = Q⁻¹) whose columns are orthonormal eigenvectors.
Covariance matrices are symmetric (Σ = Σᵀ) and positive semi-definite. The Spectral Theorem guarantees we can always find orthonormal principal components with real, non-negative eigenvalues. This makes PCA well-defined and numerically stable. The orthogonality means principal components are uncorrelated—each captures independent variation.
```python
import numpy as np

def spectral_decomposition(A):
    """
    Compute the spectral decomposition of a symmetric matrix.

    For symmetric A: A = Q @ Λ @ Q.T
    where Q is orthogonal (Q.T = Q^{-1}) and Λ is diagonal.
    """
    # Check symmetry
    if not np.allclose(A, A.T):
        raise ValueError("Matrix must be symmetric for spectral decomposition")

    # Use eigh for symmetric matrices (guaranteed real eigenvalues,
    # orthogonal eigenvectors)
    eigenvalues, Q = np.linalg.eigh(A)

    # Sort by eigenvalue (descending, for PCA-like interpretation)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    Q = Q[:, idx]

    Lambda = np.diag(eigenvalues)
    return Q, Lambda, eigenvalues

def verify_spectral_properties(A, Q, Lambda):
    """Verify all properties guaranteed by the Spectral Theorem."""
    print("Verifying Spectral Theorem properties:")
    print("-" * 50)

    eigenvalues = np.diag(Lambda)

    # Property 1: All eigenvalues are real
    all_real = np.allclose(eigenvalues.imag, 0) if np.iscomplexobj(eigenvalues) else True
    print(f"1. All eigenvalues real: {all_real}")
    print(f"   Eigenvalues: {eigenvalues}")

    # Property 2: Q is orthogonal (Q.T @ Q = I)
    is_orthogonal = np.allclose(Q.T @ Q, np.eye(Q.shape[1]))
    print(f"2. Q is orthogonal (QᵀQ = I): {is_orthogonal}")

    # Property 3: Reconstruction A = QΛQᵀ
    A_reconstructed = Q @ Lambda @ Q.T
    reconstruction_correct = np.allclose(A, A_reconstructed)
    print(f"3. A = QΛQᵀ: {reconstruction_correct}")

    # Property 4: Qᵀ = Q⁻¹ (no explicit inverse needed)
    transpose_equals_inverse = np.allclose(Q.T, np.linalg.inv(Q))
    print(f"4. Qᵀ = Q⁻¹: {transpose_equals_inverse}")

    print("-" * 50)
    return all([all_real, is_orthogonal, reconstruction_correct,
                transpose_equals_inverse])

# Example: Covariance matrix (symmetric positive semi-definite)
print("Spectral Decomposition of a Covariance Matrix")
print("=" * 60)

# Generate random data and compute covariance
np.random.seed(42)
n_samples, n_features = 100, 4
X = np.random.randn(n_samples, n_features)

# Give the data different variances in different directions
X[:, 0] *= 5    # High variance
X[:, 1] *= 2    # Medium variance
X[:, 2] *= 1    # Unit variance
X[:, 3] *= 0.5  # Low variance

# Center the data
X_centered = X - X.mean(axis=0)

# Covariance matrix
Sigma = X_centered.T @ X_centered / (n_samples - 1)

print("Covariance matrix Σ:")
print(Sigma.round(4))
print()

# Spectral decomposition
Q, Lambda, eigenvalues = spectral_decomposition(Sigma)

print("Eigenvectors (columns of Q):")
print(Q.round(4))
print()
print("Eigenvalues (diagonal of Λ):")
print(eigenvalues.round(4))
print()

# Verify properties
all_verified = verify_spectral_properties(Sigma, Q, Lambda)
print(f"All Spectral Theorem properties verified: {all_verified}")

# Show orthogonality of eigenvectors explicitly
print("\n" + "=" * 60)
print("Eigenvector Orthogonality:")
print("=" * 60)
for i in range(len(eigenvalues)):
    for j in range(i + 1, len(eigenvalues)):
        dot_product = np.dot(Q[:, i], Q[:, j])
        print(f"v{i+1} · v{j+1} = {dot_product:.2e}")
```

Spectral decomposition vs eigendecomposition:
Both write A = VΛV⁻¹, but the spectral decomposition is special:
| Property | General Eigendecomposition | Spectral Decomposition |
|---|---|---|
| Matrix type | Any diagonalizable matrix | Symmetric matrices only |
| V⁻¹ | Must compute inverse | V⁻¹ = Vᵀ (just transpose!) |
| Eigenvectors | Linearly independent | Orthonormal |
| Eigenvalues | Can be complex | Always real |
| Numerical stability | Depends on condition | Highly stable |
For PCA, kernel methods, and graph Laplacians, we always work with symmetric matrices and use the spectral decomposition.
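As a short sketch of how this plays out in PCA (the data, the anisotropic scaling, and the choice of 2 components are illustrative assumptions, not part of a specific library API):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])  # anisotropic data
Xc = X - X.mean(axis=0)

# Spectral decomposition of the covariance: Σ = QΛQᵀ
Sigma = Xc.T @ Xc / (len(Xc) - 1)
eigenvalues, Q = np.linalg.eigh(Sigma)    # eigh: ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]     # re-sort descending
eigenvalues, Q = eigenvalues[order], Q[:, order]

# Principal components = orthonormal eigenvectors; project onto top 2
Z = Xc @ Q[:, :2]

explained = eigenvalues / eigenvalues.sum()
print("Explained variance ratio:", explained.round(3))

# Because Q is orthogonal, the projected features are uncorrelated
C = Z.T @ Z / (len(Z) - 1)
print("Projected covariance is diagonal:",
      np.allclose(C, np.diag(np.diag(C)), atol=1e-10))
```

Since Qᵀ = Q⁻¹, the projection and its inverse are both plain matrix products; no linear system needs to be solved.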
Diagonalization makes computing matrix powers and functions trivial. If A = VΛV⁻¹, then:
Matrix powers: $$\mathbf{A}^k = \mathbf{V} \mathbf{\Lambda}^k \mathbf{V}^{-1}$$
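The identity follows by telescoping—every interior V⁻¹V pair cancels:

$$\mathbf{A}^k = (\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})\cdots(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}) = \mathbf{V}\mathbf{\Lambda}(\mathbf{V}^{-1}\mathbf{V})\mathbf{\Lambda}(\mathbf{V}^{-1}\mathbf{V})\cdots\mathbf{\Lambda}\mathbf{V}^{-1} = \mathbf{V}\mathbf{\Lambda}^k\mathbf{V}^{-1}$$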
Since Λ is diagonal, Λᵏ just raises each eigenvalue to the kth power: $$\mathbf{\Lambda}^k = \text{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$$
Matrix exponential: $$e^{\mathbf{A}} = \mathbf{V} e^{\mathbf{\Lambda}} \mathbf{V}^{-1} = \mathbf{V} \text{diag}(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n}) \mathbf{V}^{-1}$$
General matrix functions: For any function f analytic at the eigenvalues: $$f(\mathbf{A}) = \mathbf{V} f(\mathbf{\Lambda}) \mathbf{V}^{-1} = \mathbf{V} \text{diag}(f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)) \mathbf{V}^{-1}$$
```python
import numpy as np
from scipy import linalg

def matrix_power_via_diagonalization(A, k):
    """Compute A^k = V @ Λ^k @ V^{-1} using eigendecomposition."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # Λ^k: raise each eigenvalue to power k
    Lambda_k = np.diag(eigenvalues ** k)
    result = V @ Lambda_k @ V_inv

    # Return real part if input was real
    if not np.iscomplexobj(A):
        result = result.real
    return result

def matrix_exp_via_diagonalization(A):
    """Compute e^A = V @ e^Λ @ V^{-1} using eigendecomposition."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # e^Λ: exponentiate each eigenvalue
    exp_Lambda = np.diag(np.exp(eigenvalues))
    result = V @ exp_Lambda @ V_inv

    if not np.iscomplexobj(A):
        result = result.real
    return result

def matrix_function(A, f):
    """Apply function f to matrix A: f(A) = V @ f(Λ) @ V^{-1}."""
    eigenvalues, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)

    # Apply f to each eigenvalue
    f_Lambda = np.diag(f(eigenvalues))
    result = V @ f_Lambda @ V_inv

    if not np.iscomplexobj(A):
        result = result.real
    return result

# Example 1: Matrix powers
print("Matrix Powers via Diagonalization")
print("=" * 60)

A = np.array([
    [0.9, 0.2],
    [0.1, 0.8]], dtype=float)

print(f"A =\n{A}\n")

# Compute A^10 both ways
A_10_direct = np.linalg.matrix_power(A, 10)
A_10_diag = matrix_power_via_diagonalization(A, 10)

print(f"A¹⁰ (direct):\n{A_10_direct.round(6)}\n")
print(f"A¹⁰ (via eigendecomp):\n{A_10_diag.round(6)}\n")
print(f"Match: {np.allclose(A_10_direct, A_10_diag)}")

# Large power - this is where diagonalization shines
print("\n" + "-" * 40)
print("Large power: A¹⁰⁰⁰")
A_1000_diag = matrix_power_via_diagonalization(A, 1000)
print(f"A¹⁰⁰⁰ =\n{A_1000_diag.round(6)}")

eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"λ₁¹⁰⁰⁰ = {eigenvalues[0]**1000:.6f}")
print(f"λ₂¹⁰⁰⁰ = {eigenvalues[1]**1000:.6e}")
print("Note: Largest eigenvalue (1.0) dominates; the other decays to 0")

# Example 2: Matrix exponential
print("\n" + "=" * 60)
print("Matrix Exponential")
print("=" * 60)

B = np.array([
    [0, -1],
    [1, 0]], dtype=float)  # Rotation generator

print(f"B (rotation generator) =\n{B}\n")

# e^B should give rotation by 1 radian
exp_B_diag = matrix_exp_via_diagonalization(B)
exp_B_scipy = linalg.expm(B)

print(f"e^B (via eigendecomp):\n{exp_B_diag.round(6)}\n")
print(f"e^B (scipy.linalg.expm):\n{exp_B_scipy.round(6)}\n")

theta = 1.0
expected = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)]])
print(f"Expected (rotation matrix for θ=1 rad):\n{expected.round(6)}")
print(f"Match: {np.allclose(exp_B_diag, expected)}")

# Example 3: Matrix square root
print("\n" + "=" * 60)
print("Matrix Square Root (√A)")
print("=" * 60)

C = np.array([
    [4, 2],
    [2, 5]], dtype=float)  # Positive definite

sqrt_C = matrix_function(C, np.sqrt)
print(f"C =\n{C}\n")
print(f"√C =\n{sqrt_C.round(6)}\n")
print(f"(√C)² =\n{(sqrt_C @ sqrt_C).round(6)}")
print(f"Match C: {np.allclose(sqrt_C @ sqrt_C, C)}")
```

Computing Aᵏ directly takes O(k) matrix multiplications (or O(log k) with repeated squaring). Via diagonalization, we compute the eigendecomposition once (O(n³)); after that, Λᵏ costs only n scalar powers, no matter how large k is. For large k this is a massive speedup, and it is why Markov chain steady-state analysis (the limit of Aᵏ as k → ∞) uses eigendecomposition.
Diagonalization is essential for analyzing Markov chains—stochastic processes where the probability of the next state depends only on the current state.
A Markov chain is defined by a transition matrix P where P_ij = P(next state = j | current state = i). The rows sum to 1.
If π₀ is the initial state distribution (row vector), after k steps:
$$\pi_k = \pi_0 \mathbf{P}^k$$
Key questions:
1. Does πₖ converge to a steady-state distribution π∞ satisfying π∞P = π∞?
2. If so, how quickly?

Both questions are answered by eigenanalysis of P: the steady state is the left eigenvector for eigenvalue 1, and the convergence speed is governed by the second-largest eigenvalue magnitude |λ₂|.
```python
import numpy as np
import matplotlib.pyplot as plt

def analyze_markov_chain(P, name="Markov Chain"):
    """Analyze a Markov chain transition matrix using eigendecomposition."""
    print(f"{'='*60}")
    print(f"Analysis: {name}")
    print(f"{'='*60}")
    print("Transition matrix P:")
    print(P.round(4))

    # Verify it's a valid transition matrix
    row_sums = P.sum(axis=1)
    print(f"Row sums (should be 1): {row_sums}")

    # Eigendecomposition - eigenvectors of P^T give left eigenvectors of P
    eigenvalues, eigenvectors = np.linalg.eig(P.T)

    # Sort by eigenvalue magnitude
    idx = np.argsort(np.abs(eigenvalues))[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    print(f"Eigenvalues: {eigenvalues.round(6)}")

    # The steady state corresponds to eigenvalue 1
    steady_state_idx = np.argmin(np.abs(eigenvalues - 1))
    if np.abs(eigenvalues[steady_state_idx] - 1) < 1e-10:
        # Normalize to be a probability distribution
        steady_state = eigenvectors[:, steady_state_idx].real
        steady_state = steady_state / steady_state.sum()
        print("Steady state distribution (π∞):")
        print(f"  {steady_state.round(6)}")

        # Verify: π∞ P = π∞
        verification = steady_state @ P
        print(f"  Verify π∞P = π∞: {np.allclose(verification, steady_state)}")
    else:
        print("No steady state (no eigenvalue = 1)")
        steady_state = None

    # Convergence rate determined by second-largest eigenvalue
    if len(eigenvalues) > 1:
        mixing_rate = np.abs(eigenvalues[1])
        print(f"Second largest eigenvalue: {eigenvalues[1].round(6)}")
        print(f"Mixing rate (|λ₂|): {mixing_rate:.6f}")
        if mixing_rate < 1:
            mixing_time = int(-1 / np.log(mixing_rate + 1e-10))
            print(f"Approximate mixing time: {mixing_time} steps")
        else:
            print("Approximate mixing time: slow/infinite")

    return eigenvalues, eigenvectors, steady_state

def simulate_markov_chain(P, initial_state, num_steps):
    """Simulate how the state distribution evolves over time."""
    n_states = P.shape[0]
    distribution = np.zeros(n_states)
    distribution[initial_state] = 1.0

    history = [distribution.copy()]
    for _ in range(num_steps):
        distribution = distribution @ P
        history.append(distribution.copy())
    return np.array(history)

# Example 1: Weather model (Sunny/Rainy)
P_weather = np.array([
    [0.8, 0.2],  # Sunny -> [Sunny, Rainy]
    [0.4, 0.6]   # Rainy -> [Sunny, Rainy]
])
eigenvalues, _, steady_state = analyze_markov_chain(P_weather, "Weather Model")

# Simulate convergence
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

history = simulate_markov_chain(P_weather, 0, 20)  # Start sunny
ax = axes[0]
ax.plot(history[:, 0], 'o-', label='P(Sunny)', color='orange')
ax.plot(history[:, 1], 's-', label='P(Rainy)', color='blue')
ax.axhline(y=steady_state[0], color='orange', linestyle='--', alpha=0.5)
ax.axhline(y=steady_state[1], color='blue', linestyle='--', alpha=0.5)
ax.set_xlabel('Step')
ax.set_ylabel('Probability')
ax.set_title('Convergence to Steady State\n(Starting from Sunny)')
ax.legend()
ax.grid(True, alpha=0.3)

# Example 2: PageRank-style matrix
P_pagerank = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0]
])
eigenvalues_pr, _, steady_state_pr = analyze_markov_chain(P_pagerank, "Simple Web Graph")

# Plot eigenvalues in the complex plane
ax = axes[1]
theta = np.linspace(0, 2 * np.pi, 100)
ax.plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.3, label='Unit circle')
ax.scatter(eigenvalues_pr.real, eigenvalues_pr.imag, s=100, c='red', zorder=5)
for i, ev in enumerate(eigenvalues_pr):
    ax.annotate(f'λ{i+1}={ev:.2f}', (ev.real + 0.1, ev.imag + 0.1))
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
ax.set_xlabel('Real')
ax.set_ylabel('Imaginary')
ax.set_title('Eigenvalues in Complex Plane\n(λ=1 gives steady state)')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('markov_chain_analysis.png', dpi=150)
plt.show()
```

Google's original PageRank is a Markov chain on web pages. The steady state (the eigenvector for λ=1) gives page importance scores. The 'damping factor' ensures convergence by making the transition matrix primitive, so every eigenvalue except 1 has |λ| < 1.
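The damping trick can be sketched in a few lines. This is an illustrative construction, not code from above: the value d = 0.85 and the uniform-teleportation model are standard PageRank assumptions, and `google_matrix` is a hypothetical helper name.

```python
import numpy as np

def google_matrix(P, d=0.85):
    """Build the damped PageRank transition matrix.

    With probability d follow a link (row-stochastic P); with
    probability 1-d jump to a uniformly random page. The result is
    primitive, so eigenvalue 1 is unique and iteration converges.
    """
    n = P.shape[0]
    return d * P + (1 - d) / n * np.ones((n, n))

# Same toy web graph as above
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0]])

G = google_matrix(P)

# Steady state = left eigenvector of G for eigenvalue 1
eigenvalues, vecs = np.linalg.eig(G.T)
k = np.argmin(np.abs(eigenvalues - 1))
rank = vecs[:, k].real
rank = rank / rank.sum()
print("PageRank scores:", rank.round(4))
print("All other |λ| < 1:", np.all(np.abs(np.delete(eigenvalues, k)) < 1))
```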
Symmetric Positive Definite (SPD) matrices are the best-behaved matrices in linear algebra. They appear everywhere in ML: covariance matrices, kernel matrices, Hessians of convex functions.
Definition: A symmetric matrix A is positive definite if xᵀAx > 0 for all non-zero x ∈ ℝⁿ.
Characterization via eigenvalues: A symmetric matrix A is positive definite if and only if all eigenvalues are strictly positive (λᵢ > 0 for all i).
Positive semi-definite: λᵢ ≥ 0 for all i.
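A small sketch connecting the two characterizations (the sample matrix and the random spot-check of the quadratic form are illustrative, not a proof):

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 5.]])   # symmetric, illustrative

# Characterization 1: all eigenvalues strictly positive
eigenvalues = np.linalg.eigvalsh(A)
print("Eigenvalues:", eigenvalues)
print("Positive definite (eigen test):", np.all(eigenvalues > 0))

# Characterization 2: xᵀAx > 0 for all nonzero x (sampled at random)
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
quad_forms = np.einsum('ij,jk,ik->i', xs, A, xs)   # xᵀAx for each row
print("All sampled xᵀAx > 0:", np.all(quad_forms > 0))
```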
```python
import numpy as np
from scipy import linalg
from scipy.spatial.distance import cdist

def analyze_spd_matrix(A, name="Matrix"):
    """Analyze properties of a symmetric positive (semi-)definite matrix."""
    print(f"Analyzing: {name}")
    print("=" * 60)
    print(A)
    print()

    # Check symmetry
    is_symmetric = np.allclose(A, A.T)
    print(f"Symmetric: {is_symmetric}")

    # Eigenanalysis
    eigenvalues, Q = np.linalg.eigh(A)
    print(f"Eigenvalues: {eigenvalues.round(6)}")

    # Positive definiteness
    is_pd = np.all(eigenvalues > 0)
    is_psd = np.all(eigenvalues >= 0)
    print(f"Positive definite: {is_pd}")
    print(f"Positive semi-definite: {is_psd}")

    # Condition number
    if is_psd and eigenvalues.min() > 0:
        condition = eigenvalues.max() / eigenvalues.min()
        print(f"Condition number: {condition:.4f}")

    if is_pd:
        # Cholesky decomposition
        L = linalg.cholesky(A, lower=True)
        print("Cholesky factor L (A = LLᵀ):")
        print(L.round(4))
        print(f"LLᵀ ≈ A: {np.allclose(L @ L.T, A)}")

        # Matrix square root via eigendecomposition
        sqrt_A = Q @ np.diag(np.sqrt(eigenvalues)) @ Q.T
        print("Matrix square root √A:")
        print(sqrt_A.round(4))
        print(f"(√A)² ≈ A: {np.allclose(sqrt_A @ sqrt_A, A)}")

        # Inverse has eigenvalues 1/λ
        A_inv = np.linalg.inv(A)
        inv_eigenvalues = np.linalg.eigvalsh(A_inv)
        print(f"Inverse A⁻¹ eigenvalues: {inv_eigenvalues.round(6)}")
        print(f"These are 1/λ: {(1 / eigenvalues[::-1]).round(6)}")

    return eigenvalues, Q

# Example 1: SPD matrix from data covariance
np.random.seed(42)
X = np.random.randn(100, 3)
X[:, 0] *= 3  # Different variances
X[:, 1] *= 2
cov = X.T @ X / (100 - 1)

eigenvalues, Q = analyze_spd_matrix(cov, "Covariance Matrix")

# Example 2: Kernel matrix (always PSD)
points = np.random.randn(5, 2)
# RBF kernel: K_ij = exp(-||x_i - x_j||² / (2σ²))
sigma = 1.0
distances = cdist(points, points, 'sqeuclidean')
K = np.exp(-distances / (2 * sigma**2))

print("\n" + "=" * 60)
analyze_spd_matrix(K, "RBF Kernel Matrix")

# Example 3: Nearly singular (ill-conditioned) SPD
A_ill = np.array([
    [1.0, 0.99],
    [0.99, 1.0]])
analyze_spd_matrix(A_ill, "Near-Singular SPD Matrix")
```

Not all matrices are diagonalizable. Defective matrices have repeated eigenvalues with insufficient eigenvectors. What can we do?
Jordan Normal Form:
Every square matrix A (over ℂ) can be written as:
$$\mathbf{A} = \mathbf{P} \mathbf{J} \mathbf{P}^{-1}$$
where J is in Jordan normal form—block diagonal with Jordan blocks:
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots \\ 0 & \lambda & 1 & \cdots \\ \vdots & & \ddots & \ddots \\ 0 & \cdots & 0 & \lambda \end{pmatrix}$$
For diagonalizable matrices, all Jordan blocks are 1×1 (just eigenvalues on diagonal).
In ML practice, defective matrices are rare. Symmetric matrices (covariance, kernel matrices) are always diagonalizable. Random matrices are almost surely diagonalizable. Jordan form matters more for theoretical analysis than practical computation—but it's good to know the theory is complete.
```python
import numpy as np
from scipy import linalg

# The canonical defective matrix: a Jordan block
J = np.array([
    [2, 1],
    [0, 2]], dtype=float)

print("Defective Matrix (Jordan block):")
print(J)
print()

# Eigenvalues
eigenvalues = np.linalg.eigvals(J)
print(f"Eigenvalues: {eigenvalues}")
print("Note: λ=2 has algebraic multiplicity 2")

# Try to find eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(J)
print("Eigenvectors (columns):")
print(eigenvectors)
print()

# Check if eigenvectors are linearly independent
det = np.linalg.det(eigenvectors)
print(f"Determinant of eigenvector matrix: {det:.2e}")
print("Since det ≈ 0, eigenvectors are NOT linearly independent!")
print()

# What goes wrong with A^k?
print("Powers of J:")
for k in [1, 2, 3, 5, 10]:
    J_k = np.linalg.matrix_power(J, k)
    print(f"J^{k} =\n{J_k}\n")

print("Notice: J^k = [[2^k, k*2^(k-1)], [0, 2^k]]")
print("The off-diagonal grows with k (not captured by eigendecomposition)")

# Schur decomposition always works
print("\n" + "=" * 60)
print("Schur Decomposition (Always Works)")
print("=" * 60)

# For any matrix, A = Q T Q^H where T is upper triangular
T, Q = linalg.schur(J)
print("Schur form T (upper triangular):")
print(T)
print("Q (unitary):")
print(Q)
print(f"Verify A = Q T Qᴴ: {np.allclose(J, Q @ T @ Q.T.conj())}")
```

We've explored diagonalization—the process of expressing matrices in their simplest form. Let's consolidate:
What's Next:
We've built the complete theoretical foundation of eigenanalysis. The final page of this module explores real-world ML applications—how eigenvalues and eigenvectors power PCA for dimensionality reduction, PageRank for web search, spectral clustering for community detection, and more. We'll see eigenanalysis in action!
You now understand matrix diagonalization—when it's possible, how it works, and why it's powerful. You've mastered the Spectral Theorem for symmetric matrices and seen applications to matrix functions and Markov chains. Next, we'll apply these concepts to PCA, PageRank, and spectral clustering.