The theory of eigenvalues and eigenvectors isn't abstract mathematics confined to textbooks—it powers billion-dollar ML systems running in production today.
Google's original search engine was built on the PageRank algorithm, which computes the dominant eigenvector of the web's link matrix. Netflix's recommendation engine relies on matrix factorization, fundamentally enabled by eigendecomposition. Principal Component Analysis (PCA), perhaps the most widely used dimensionality reduction technique, is pure eigenanalysis of covariance matrices. And spectral clustering discovers communities in social networks by analyzing graph Laplacian eigenvectors.
This page brings together everything we've learned to show eigenanalysis in action. You'll understand not just how these algorithms work, but why eigenvalues and eigenvectors naturally solve these problems.
By the end of this page, you will:

- implement PCA from scratch and understand its eigenvalue interpretation,
- reproduce the core PageRank algorithm and see eigenvectors in action,
- grasp spectral clustering and why graph eigenvalues reveal clusters,
- recognize eigenanalysis patterns across diverse ML applications.
PCA is the crown jewel of eigenanalysis applications in ML. It finds the directions of maximum variance in high-dimensional data, enabling dimensionality reduction while preserving the most important information.
The problem: Given n data points in d dimensions, find a lower-dimensional representation that captures the most variance.
The eigenvalue solution: (1) center the data, (2) compute the covariance matrix Σ, (3) eigendecompose Σ, and (4) project the data onto the eigenvectors with the largest eigenvalues.
Why eigenvalues maximize variance:
The first principal component v₁ maximizes variance subject to ||v||=1:
$$v_1 = \arg\max_{||v||=1} \text{Var}(Xv) = \arg\max_{||v||=1} v^T \Sigma v$$
By the method of Lagrange multipliers, the solution satisfies Σv₁ = λ₁v₁ — the eigenvector with largest eigenvalue!
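Spelling out that Lagrange-multiplier step (a standard derivation, included for completeness): with the Lagrangian $\mathcal{L}(v, \lambda) = v^T \Sigma v - \lambda (v^T v - 1)$, stationarity gives

$$\nabla_v \mathcal{L} = 2\Sigma v - 2\lambda v = 0 \quad\Longrightarrow\quad \Sigma v = \lambda v,$$

and at any such stationary point the objective equals $v^T \Sigma v = \lambda v^T v = \lambda$, so the maximum is attained by the eigenvector with the largest eigenvalue.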
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris


class PCA:
    """
    Principal Component Analysis from scratch using eigendecomposition.
    """
    def __init__(self, n_components=None):
        self.n_components = n_components
        self.components_ = None  # Principal component directions
        self.explained_variance_ = None
        self.explained_variance_ratio_ = None
        self.mean_ = None

    def fit(self, X):
        """
        Fit PCA to data X.

        Steps:
        1. Center the data
        2. Compute covariance matrix
        3. Eigendecomposition (spectral decomposition)
        4. Sort by eigenvalue magnitude
        """
        n_samples, n_features = X.shape

        # Step 1: Center the data
        self.mean_ = X.mean(axis=0)
        X_centered = X - self.mean_

        # Step 2: Compute covariance matrix
        # Cov = X^T X / (n-1) for centered data
        covariance_matrix = X_centered.T @ X_centered / (n_samples - 1)

        # Step 3: Eigendecomposition (use eigh for symmetric matrix)
        eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)

        # Step 4: Sort by eigenvalue (descending)
        idx = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]

        # Total variance = sum of ALL eigenvalues (computed before truncation)
        total_variance = eigenvalues.sum()

        # Store results (keep only the requested number of components)
        if self.n_components is not None:
            eigenvalues = eigenvalues[:self.n_components]
            eigenvectors = eigenvectors[:, :self.n_components]

        self.components_ = eigenvectors.T  # Rows are components
        self.explained_variance_ = eigenvalues
        self.explained_variance_ratio_ = eigenvalues / total_variance

        return self

    def transform(self, X):
        """Project data onto principal components."""
        X_centered = X - self.mean_
        return X_centered @ self.components_.T

    def fit_transform(self, X):
        """Fit and transform in one step."""
        self.fit(X)
        return self.transform(X)

    def inverse_transform(self, X_reduced):
        """Reconstruct data from reduced representation."""
        return X_reduced @ self.components_ + self.mean_


# Load and visualize with Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

print("Iris Dataset: PCA Analysis")
print("=" * 60)
print(f"Original shape: {X.shape} (150 samples, 4 features)")

# Fit our PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(f"Reduced shape: {X_reduced.shape}")
print("\nExplained variance per component:")
for i, (var, ratio) in enumerate(zip(pca.explained_variance_, pca.explained_variance_ratio_)):
    print(f"  PC{i+1}: variance = {var:.4f}, ratio = {ratio*100:.2f}%")
print(f"\nTotal variance explained: {pca.explained_variance_ratio_.sum()*100:.2f}%")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Projected data
ax = axes[0]
colors = ['red', 'green', 'blue']
for i, color in enumerate(colors):
    mask = y == i
    ax.scatter(X_reduced[mask, 0], X_reduced[mask, 1], c=color,
               label=iris.target_names[i], alpha=0.7)
ax.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)')
ax.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)')
ax.set_title('Iris Dataset: PCA Projection')
ax.legend()
ax.grid(True, alpha=0.3)

# Right: Scree plot
ax = axes[1]
full_pca = PCA().fit(X)
var_ratios = full_pca.explained_variance_ratio_
cumulative = np.cumsum(var_ratios)

ax.bar(range(1, len(var_ratios)+1), var_ratios, alpha=0.7, label='Individual')
ax.plot(range(1, len(var_ratios)+1), cumulative, 'ro-', label='Cumulative')
ax.axhline(y=0.95, color='green', linestyle='--', label='95% threshold')
ax.set_xlabel('Principal Component')
ax.set_ylabel('Variance Ratio')
ax.set_title('Scree Plot: How Many Components?')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('pca_analysis.png', dpi=150)
plt.show()

# Verify against sklearn
from sklearn.decomposition import PCA as SklearnPCA
sklearn_pca = SklearnPCA(n_components=2).fit(X)
print("\nVerification against sklearn PCA:")
print(f"Variance ratios match: {np.allclose(pca.explained_variance_ratio_, sklearn_pca.explained_variance_ratio_)}")
```

Eigenvalues tell you how much variance each PC captures. A steep drop in the scree plot suggests low intrinsic dimensionality. If the first few eigenvalues capture 95%+ of variance, the data lies near a low-dimensional subspace—PCA works beautifully. If eigenvalues decay slowly, the data is truly high-dimensional.
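As a quick way to act on the scree plot, here is a minimal sketch (the helper name and the example ratios are illustrative, not from the code above) that picks the smallest number of components reaching a 95% cumulative-variance target:

```python
import numpy as np

def components_for_variance(ratios, target=0.95):
    """Smallest number of components whose cumulative explained-variance ratio reaches `target`."""
    cumulative = np.cumsum(ratios)
    return int(np.searchsorted(cumulative, target)) + 1

# Illustrative ratios with a sharp drop after the second component
ratios = np.array([0.6, 0.3, 0.06, 0.04])
print(components_for_variance(ratios))  # 3 components reach the 95% threshold
```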
PageRank, developed by Larry Page and Sergey Brin at Stanford, revolutionized web search. It models a random web surfer who randomly clicks links, occasionally jumping to random pages. Page importance = probability of being on that page in the long run.
The mathematical model: build the link-following matrix M by normalizing each page's outgoing links by its out-degree (so M is column-stochastic), then mix in random jumps with damping factor α (typically 0.85): P = αM + (1 − α)(1/n)J, where J is the all-ones matrix and n is the number of pages.
Why eigenvectors?
The random surfer's distribution after k steps is π_k = π_0 · Pᵏ (row-vector convention). As k→∞, this converges to the unique stationary distribution π satisfying π = πP; transposing, Pᵀπᵀ = πᵀ, so πᵀ is an eigenvector of Pᵀ with eigenvalue 1. (The code below builds the column-stochastic matrix directly, so the PageRank vector is that matrix's eigenvector with eigenvalue 1.)
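To see why power iteration converges to that eigenvector, expand the starting distribution in the eigenbasis of the transition matrix (assumed diagonalizable for this sketch; written here in the column-stochastic form used by the code below):

$$\pi_0 = \sum_i c_i v_i \quad\Longrightarrow\quad P^k \pi_0 = \sum_i c_i \lambda_i^k v_i \;\xrightarrow{\;k\to\infty\;}\; c_1 v_1,$$

since λ₁ = 1 while every other eigenvalue of the damped matrix satisfies |λᵢ| ≤ α < 1. The gap 1 − |λ₂| controls how many iterations are needed.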
```python
import numpy as np
import matplotlib.pyplot as plt


def pagerank(adjacency_matrix, damping=0.85, max_iterations=100, tol=1e-8):
    """
    Compute PageRank using the power iteration method.

    Parameters:
        adjacency_matrix: A[i,j] = 1 if page j links to page i
        damping: Probability of following a link (vs random jump)
        max_iterations: Maximum number of iterations
        tol: Convergence tolerance

    Returns:
        PageRank scores (probability distribution)
    """
    n = adjacency_matrix.shape[0]

    # Step 1: Create column-stochastic transition matrix
    # Normalize by out-degree (column sums)
    out_degrees = adjacency_matrix.sum(axis=0)

    # Handle dangling nodes (no outgoing links)
    dangling = (out_degrees == 0)
    out_degrees[dangling] = 1  # Avoid division by zero
    M = adjacency_matrix / out_degrees
    M[:, dangling] = 1/n  # Dangling nodes link to everyone

    # Step 2: Add damping factor
    # P = α*M + (1-α) * (1/n) * ones
    teleport = np.ones((n, n)) / n
    P = damping * M + (1 - damping) * teleport

    # Step 3: Power iteration to find dominant eigenvector
    # Start with uniform distribution
    pagerank_scores = np.ones(n) / n

    for iteration in range(max_iterations):
        new_scores = P @ pagerank_scores

        # Check convergence
        diff = np.abs(new_scores - pagerank_scores).sum()
        pagerank_scores = new_scores

        if diff < tol:
            print(f"Converged after {iteration + 1} iterations")
            break

    return pagerank_scores


def pagerank_eigen(adjacency_matrix, damping=0.85):
    """
    Compute PageRank using explicit eigendecomposition.
    (Power iteration is preferred for large graphs, but this shows the connection.)
    """
    n = adjacency_matrix.shape[0]

    # Build transition matrix (same as above)
    out_degrees = adjacency_matrix.sum(axis=0)
    dangling = (out_degrees == 0)
    out_degrees[dangling] = 1
    M = adjacency_matrix / out_degrees
    M[:, dangling] = 1/n
    P = damping * M + (1 - damping) * np.ones((n, n)) / n

    # Find eigenvector with eigenvalue 1
    eigenvalues, eigenvectors = np.linalg.eig(P)

    # Find the eigenvalue closest to 1
    idx = np.argmin(np.abs(eigenvalues - 1))
    dominant_eigenvector = eigenvectors[:, idx].real

    # Normalize to be a probability distribution
    pagerank_scores = dominant_eigenvector / dominant_eigenvector.sum()

    return pagerank_scores, eigenvalues


# Example: Small web graph
# 0 -> 1 -> 3
# |    ^    |
# v    |    v
# 2 ---+    4
pages = ['Homepage', 'Products', 'About', 'Blog', 'Contact']
n = len(pages)

# Adjacency matrix: A[i,j] = 1 if j links to i
# (Column j shows where page j links TO)
links = np.array([
    [0, 0, 1, 0, 0],  # Links TO Homepage
    [1, 0, 1, 0, 0],  # Links TO Products
    [1, 0, 0, 0, 0],  # Links TO About
    [0, 1, 0, 0, 0],  # Links TO Blog
    [0, 0, 0, 1, 0],  # Links TO Contact
])

print("PageRank Analysis")
print("=" * 60)
print("\nLink structure:")
for i, page in enumerate(pages):
    outlinks = [pages[j] for j in range(n) if links[j, i] == 1]
    print(f"  {page} -> {outlinks if outlinks else '[none]'}")

# Compute PageRank
print("\n" + "-" * 40)
print("Power Iteration Method:")
pr_power = pagerank(links.astype(float), damping=0.85)
for page, score in zip(pages, pr_power):
    print(f"  {page}: {score:.4f}")

print("\n" + "-" * 40)
print("Eigendecomposition Method:")
pr_eigen, eigenvalues = pagerank_eigen(links.astype(float), damping=0.85)
for page, score in zip(pages, pr_eigen):
    print(f"  {page}: {score:.4f}")

print("\nEigenvalues of transition matrix:")
for i, ev in enumerate(sorted(np.abs(eigenvalues), reverse=True)[:5]):
    print(f"  |λ{i+1}| = {ev:.6f}")
print("(Largest eigenvalue = 1 corresponds to stationary distribution)")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: PageRank scores as bar chart
ax = axes[0]
colors = plt.cm.viridis(pr_power / pr_power.max())
ax.barh(pages, pr_power, color=colors)
ax.set_xlabel('PageRank Score')
ax.set_title('PageRank Scores')
ax.grid(True, alpha=0.3)

# Right: Eigenvalue spectrum
ax = axes[1]
theta = np.linspace(0, 2*np.pi, 100)
ax.plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.3)
ax.scatter(eigenvalues.real, eigenvalues.imag, s=100, c='red', zorder=5)
ax.scatter([1], [0], s=200, c='green', marker='*', zorder=6, label='λ=1 (PageRank)')
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
ax.set_xlabel('Real')
ax.set_ylabel('Imaginary')
ax.set_title('Eigenvalues of Transition Matrix\n(Green star = PageRank eigenvector)')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('pagerank_analysis.png', dpi=150)
plt.show()
```

The actual web has billions of pages. Power iteration works because: (1) it only needs the dominant eigenvector, (2) the transition matrix is extremely sparse (each page links to few others), and (3) it can be parallelized using MapReduce. Early Google computed PageRank on clusters of commodity machines.
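To make the sparsity point concrete, here is a minimal sketch (not Google's production algorithm) of the power-iteration update using scipy.sparse, so the dense teleportation matrix is never materialized; the tiny 3-cycle graph, the function name, and the omission of dangling-node handling are illustrative simplifications:

```python
import numpy as np
import scipy.sparse as sp

def sparse_pagerank(M, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration on a column-stochastic sparse matrix M (no dangling nodes handled here).

    The teleportation term (1 - damping)/n is added to every entry implicitly,
    so each iteration costs only one sparse matrix-vector product.
    """
    n = M.shape[0]
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_scores = damping * (M @ scores) + (1 - damping) / n
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores

# Tiny illustrative graph: 0 -> 1 -> 2 -> 0 (a 3-cycle, already column-stochastic)
M = sp.csc_matrix(np.array([[0., 0., 1.],
                            [1., 0., 0.],
                            [0., 1., 0.]]))
print(sparse_pagerank(M))  # By symmetry, all three pages end up equally important (~1/3 each)
```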
Spectral clustering discovers structure in graphs (and in data, via similarity graphs). The key insight: clusters correspond to near-zero eigenvalues of the graph Laplacian.
The Graph Laplacian:
For a graph with adjacency matrix W (weighted edges) and degree matrix D (diagonal, D_ii = sum of weights at node i), the unnormalized Laplacian is L = D − W.
Key properties (see the quadratic-form sketch below):

- L is symmetric and positive semidefinite.
- Its smallest eigenvalue is 0, with the constant vector as eigenvector.
- The multiplicity of eigenvalue 0 equals the number of connected components.
- When a graph has k loosely connected clusters, the k smallest eigenvalues are close to zero and their eigenvectors are approximately piecewise constant on the clusters.
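The smoothness story behind these properties comes from the Laplacian quadratic form, valid for any vector v (using the symmetry of W):

$$v^T L v = v^T D v - v^T W v = \frac{1}{2}\sum_{i,j} w_{ij}\,(v_i - v_j)^2 \;\ge\; 0.$$

A vector with a small value of vᵀLv barely changes across heavily weighted edges, which is exactly how a cluster-indicator vector behaves; that is why eigenvectors with near-zero eigenvalues encode cluster membership.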
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist


def build_similarity_graph(X, sigma=1.0):
    """
    Build a weighted adjacency matrix using Gaussian (RBF) similarity.

    W_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    """
    distances = cdist(X, X, 'sqeuclidean')
    W = np.exp(-distances / (2 * sigma**2))
    np.fill_diagonal(W, 0)  # No self-loops
    return W


def spectral_clustering(X, n_clusters, sigma=1.0):
    """
    Spectral clustering from scratch.

    Steps:
    1. Build similarity graph
    2. Compute unnormalized Laplacian L = D - W
    3. Find k smallest eigenvectors of L
    4. Use these as features for k-means
    """
    n = X.shape[0]

    # Step 1: Build similarity graph
    W = build_similarity_graph(X, sigma)

    # Step 2: Compute Laplacian
    D = np.diag(W.sum(axis=1))  # Degree matrix
    L = D - W                   # Unnormalized Laplacian

    # Step 3: Eigendecomposition (using eigh for symmetric matrix)
    eigenvalues, eigenvectors = np.linalg.eigh(L)

    # The k smallest eigenvectors (this includes the trivial 0-eigenvalue
    # constant eigenvector for a connected graph)
    spectral_embedding = eigenvectors[:, :n_clusters]

    # Normalize rows (for normalized spectral clustering)
    spectral_embedding = spectral_embedding / np.linalg.norm(spectral_embedding, axis=1, keepdims=True)

    # Step 4: K-means in the spectral embedding
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels = kmeans.fit_predict(spectral_embedding)

    return labels, eigenvalues, spectral_embedding


# Generate synthetic data with clusters
np.random.seed(42)

# Two moons - classic test for spectral clustering
from sklearn.datasets import make_moons, make_circles

X_moons, y_moons = make_moons(n_samples=300, noise=0.05)
X_circles, y_circles = make_circles(n_samples=300, noise=0.05, factor=0.5)

# Figure
fig, axes = plt.subplots(2, 4, figsize=(18, 9))

for row, (X, y_true, name) in enumerate([(X_moons, y_moons, 'Two Moons'),
                                         (X_circles, y_circles, 'Concentric Circles')]):
    # Column 1: Original data
    ax = axes[row, 0]
    ax.scatter(X[:, 0], X[:, 1], c=y_true, cmap='viridis', alpha=0.7)
    ax.set_title(f'{name}\nTrue Labels')
    ax.set_aspect('equal')

    # Column 2: K-means (fails on non-convex)
    kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
    labels_kmeans = kmeans.fit_predict(X)
    ax = axes[row, 1]
    ax.scatter(X[:, 0], X[:, 1], c=labels_kmeans, cmap='viridis', alpha=0.7)
    ax.set_title('K-means Clustering\n(fails on non-convex)')
    ax.set_aspect('equal')

    # Column 3: Spectral clustering
    labels_spectral, eigenvalues, embedding = spectral_clustering(X, n_clusters=2, sigma=0.3)
    if name == 'Two Moons':
        eigenvalues_moons = eigenvalues  # Saved for the printout below (the loop overwrites `eigenvalues`)
    ax = axes[row, 2]
    ax.scatter(X[:, 0], X[:, 1], c=labels_spectral, cmap='viridis', alpha=0.7)
    ax.set_title('Spectral Clustering\n(handles non-convex!)')
    ax.set_aspect('equal')

    # Column 4: Eigenvalue spectrum
    ax = axes[row, 3]
    ax.plot(eigenvalues[:20], 'o-', markersize=8)
    ax.axhline(y=0, color='red', linestyle='--', alpha=0.5)
    ax.set_xlabel('Index')
    ax.set_ylabel('Eigenvalue')
    ax.set_title('Laplacian Eigenvalues\n(Gap reveals # clusters)')
    ax.grid(True, alpha=0.3)

    # Highlight the spectral gap
    ax.axvline(x=1.5, color='green', linestyle='--', alpha=0.7, label='Spectral gap')
    ax.legend()

plt.tight_layout()
plt.savefig('spectral_clustering_demo.png', dpi=150)
plt.show()

print("Spectral Clustering Analysis")
print("=" * 60)
print("\nKey insight: The SPECTRAL GAP (jump between eigenvalues)")
print("reveals the number of clusters!")
print("\nFor 'Two Moons':")
print(f"  First 5 eigenvalues: {eigenvalues_moons[:5].round(6)}")
print("  Notice the gap after k=2: eigenvalue 3 is much larger than eigenvalue 2")
```

The eigenvectors of the Laplacian represent the 'smoothest' functions over the graph—functions that vary slowly across edges. Cluster membership IS a smooth function (constant within clusters, different between). So the bottom eigenvectors naturally encode cluster structure!
The Singular Value Decomposition (SVD) extends eigenanalysis to rectangular matrices. Any m×n matrix A can be decomposed as:
$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$
where:

- U is an m×m orthogonal matrix whose columns are the left singular vectors,
- Σ is an m×n diagonal matrix holding the non-negative singular values σ₁ ≥ σ₂ ≥ … ≥ 0,
- V is an n×n orthogonal matrix whose columns are the right singular vectors.
Connection to eigenvalues: the right singular vectors (columns of V) are eigenvectors of AᵀA, the left singular vectors (columns of U) are eigenvectors of AAᵀ, and the singular values are the square roots of their shared non-zero eigenvalues; a one-line check follows.
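Substituting A = UΣVᵀ and using the orthogonality of U and V:

$$\mathbf{A}^T\mathbf{A} = \mathbf{V}\mathbf{\Sigma}^T\mathbf{U}^T\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T = \mathbf{V}(\mathbf{\Sigma}^T\mathbf{\Sigma})\mathbf{V}^T, \qquad \mathbf{A}\mathbf{A}^T = \mathbf{U}(\mathbf{\Sigma}\mathbf{\Sigma}^T)\mathbf{U}^T,$$

so the eigenvalues of AᵀA (and of AAᵀ) are exactly the squared singular values, λᵢ = σᵢ².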
Applications: low-rank approximation for image and data compression, latent semantic analysis (LSA) for text similarity, collaborative-filtering recommenders built on low-rank matrix factorization (the Netflix-style systems mentioned earlier), and a numerically stable route to PCA itself. The code below demonstrates the first two.
```python
import numpy as np
import matplotlib.pyplot as plt


def demonstrate_svd_compression():
    """
    Demonstrate low-rank approximation via SVD for image compression.
    """
    # Create a simple grayscale "image" (or use any image)
    # Using a gradient + pattern for illustration
    x = np.linspace(0, 4*np.pi, 100)
    y = np.linspace(0, 4*np.pi, 80)
    X, Y = np.meshgrid(x, y)
    image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2*X)

    print("SVD for Low-Rank Approximation (Image Compression)")
    print("=" * 60)
    print(f"Original image shape: {image.shape}")
    print(f"Original storage: {image.size} values")

    # Compute SVD
    U, singular_values, Vt = np.linalg.svd(image, full_matrices=False)

    print(f"\nSingular values (first 10): {singular_values[:10].round(4)}")
    print(f"Total singular values: {len(singular_values)}")

    # Reconstruct with different ranks
    ranks = [1, 5, 10, 20, 50, 80]

    fig, axes = plt.subplots(2, 4, figsize=(16, 8))

    # Original
    ax = axes[0, 0]
    ax.imshow(image, cmap='viridis')
    ax.set_title(f'Original\n{image.size} values')
    ax.axis('off')

    for idx, k in enumerate(ranks):
        # Low-rank approximation: A_k = U[:,:k] @ diag(s[:k]) @ Vt[:k,:]
        image_k = U[:, :k] @ np.diag(singular_values[:k]) @ Vt[:k, :]

        # Compression ratio
        storage = U.shape[0] * k + k + Vt.shape[1] * k
        compression = 100 * (1 - storage / image.size)

        # Reconstruction error
        error = np.linalg.norm(image - image_k) / np.linalg.norm(image)

        ax = axes.flat[idx + 1]
        ax.imshow(image_k, cmap='viridis')
        ax.set_title(f'Rank {k}\n{compression:.1f}% compression\nError: {error:.4f}')
        ax.axis('off')

    # Scree plot
    ax = axes[1, 3]
    ax.semilogy(singular_values, 'b-o', markersize=3)
    ax.set_xlabel('Index')
    ax.set_ylabel('Singular Value (log scale)')
    ax.set_title('Singular Value Decay')
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('svd_compression.png', dpi=150)
    plt.show()

    return singular_values


def demonstrate_latent_semantic_analysis():
    """
    Demonstrate LSA for text similarity using SVD.
    """
    print("\n" + "=" * 60)
    print("Latent Semantic Analysis (LSA)")
    print("=" * 60)

    # Toy document-term matrix
    # Documents: rows, Terms: columns
    # Each entry = term frequency
    documents = [
        "cat dog pet",
        "dog cat animal pet",
        "car truck vehicle",
        "truck car motor vehicle",
        "pet animal love",
    ]

    # Build vocabulary and term-document matrix
    vocab = sorted(set(' '.join(documents).split()))
    n_docs = len(documents)
    n_terms = len(vocab)
    term_idx = {term: i for i, term in enumerate(vocab)}

    A = np.zeros((n_docs, n_terms))
    for i, doc in enumerate(documents):
        for word in doc.split():
            A[i, term_idx[word]] += 1

    print(f"Documents: {documents}")
    print(f"Vocabulary: {vocab}")
    print(f"\nDocument-Term Matrix A ({n_docs} docs x {n_terms} terms):")
    print(A)

    # SVD
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(f"\nSingular values: {s.round(4)}")

    # Project documents to 2D
    k = 2
    doc_embeddings = U[:, :k] @ np.diag(s[:k])

    print("\nDocument embeddings (2D):")
    for i, doc in enumerate(documents):
        print(f"  Doc {i} ({doc[:20]}...): {doc_embeddings[i].round(3)}")

    # Compute document similarities in LSA space
    from scipy.spatial.distance import cosine

    print("\nDocument similarities in LSA space:")
    for i in range(n_docs):
        for j in range(i+1, n_docs):
            sim = 1 - cosine(doc_embeddings[i], doc_embeddings[j])
            if sim > 0.7:  # Show high similarities
                print(f"  Doc {i} <-> Doc {j}: {sim:.3f}")

    return doc_embeddings


singular_values = demonstrate_svd_compression()
doc_embeddings = demonstrate_latent_semantic_analysis()
```

Eigenvalues are crucial for understanding the training dynamics of neural networks, particularly recurrent neural networks (RNNs).
Gradient flow and eigenvalues:
In a deep network, gradients flow backward through the weight matrices. If W is a weight matrix, each backward pass multiplies the gradient by Wᵀ: components along eigenvectors with |λ| > 1 are amplified, components along eigenvectors with |λ| < 1 are damped, and these effects compound exponentially with depth.
RNN dynamics:
An RNN updates hidden states as h_t = tanh(W_h h_{t-1} + W_x x_t). Over T timesteps, the backpropagated gradient involves a product of T Jacobians, each containing W_h (roughly W_h raised to the T-th power). If the eigenvalues of W_h have magnitude below 1, gradients shrink exponentially toward zero; if above 1, they grow exponentially.
This is why standard RNNs struggle with long sequences—eigenvalue analysis explains the vanishing gradient problem!
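A minimal sketch of that argument (treating the tanh derivative factors as ≈ 1; in practice they only shrink gradients further):

$$\frac{\partial h_T}{\partial h_0} \;=\; \prod_{t=1}^{T} \frac{\partial h_t}{\partial h_{t-1}} \;=\; \prod_{t=1}^{T} \operatorname{diag}\!\left(1 - h_t^2\right) W_h \;\approx\; W_h^{\,T},$$

whose norm grows or decays roughly like ρ(W_h)ᵀ, with ρ the spectral radius: ρ < 1 gives exponentially vanishing gradients, ρ > 1 exponentially exploding ones.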
| Problem | Eigenvalue Diagnosis | Solution |
|---|---|---|
| Vanishing gradients in RNN | \|λ\| < 1 for W_h | LSTM/GRU (gating), orthogonal init |
| Exploding gradients | \|λ\| > 1 for W_h | Gradient clipping, spectral norm |
| Slow convergence | Large condition number | Batch normalization, Adam |
| Mode collapse in GANs | Singular Jacobian | Spectral normalization |
| Training instability | Complex eigenvalues | Careful initialization |
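As one concrete remedy from the table, here is a minimal sketch of spectral-norm control, assuming only NumPy: estimate the largest singular value of W by power iteration and rescale. This is the idea behind spectral normalization; real framework implementations keep running estimates of the singular vectors across training steps rather than iterating to convergence each time.

```python
import numpy as np

def spectral_normalize(W, n_iters=50):
    """Scale W so its largest singular value is approximately 1 (power-iteration estimate)."""
    v = np.random.randn(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    sigma_max = u @ W @ v  # Rayleigh-quotient estimate of the top singular value
    return W / sigma_max

np.random.seed(0)
W = np.random.randn(64, 64)
W_sn = spectral_normalize(W)
print(np.linalg.norm(W, 2), np.linalg.norm(W_sn, 2))  # Second value should be close to 1.0
```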
```python
import numpy as np
import matplotlib.pyplot as plt


def analyze_rnn_weight_matrix(W, name="W"):
    """
    Analyze an RNN weight matrix for gradient flow properties.
    """
    eigenvalues = np.linalg.eigvals(W)

    print(f"\n{name}: {W.shape[0]}x{W.shape[1]} matrix")
    print(f"  Spectral radius (max |λ|): {np.max(np.abs(eigenvalues)):.4f}")
    print(f"  Min |λ|: {np.min(np.abs(eigenvalues)):.4f}")

    # Classify gradient behavior
    max_abs = np.max(np.abs(eigenvalues))
    if max_abs > 1.01:
        status = "⚠️ EXPLODING gradients risk"
    elif max_abs < 0.99:
        status = "⚠️ VANISHING gradients risk"
    else:
        status = "✓ Stable gradient flow"
    print(f"  Status: {status}")

    return eigenvalues


def compare_initializations(hidden_size=100):
    """
    Compare different RNN weight initializations via eigenvalue analysis.
    """
    print("RNN Weight Matrix Initialization Analysis")
    print("=" * 60)

    fig, axes = plt.subplots(2, 2, figsize=(12, 12))

    initializations = {
        'Random Gaussian (σ=1.0)': np.random.randn(hidden_size, hidden_size),
        'Xavier/Glorot': np.random.randn(hidden_size, hidden_size) / np.sqrt(hidden_size),
        'Orthogonal': np.linalg.qr(np.random.randn(hidden_size, hidden_size))[0],
        'Identity + small noise': np.eye(hidden_size) + 0.01 * np.random.randn(hidden_size, hidden_size),
    }

    for ax, (name, W) in zip(axes.flat, initializations.items()):
        eigenvalues = analyze_rnn_weight_matrix(W, name)

        # Plot eigenvalues in complex plane
        theta = np.linspace(0, 2*np.pi, 100)
        ax.plot(np.cos(theta), np.sin(theta), 'g--', alpha=0.5, label='Unit circle')
        ax.scatter(eigenvalues.real, eigenvalues.imag, alpha=0.6, s=20)

        max_abs = np.max(np.abs(eigenvalues))
        ax.set_xlim(-max(2, max_abs*1.1), max(2, max_abs*1.1))
        ax.set_ylim(-max(2, max_abs*1.1), max(2, max_abs*1.1))
        ax.set_aspect('equal')
        ax.set_xlabel('Real')
        ax.set_ylabel('Imaginary')
        ax.set_title(f'{name}\nSpectral radius: {max_abs:.3f}')
        ax.grid(True, alpha=0.3)
        ax.legend(loc='upper right')

    plt.suptitle('Eigenvalue Distribution for Different RNN Initializations', fontsize=14)
    plt.tight_layout()
    plt.savefig('rnn_eigenvalue_init.png', dpi=150)
    plt.show()


def simulate_gradient_flow(W, T=50):
    """
    Simulate gradient magnitude over time in an RNN.

    For a simple RNN, the gradient at time 0 for a loss at time T involves W^T.
    """
    spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))

    gradient_norms = []
    gradient = np.ones(W.shape[0])
    gradient = gradient / np.linalg.norm(gradient)

    for t in range(T):
        gradient = W.T @ gradient
        gradient_norms.append(np.linalg.norm(gradient))

    return gradient_norms, spectral_radius


# Run analysis
np.random.seed(42)
compare_initializations(hidden_size=100)

# Simulate gradient flow
print("\n" + "=" * 60)
print("Gradient Flow Simulation")
print("=" * 60)

fig, ax = plt.subplots(figsize=(10, 6))

hidden_size = 50
T = 30

for spectral_target, label in [(0.8, 'σ_r = 0.8 (vanishing)'),
                               (1.0, 'σ_r = 1.0 (stable)'),
                               (1.2, 'σ_r = 1.2 (exploding)')]:
    # Create orthogonal matrix scaled to target spectral radius
    Q = np.linalg.qr(np.random.randn(hidden_size, hidden_size))[0]
    W = spectral_target * Q

    gradient_norms, _ = simulate_gradient_flow(W, T)
    ax.semilogy(gradient_norms, label=label, linewidth=2)

ax.set_xlabel('Timesteps back')
ax.set_ylabel('Gradient norm (log scale)')
ax.set_title('Gradient Flow vs Spectral Radius\n(Why RNNs struggle with long sequences)')
ax.legend()
ax.grid(True, alpha=0.3)
ax.axhline(y=1, color='gray', linestyle='--', alpha=0.5)
plt.savefig('gradient_flow_simulation.png', dpi=150)
plt.show()
```

LSTMs solve the vanishing gradient problem by using gating mechanisms that effectively control eigenvalues during backpropagation. The cell state path has gradients that multiply by factors close to 1, maintaining |λ| ≈ 1 over long sequences. This is why LSTMs capture long-range dependencies!
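A minimal sketch of why the cell-state path behaves this way, using the standard LSTM cell update with forget gate f_t and input gate i_t, and ignoring the gates' own dependence on the hidden state:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad\Longrightarrow\quad \frac{\partial c_T}{\partial c_t} \approx \prod_{k=t+1}^{T} \operatorname{diag}(f_k),$$

so as long as the forget gates stay close to 1, the gradient along this path neither vanishes nor explodes, regardless of sequence length.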
We've seen eigenvalues and eigenvectors appear across diverse ML applications. Let's consolidate the patterns:
| Application | What's Diagonalized | Key Insight |
|---|---|---|
| PCA | Covariance matrix Σ | Eigenvectors = principal directions, eigenvalues = variance |
| PageRank | Transition matrix P | Dominant eigenvector (λ=1) = page importance |
| Spectral Clustering | Graph Laplacian L | Small eigenvalues = cluster indicators |
| SVD / LSA | Data matrix A (via AᵀA) | Low-rank structure for compression/similarity |
| RNN Analysis | Weight matrix W | Spectral radius determines gradient flow |
| Kernel PCA | Kernel matrix K | Nonlinear PCA in feature space |
| Markov Chains | Transition matrix P | Eigenvalue 1 = steady state; \|λ₂\| = mixing rate |
| Graph Embedding | Adjacency/Laplacian | Eigenvectors = node features |
| Optimization | Hessian H | Eigenvalues = curvature, condition number |
We've completed a comprehensive exploration of eigenvalues and eigenvectors—from formal definitions to production ML systems. Let's consolidate everything:
You now possess a deep understanding of eigenvalues and eigenvectors—the mathematical backbone of dimensionality reduction, graph analysis, and dynamical systems in machine learning. This knowledge will serve you across PCA, spectral methods, matrix factorization, and neural network analysis. The eigenvalue lens reveals structure hidden in matrices everywhere.
What's Next:
The next module in this chapter covers Matrix Decompositions—SVD, QR, Cholesky, and LU decompositions. These build on eigenanalysis to provide practical tools for solving linear systems, low-rank approximation, and numerical stability in ML algorithms.