The eigenvalues and eigenvectors of the graph Laplacian are not merely abstract mathematical objects—they encode rich geometric and combinatorial information about the graph's structure. The eigenvalue spectrum reveals how "clusterable" a graph is, while the eigenvectors provide coordinates in a space where cluster structure becomes apparent.
Understanding this eigenstructure is essential for three reasons: it explains why relaxing the cut-minimization problem yields an eigenvalue problem, it tells us which and how many eigenvectors to use for $k$-way clustering, and it shows how the spectrum itself (through the spectral gap) signals cluster quality.
This page develops the mathematical theory connecting Laplacian spectra to graph partitions, culminating in the spectral relaxation that makes spectral clustering computationally tractable.
By the end of this page, you will understand: (1) The Rayleigh quotient and its connection to eigenvalues, (2) Why minimizing graph cuts leads to eigenvalue problems, (3) The spectral relaxation that transforms NP-hard partitioning into polynomial-time eigendecomposition, (4) How multiple eigenvectors combine for k-way clustering, and (5) The spectral gap and its role in cluster quality.
The Rayleigh quotient is the central mathematical object connecting optimization over vectors to eigenvalue problems. For a symmetric matrix $A$ and nonzero vector $x$, the Rayleigh quotient is:
$$R_A(x) = \frac{x^T A x}{x^T x}$$
Variational Characterization of Eigenvalues:
For a symmetric matrix $A$ with eigenvalues $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and corresponding orthonormal eigenvectors $v_1, v_2, \ldots, v_n$:
Minimum Principle: $$\lambda_1 = \min_{x \neq 0} R_A(x) = \min_{x \neq 0} \frac{x^T A x}{x^T x}$$
The minimum is achieved when $x = v_1$.
Min-Max Principle (Courant-Fischer Theorem): $$\lambda_k = \min_{S: \dim(S)=k} \max_{x \in S, x \neq 0} R_A(x) = \max_{T: \dim(T)=n-k+1} \min_{x \in T, x \neq 0} R_A(x)$$
Alternatively, for eigenvalues after the first: $$\lambda_k = \min_{x \perp v_1, \ldots, v_{k-1}} R_A(x)$$
The $k$-th eigenvalue is the minimum Rayleigh quotient over vectors orthogonal to the first $k-1$ eigenvectors.
Application to the Graph Laplacian:
For the unnormalized Laplacian $L$, the Rayleigh quotient has a beautiful interpretation:
$$R_L(f) = \frac{f^T L f}{f^T f} = \frac{\sum_{i,j} W_{ij}(f_i - f_j)^2 / 2}{\sum_i f_i^2}$$
This is a ratio of the total edge-weighted variation of $f$ across the graph (the numerator) to the overall magnitude of $f$ (the denominator).
Minimizing $R_L(f)$ finds vectors that vary minimally across edges relative to their magnitude—vectors that are smooth over the graph.
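Here is a minimal check of this identity on an arbitrary small random graph (a sketch; the graph and test vector have no special meaning, they simply verify that the quadratic form equals the pairwise sum):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random weighted graph (symmetric, zero diagonal) - purely illustrative
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

L = np.diag(W.sum(axis=1)) - W   # unnormalized Laplacian
f = rng.standard_normal(n)       # arbitrary test vector

quadratic_form = f @ L @ f
pairwise_sum = 0.5 * sum(
    W[i, j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n)
)

print(np.isclose(quadratic_form, pairwise_sum))  # True
```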
Eigenvalue Interpretation:
The eigenvectors form an orthonormal basis ordered by smoothness: $v_1$ is maximally smooth (constant), $v_2$ is the smoothest nontrivial function, and so on.
```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def demonstrate_rayleigh_quotient():
    """
    Demonstrate the Rayleigh quotient and its minimization by eigenvectors.

    This function shows:
    1. Eigenvectors minimize the Rayleigh quotient
    2. The relationship between eigenvalues and function smoothness
    3. How the min principle provides eigenvalues variationally
    """
    np.random.seed(42)

    # Create a simple graph: path graph with 10 nodes
    n = 10
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0

    # Laplacian
    d = np.sum(W, axis=1)
    D = np.diag(d)
    L = D - W

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(L)

    print("Rayleigh Quotient Analysis")
    print("=" * 55)
    print("Graph: Path with 10 nodes (0-1-2-...-9)")
    print(f"\nEigenvalues: {eigenvalues.round(4)}")

    print("\n--- Rayleigh Quotient Verification ---")
    for k in range(min(4, n)):
        v = eigenvectors[:, k]
        # Compute Rayleigh quotient
        rq = (v @ L @ v) / (v @ v)
        print(f"\nEigenvector {k+1}:")
        print(f"  Values: {v.round(3)}")
        print(f"  Rayleigh quotient: {rq:.6f}")
        print(f"  Eigenvalue λ_{k+1}: {eigenvalues[k]:.6f}")
        print(f"  Match: {np.isclose(rq, eigenvalues[k])}")

    # Demonstrate minimization property
    print("\n--- Minimization Property ---")
    print("\nRandom vectors and their Rayleigh quotients:")
    for trial in range(5):
        random_v = np.random.randn(n)
        random_v = random_v / np.linalg.norm(random_v)
        rq = (random_v @ L @ random_v) / (random_v @ random_v)
        print(f"  Random vector {trial+1}: R(v) = {rq:.4f}")

    # Constrained minimization: orthogonal to constant
    print("\n--- Constrained Minimization (⊥ to constant vector) ---")
    for trial in range(5):
        random_v = np.random.randn(n)
        random_v = random_v - np.mean(random_v)  # Project out constant
        random_v = random_v / np.linalg.norm(random_v)
        rq = (random_v @ L @ random_v)
        print(f"  Random (centered) vector {trial+1}: R(v) = {rq:.4f}")

    print(f"\n  Fiedler eigenvector achieves minimum: R(v₂) = {eigenvalues[1]:.4f}")
    print("  (All random vectors have R(v) ≥ λ₂)")

    return L, eigenvalues, eigenvectors


def visualize_eigenvector_smoothness():
    """
    Visualize how eigenvectors become less smooth (more oscillatory)
    for higher eigenvalues.
    """
    # Path graph with 50 nodes
    n = 50
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0

    d = np.sum(W, axis=1)
    L = np.diag(d) - W

    eigenvalues, eigenvectors = eigh(L)

    print("\n" + "=" * 55)
    print("Eigenvector Smoothness Analysis (Path Graph n=50)")
    print("=" * 55)

    for k in [0, 1, 2, 5, 10, 20]:
        v = eigenvectors[:, k]
        # Measure smoothness: average absolute difference between neighbors
        smoothness = np.mean(np.abs(np.diff(v)))
        # Count sign changes (oscillations)
        sign_changes = np.sum(np.abs(np.diff(np.sign(v))) > 0)
        print(f"\nEigenvector {k+1} (λ = {eigenvalues[k]:.4f}):")
        print(f"  Mean neighbor difference: {smoothness:.4f}")
        print(f"  Sign changes: {sign_changes}")
        print(f"  Interpretation: {'Constant' if k == 0 else f'{k} oscillation(s)'}")

    print("\n→ Higher eigenvalue = less smooth = more oscillatory")
    print("→ For clustering: use LOW eigenvectors (smooth within clusters)")


if __name__ == "__main__":
    demonstrate_rayleigh_quotient()
    visualize_eigenvector_smoothness()
```

We now establish the precise mathematical connection between graph cut objectives and Laplacian eigenvalues. This derivation is the theoretical heart of spectral clustering.
RatioCut and the Laplacian (Two-Way Partition):
Consider partitioning $V$ into two sets $A$ and $\bar{A}$. Define the indicator vector:
$$f_i = \begin{cases} \sqrt{|\bar{A}|/|A|} & \text{if } i \in A \\ -\sqrt{|A|/|\bar{A}|} & \text{if } i \in \bar{A} \end{cases}$$
This specific scaling ensures that $f \perp \mathbf{1}$ and $f^T f = n$.
Key Theorem: With this definition: $$f^T L f = n \cdot \text{RatioCut}(A, \bar{A})$$
Proof sketch: $$f^T L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$$
For edges within $A$ or within $\bar{A}$, $f_i = f_j$, so contribution is zero.
For edges across the cut: $f_i - f_j = \sqrt{|\bar{A}|/|A|} + \sqrt{|A|/|\bar{A}|} = (|A| + |\bar{A}|)/\sqrt{|A||\bar{A}|} = n/\sqrt{|A||\bar{A}|}$
Squaring and summing over the cut edges (the factor $1/2$ cancels the double count over ordered pairs) gives $\text{cut}(A, \bar{A}) \cdot n^2/(|A||\bar{A}|) = n\left(\frac{\text{cut}(A, \bar{A})}{|A|} + \frac{\text{cut}(A, \bar{A})}{|\bar{A}|}\right) = n \cdot \text{RatioCut}(A, \bar{A})$.
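A quick numerical check of this identity, as a minimal sketch: the two-clique graph and its weights below are arbitrary, and the cut weight counts each cross edge once, consistent with the $k$-way definition later on this page.

```python
import numpy as np

# Two small cliques joined by a single weak edge - an illustrative graph
n1, n2 = 3, 4
n = n1 + n2
W = np.zeros((n, n))
W[:n1, :n1] = 1.0          # clique on A = {0, 1, 2}
W[n1:, n1:] = 1.0          # clique on A-bar = {3, 4, 5, 6}
np.fill_diagonal(W, 0)
W[0, n1] = W[n1, 0] = 0.5  # one cross edge

L = np.diag(W.sum(axis=1)) - W

# Indicator vector with the RatioCut scaling
f = np.where(np.arange(n) < n1, np.sqrt(n2 / n1), -np.sqrt(n1 / n2))

cut = W[:n1, n1:].sum()                      # total weight crossing the partition
ratio_cut = cut / n1 + cut / n2

print(np.isclose(f @ L @ f, n * ratio_cut))  # True
print(np.isclose(f.sum(), 0), np.isclose(f @ f, n))  # constraints hold
```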
The Optimization Problem:
Minimizing RatioCut is equivalent to: $$\min_{A \subset V} f^T L f \quad \text{subject to } f \perp \mathbf{1}, f^T f = n, f \text{ takes only two values}$$
The constraint that $f$ takes only two values (corresponding to cluster membership) makes this a discrete optimization problem—NP-hard in general.
The Spectral Relaxation:
The brilliant insight: relax the discrete constraint! Allow $f$ to be any real-valued vector: $$\min_{f \in \mathbb{R}^n} f^T L f \quad \text{subject to } f \perp \mathbf{1}, f^T f = n$$
This is now a continuous optimization problem that we can solve!
By the Rayleigh quotient theory, the minimum of $f^T L f$ over vectors satisfying $f \perp \mathbf{1}$ and $f^T f = n$ is $n\lambda_2$, attained by the second eigenvector $v_2$ (scaled to norm $\sqrt{n}$).
The Fiedler vector is the relaxed solution to the RatioCut problem. To recover a partition, we round the continuous values to discrete cluster assignments (typically by the sign of components).
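To make the rounding step concrete, here is a minimal sketch on an illustrative graph (two small cliques joined by a weak bridge); the sizes and weights are arbitrary choices.

```python
import numpy as np
from scipy.linalg import eigh

# Two loosely connected cliques - illustrative only
n = 8
W = np.zeros((n, n))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0)
W[0, 4] = W[4, 0] = 0.1  # weak bridge between the clusters

L = np.diag(W.sum(axis=1)) - W
eigenvalues, eigenvectors = eigh(L)

fiedler = eigenvectors[:, 1]             # relaxed RatioCut solution
labels = (fiedler > 0).astype(int)       # round by sign

print("Fiedler vector:", fiedler.round(3))
print("Recovered partition:", labels)    # nodes 0-3 vs nodes 4-7
```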
Spectral clustering's elegance lies in this relaxation: (1) Formulate clustering as discrete optimization (NP-hard), (2) Relax to continuous optimization (eigenvalue problem, polynomial time), (3) Solve relaxed problem (eigendecomposition), (4) Round continuous solution to discrete clusters. This provides no theoretical guarantee of optimality, but works remarkably well in practice.
NCut and the Normalized Laplacian:
A parallel derivation works for the Normalized Cut. Define:
$$g_i = \begin{cases} \sqrt{\text{vol}(\bar{A})/\text{vol}(A)} & \text{if } i \in A \\ -\sqrt{\text{vol}(A)/\text{vol}(\bar{A})} & \text{if } i \in \bar{A} \end{cases}$$
where $\text{vol}(A) = \sum_{i \in A} d_i$.
Then: $$g^T L g = \text{vol}(V) \cdot \text{NCut}(A, \bar{A})$$
with constraints $g \perp D\mathbf{1}$ and $g^T D g = \text{vol}(V)$.
Substituting $h = D^{1/2}g$ transforms this to: $$\min h^T L_{sym} h \quad \text{subject to } h \perp D^{1/2}\mathbf{1}, h^T h = \text{vol}(V)$$
The solution is the second eigenvector of $L_{sym}$. This justifies using the normalized Laplacian for clustering: it directly minimizes the normalized cut objective.
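The same style of numerical check works for the NCut identity; again a sketch, using an arbitrary small graph and counting the cut weight once per cross edge.

```python
import numpy as np

# Two cliques joined by one weak edge - illustrative weights
n1, n2 = 3, 4
n = n1 + n2
W = np.zeros((n, n))
W[:n1, :n1] = 1.0
W[n1:, n1:] = 1.0
np.fill_diagonal(W, 0)
W[0, n1] = W[n1, 0] = 0.5

d = W.sum(axis=1)
L = np.diag(d) - W

vol_A, vol_Abar = d[:n1].sum(), d[n1:].sum()
cut = W[:n1, n1:].sum()

# NCut indicator vector with volume-based scaling
g = np.where(np.arange(n) < n1, np.sqrt(vol_Abar / vol_A), -np.sqrt(vol_A / vol_Abar))

ncut = cut / vol_A + cut / vol_Abar
print(np.isclose(g @ L @ g, d.sum() * ncut))  # g^T L g = vol(V) * NCut
print(np.isclose(g @ d, 0))                   # g ⊥ D·1
print(np.isclose(g @ (d * g), d.sum()))       # g^T D g = vol(V)
```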
Real clustering requires more than two clusters. The spectral relaxation generalizes beautifully to $k$-way partitioning.
K-Way RatioCut:
For $k$ clusters $C_1, \ldots, C_k$:
$$\text{RatioCut}_k = \sum_{j=1}^{k} \frac{\text{cut}(C_j, \bar{C_j})}{|C_j|}$$
We encode the partition with $k$ indicator vectors $h^{(1)}, \ldots, h^{(k)}$ where:
$$h_i^{(j)} = \begin{cases} 1/\sqrt{|C_j|} & \text{if } i \in C_j \\ 0 & \text{otherwise} \end{cases}$$
These vectors are orthonormal: $(h^{(j)})^T h^{(l)} = \delta_{jl}$.
Collecting them into a matrix $H \in \mathbb{R}^{n \times k}$: $$\text{RatioCut}_k = \text{Tr}(H^T L H)$$
The Trace Minimization Form:
Minimizing $k$-way RatioCut becomes: $$\min_{H \in \mathbb{R}^{n \times k}} \text{Tr}(H^T L H) \quad \text{subject to } H^T H = I_k, H \text{ is a cluster indicator matrix}$$
Spectral Relaxation for K Clusters:
Dropping the "indicator matrix" constraint and allowing $H$ to be any matrix with orthonormal columns:
$$\min_{H \in \mathbb{R}^{n \times k}} \text{Tr}(H^T L H) \quad \text{subject to } H^T H = I_k$$
Ky Fan Theorem: The solution is the matrix whose columns are the $k$ eigenvectors corresponding to the $k$ smallest eigenvalues of $L$.
This is the theoretical foundation for the spectral clustering algorithm: build the graph Laplacian, take the eigenvectors of its $k$ smallest eigenvalues as the columns of an $n \times k$ embedding matrix, and cluster the rows of that matrix with K-Means.
The eigenvectors provide coordinates where cluster structure is revealed. K-Means in this space recovers partitions that approximately minimize RatioCut (or NCut for normalized Laplacian).
```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from scipy.spatial.distance import cdist


def spectral_clustering_from_first_principles(X, n_clusters, sigma=None):
    """
    Implement spectral clustering following the mathematical derivation.

    This implementation makes each step explicit to demonstrate
    the connection between theory and practice.

    Parameters:
    -----------
    X : ndarray, shape (n_samples, n_features)
        Data points
    n_clusters : int
        Number of clusters k
    sigma : float, optional
        Gaussian kernel bandwidth. If None, estimated from data.

    Returns:
    --------
    labels : ndarray
        Cluster assignments
    eigenvalues : ndarray
        Smallest k+2 eigenvalues (for diagnostics)
    embedding : ndarray
        The spectral embedding (row i = coordinates of point i)
    """
    n = X.shape[0]

    # Step 1: Construct similarity graph
    # Using Gaussian kernel (fully connected with soft weights)
    distances = cdist(X, X)
    if sigma is None:
        sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)  # No self-loops

    # Step 2: Compute normalized Laplacian L_sym
    # L_sym = I - D^{-1/2} W D^{-1/2}
    d = np.sum(W, axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

    # Step 3: Eigendecomposition
    # Get k smallest eigenvalues/eigenvectors
    # (Actually computing all, then taking first k)
    eigenvalues, eigenvectors = eigh(L_sym)

    # Step 4: Form embedding matrix U with first k eigenvectors
    U = eigenvectors[:, :n_clusters]

    # Step 5: Row-normalize (important for Ng-Jordan-Weiss algorithm)
    # Each row becomes a unit vector on the k-dimensional sphere
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1  # Avoid division by zero
    U_normalized = U / row_norms

    # Step 6: K-Means clustering in embedded space
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels = kmeans.fit_predict(U_normalized)

    return labels, eigenvalues[:n_clusters + 2], U_normalized


def analyze_trace_minimization():
    """
    Verify that eigenvectors minimize the trace objective.
    """
    np.random.seed(42)

    # Create data with 3 clusters
    X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=42)
    n = X.shape[0]

    # Build graph
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    d = np.sum(W, axis=1)
    L = np.diag(d) - W

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(L)

    print("Trace Minimization Verification")
    print("=" * 55)

    k = 3  # Number of clusters

    # Trace using eigenvectors (optimal)
    U_opt = eigenvectors[:, :k]
    trace_opt = np.trace(U_opt.T @ L @ U_opt)

    print(f"\nNumber of clusters k = {k}")
    print(f"Eigenvalues λ₁, λ₂, λ₃: {eigenvalues[:3].round(4)}")
    print(f"Optimal trace (sum of first k eigenvalues): {trace_opt:.6f}")
    print(f"Sum of eigenvalues: {sum(eigenvalues[:k]):.6f}")

    # Compare with random orthonormal matrices
    print("\nComparison with random orthonormal matrices:")
    for trial in range(5):
        # Generate random n x k matrix with orthonormal columns
        random_mat = np.random.randn(n, k)
        Q, _ = np.linalg.qr(random_mat)
        U_random = Q[:, :k]
        trace_random = np.trace(U_random.T @ L @ U_random)
        print(f"  Trial {trial+1}: Tr(H^T L H) = {trace_random:.4f}")

    print(f"\n→ Eigenvector solution ({trace_opt:.4f}) achieves minimum")

    return L, eigenvalues, eigenvectors


def demonstrate_kmeans_on_embedding():
    """
    Show how K-Means in spectral embedding space works.
    """
    np.random.seed(42)

    # Create challenging data: 3 elongated clusters
    n_per = 100
    t = np.linspace(0, 2 * np.pi, n_per)

    # Three curved clusters
    c1 = np.column_stack([np.cos(t), np.sin(t)]) + np.random.randn(n_per, 2) * 0.1
    c2 = np.column_stack([np.cos(t) + 3, np.sin(t)]) + np.random.randn(n_per, 2) * 0.1
    c3 = np.column_stack([np.cos(t) + 1.5, np.sin(t) + 2.5]) + np.random.randn(n_per, 2) * 0.1

    X = np.vstack([c1, c2, c3])
    y_true = np.array([0] * n_per + [1] * n_per + [2] * n_per)

    # Standard K-Means (fails on non-convex)
    kmeans_original = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels_kmeans = kmeans_original.fit_predict(X)

    # Spectral clustering
    labels_spectral, eigenvalues, embedding = spectral_clustering_from_first_principles(
        X, n_clusters=3
    )

    # Evaluate
    from sklearn.metrics import adjusted_rand_score

    print("\n" + "=" * 55)
    print("K-Means vs Spectral Clustering on Curved Clusters")
    print("=" * 55)

    ari_kmeans = adjusted_rand_score(y_true, labels_kmeans)
    ari_spectral = adjusted_rand_score(y_true, labels_spectral)

    print(f"\nK-Means ARI: {ari_kmeans:.4f}")
    print(f"Spectral Clustering ARI: {ari_spectral:.4f}")

    print(f"\nEigenvalues: {eigenvalues[:5].round(4)}")
    print(f"Spectral gap (λ₄ - λ₃): {eigenvalues[3] - eigenvalues[2]:.4f}")

    print("\n→ Spectral clustering succeeds by transforming to embedding space")
    print("  where K-Means can find the correct clusters")


if __name__ == "__main__":
    analyze_trace_minimization()
    demonstrate_kmeans_on_embedding()
```

The spectral gap—the difference between consecutive eigenvalues, particularly $\lambda_{k+1} - \lambda_k$—provides crucial information about cluster quality and helps determine the number of clusters.
Intuition Behind the Spectral Gap:
Recall that for a graph with exactly $k$ connected components, the first $k$ eigenvalues are exactly zero. The gap between $\lambda_k$ (the last zero eigenvalue) and $\lambda_{k+1}$ (the first positive one) is then as large as it can be in relative terms: the ratio $\lambda_{k+1}/\lambda_k$ is unbounded.
For a graph with $k$ nearly disconnected components (weak inter-cluster edges), the first $k$ eigenvalues are close to zero while $\lambda_{k+1}$ remains bounded away from zero.
Large spectral gap indicates clear cluster structure.
Formal Statement (Perturbation Theory):
If a graph can be partitioned into $k$ clusters with small inter-cluster connectivity, then the first $k$ eigenvalues are small perturbations of zero and the corresponding eigenvectors stay close to (a rotation of) the cluster indicator vectors; this is the content of matrix perturbation results such as the Davis-Kahan theorem.
Using the Spectral Gap to Choose k:
The eigengap heuristic: Choose $k$ to maximize the gap $\lambda_{k+1} - \lambda_k$.
Caveats: the largest gap can be ambiguous when clusters overlap or are organized hierarchically, and its location shifts with the graph construction (a different $\sigma$ can move the gap), so treat the heuristic as a guide rather than a rule.
Quality Metric: The Gap Ratio
A useful metric is the gap ratio: $$\text{Gap Ratio}_k = \frac{\lambda_{k+1}}{\lambda_k}$$
(with $\lambda_k > 0$)
Large gap ratio indicates $\lambda_{k+1}$ is much larger than $\lambda_k$, suggesting $k$ is a natural number of clusters.
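A small sketch of how one might compute gap ratios from a sorted eigenvalue array; the function name `gap_ratios` and the decision to start at $k = 2$ (since $\lambda_1 = 0$ for any connected graph) are choices made here, not part of a standard API.

```python
import numpy as np

def gap_ratios(eigenvalues, max_k=10):
    """Return {k: λ_{k+1} / λ_k} for k = 2..max_k, given eigenvalues sorted ascending.

    k = 1 is skipped: λ₁ = 0 for any connected graph, so the ratio is undefined there.
    """
    ratios = {}
    for k in range(2, min(max_k, len(eigenvalues) - 1) + 1):
        lam_k, lam_next = eigenvalues[k - 1], eigenvalues[k]  # λ_k and λ_{k+1}, 1-based
        if lam_k > 0:
            ratios[k] = lam_next / lam_k
    return ratios

# Hypothetical usage with the eigenvalues returned by analyze_spectral_gap (defined below):
#   ratios = gap_ratios(eigenvalues)
#   best_k = max(ratios, key=ratios.get)
```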
```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs


def analyze_spectral_gap(X, max_clusters=10, sigma=None):
    """
    Analyze the spectral gap to help determine the number of clusters.

    Parameters:
    -----------
    X : ndarray, shape (n_samples, n_features)
        Data points
    max_clusters : int
        Maximum number of clusters to consider
    sigma : float, optional
        Gaussian kernel bandwidth

    Returns:
    --------
    suggested_k : int
        Suggested number of clusters based on eigengap
    eigenvalues : ndarray
        Computed eigenvalues
    gaps : ndarray
        Eigenvalue gaps
    """
    n = X.shape[0]

    # Build similarity graph
    distances = cdist(X, X)
    if sigma is None:
        sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Normalized Laplacian
    d = np.sum(W, axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigendecomposition
    eigenvalues, _ = eigh(L_sym)

    # Compute gaps
    gaps = np.diff(eigenvalues[:max_clusters + 1])

    # Find largest gap (excluding first, which is always 0 -> something)
    # The gap at index i is λ_{i+1} - λ_i
    # If gap at index k-1 is largest, suggests k clusters
    suggested_k = np.argmax(gaps[1:max_clusters]) + 2  # +2 because we skip index 0

    return suggested_k, eigenvalues[:max_clusters + 1], gaps[:max_clusters]


def demonstrate_spectral_gap():
    """
    Demonstrate spectral gap analysis on data with known cluster count.
    """
    np.random.seed(42)

    print("Spectral Gap Analysis")
    print("=" * 55)

    # Test with different numbers of true clusters
    for true_k in [2, 3, 5]:
        print(f"\n--- True number of clusters: {true_k} ---")
        X, _ = make_blobs(
            n_samples=true_k * 50,
            centers=true_k,
            cluster_std=0.5,
            random_state=42
        )

        suggested, eigenvalues, gaps = analyze_spectral_gap(X)

        print(f"First 8 eigenvalues: {eigenvalues[:8].round(5)}")
        print(f"Gaps: {gaps[:7].round(5)}")
        print(f"Largest gap after λ₁: gap at k={np.argmax(gaps[1:])+2}")
        print(f"Suggested k: {suggested}")
        print(f"Match: {'✓' if suggested == true_k else '✗'}")

    # Test with poorly separated clusters
    print("\n--- Poorly Separated Clusters ---")
    X_hard, _ = make_blobs(
        n_samples=150,
        centers=3,
        cluster_std=2.0,  # High variance = overlapping clusters
        random_state=42
    )

    suggested, eigenvalues, gaps = analyze_spectral_gap(X_hard)

    print(f"First 8 eigenvalues: {eigenvalues[:8].round(5)}")
    print(f"Gaps: {gaps[:7].round(5)}")
    print("Note: No clear gap - clusters are hard to separate")
    print("The eigengap heuristic is less reliable for overlapping clusters")


def eigenvalue_scree_plot_data():
    """
    Generate data for creating a scree plot (eigenvalue vs index).
    """
    np.random.seed(42)

    # Create data with 4 well-separated clusters
    X, y = make_blobs(n_samples=200, centers=4, cluster_std=0.5, random_state=42)

    # Build graph
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    d = np.sum(W, axis=1)
    d_inv_sqrt = np.zeros(len(d))
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    eigenvalues, _ = eigh(L_sym)

    print("\n" + "=" * 55)
    print("Scree Plot Data (4 true clusters)")
    print("=" * 55)
    print("\nIndex | Eigenvalue | Gap to Next")
    print("-" * 40)

    for i in range(10):
        gap = eigenvalues[i + 1] - eigenvalues[i] if i < 9 else 0
        marker = " <-- large gap" if i + 1 == 4 and gap > 0.1 else ""
        print(f"{i+1:5} | {eigenvalues[i]:10.6f} | {gap:10.6f}{marker}")

    print("\n→ Large gap after λ₄ confirms 4 clusters")

    return eigenvalues[:15]


if __name__ == "__main__":
    demonstrate_spectral_gap()
    eigenvalue_scree_plot_data()
```

While the spectral gap is useful, it has limitations: (1) Overlapping clusters produce no clear gap, (2) Hierarchical cluster structure produces multiple gaps, (3) The heuristic depends on graph construction (different σ = different gaps), (4) Very unbalanced clusters may not show clear gaps. Always combine the eigengap with domain knowledge and other methods (silhouette analysis, stability).
Beyond using eigenvectors for clustering, understanding what their actual values mean provides deeper insight into graph structure.
Eigenvector 1: The Trivial Solution
For a connected graph, $v_1 \propto \mathbf{1}$ (constant vector). This encodes the trivial partition: all nodes in one cluster. Its eigenvalue $\lambda_1 = 0$ corresponds to zero cut cost, but it is useless for clustering.
For the normalized Laplacian $L_{sym}$, the first eigenvector is $v_1 \propto D^{1/2}\mathbf{1}$, which weights each node by the square root of its degree.
Eigenvector 2: The Fiedler Vector
The Fiedler vector $v_2$ provides the coarsest non-trivial partition: the sign of each entry suggests which side of the bipartition a node belongs to, the magnitude indicates how strongly it belongs there, and the associated eigenvalue $\lambda_2$ (the algebraic connectivity) measures how hard the graph is to cut.
Fiedler Vector Example: For a barbell graph (two cliques connected by a path), the entries for one clique are roughly constant and positive, the entries for the other clique roughly constant and negative, and the path nodes take intermediate values that pass through zero near the middle of the path.
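A minimal sketch of this behavior; the clique and path sizes are arbitrary.

```python
import numpy as np
from scipy.linalg import eigh

# Barbell graph: two 5-cliques joined by a 3-node path (illustrative sizes)
clique, path = 5, 3
n = 2 * clique + path
W = np.zeros((n, n))
W[:clique, :clique] = 1.0                      # left clique: nodes 0-4
W[-clique:, -clique:] = 1.0                    # right clique: nodes 8-12
np.fill_diagonal(W, 0)
chain = [clique - 1] + list(range(clique, clique + path)) + [clique + path]
for a, b in zip(chain[:-1], chain[1:]):        # path edges: 4-5, 5-6, 6-7, 7-8
    W[a, b] = W[b, a] = 1.0

L = np.diag(W.sum(axis=1)) - W
_, eigenvectors = eigh(L)
fiedler = eigenvectors[:, 1]

print(fiedler.round(3))
# Left-clique entries share one sign, right-clique entries the other,
# and the path nodes take intermediate values passing through zero.
```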
Higher Eigenvectors: Refined Partitions
Eigenvectors $v_3, v_4, \ldots$ capture progressively finer structure: each is orthogonal to the previous eigenvectors and splits the graph along the next-smoothest direction of variation.
Geometric Interpretation:
Consider the $k$-dimensional embedding where point $i$ has coordinates $(v_2(i), v_3(i), \ldots, v_{k+1}(i))$. In this space, nodes from the same cluster are mapped to nearby points, so cluster structure that is non-convex in the original data becomes approximately separable.
For the normalized Laplacian with row normalization (Ng-Jordan-Weiss), the embedded points lie on the unit sphere, and under favorable conditions points from the same cluster concentrate around one of $k$ nearly orthogonal directions.
Random Walk Interpretation:
For the normalized Laplacian, eigenvector values have a random walk interpretation: $L_{rw} = I - D^{-1}W$ shares its spectrum with $L_{sym}$, and small eigenvalues correspond to slowly mixing directions of the random walk, that is, sets of nodes a walker leaves only rarely, which is precisely what we mean by a cluster.
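One concrete piece of this connection can be checked directly: since $L_{rw} = I - D^{-1}W$, the transition matrix $P = D^{-1}W$ has eigenvalues $1 - \lambda_i$, and similarity to $L_{sym}$ means the same $\lambda_i$ appear there. A minimal sketch on an arbitrary small graph:

```python
import numpy as np
from scipy.linalg import eig, eigh

rng = np.random.default_rng(1)

# Random connected weighted graph (illustrative)
n = 7
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
D_inv = np.diag(1.0 / d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

P = D_inv @ W                                  # random walk transition matrix
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

lam_sym = eigh(L_sym, eigvals_only=True)       # ascending
lam_P = np.real(eig(P)[0])                     # eigenvalues of P (real here)

# The spectra line up: λ_i(L_rw) = λ_i(L_sym) = 1 - μ_i(P)
print(np.allclose(np.sort(1.0 - lam_P), lam_sym))  # True
```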
| Eigenvector | Eigenvalue | Interpretation | Use in Clustering |
|---|---|---|---|
| v₁ | λ₁ = 0 | Constant (connected component indicator) | Skip for clustering |
| v₂ (Fiedler) | λ₂ = algebraic connectivity | Coarsest partition; boundary identification | Binary clustering by sign |
| v₃ | λ₃ | Second partition direction (⊥ to Fiedler) | Additional dimension for 3+ clusters |
| v₂, ..., vₖ₊₁ | λ₂, ..., λₖ₊₁ | k-dimensional embedding coordinates | K-Means in embedding space |
Spectral clustering has both strong theoretical foundations and known limitations. Understanding both is essential for principled application.
The Cheeger Inequality:
The Cheeger inequality provides a fundamental connection between the spectral gap and the optimal cut:
$$\frac{h_G^2}{2} \leq \lambda_2 \leq 2h_G$$
Here $\lambda_2$ is the second-smallest eigenvalue of the normalized Laplacian and $h_G$ is the Cheeger constant (isoperimetric number):
$$h_G = \min_{S \subset V, \text{vol}(S) \leq \text{vol}(V)/2} \frac{\text{cut}(S, \bar{S})}{\text{vol}(S)}$$
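Because the graph below is tiny, the Cheeger constant can be computed by brute force over all subsets and compared against $\lambda_2$ of the normalized Laplacian; this is a sketch on an arbitrary example graph, not a practical method for real data.

```python
import numpy as np
from itertools import combinations
from scipy.linalg import eigh

# Small weighted graph: two triangles joined by one weak edge (illustrative)
n = 6
W = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.5

d = W.sum(axis=1)
vol_V = d.sum()
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
lam2 = eigh(L_sym, eigvals_only=True)[1]

# Brute-force Cheeger constant: min over subsets with vol(S) <= vol(V)/2
h_G = np.inf
nodes = range(n)
for size in range(1, n):
    for S in combinations(nodes, size):
        S = list(S)
        S_bar = [i for i in nodes if i not in S]
        if d[S].sum() <= vol_V / 2:
            cut = W[np.ix_(S, S_bar)].sum()
            h_G = min(h_G, cut / d[S].sum())

print(f"h_G = {h_G:.4f}, λ₂ = {lam2:.4f}")
print(h_G**2 / 2 <= lam2 <= 2 * h_G)  # Cheeger inequality holds -> True
```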
Implications: a small $\lambda_2$ implies a sparse cut exists ($h_G \leq \sqrt{2\lambda_2}$), while a large $\lambda_2$ implies no sparse cut exists ($h_G \geq \lambda_2/2$) and the graph behaves like an expander that should not be split.
Higher-Order Cheeger Inequalities:
For $k$ clusters, analogous inequalities relate $\lambda_k$ to the optimal $k$-way partition quality. This justifies using the first $k$ eigenvectors for $k$-way clustering.
Known Limitations:
Relaxation Gap: The continuous solution may be far from the discrete optimum. The spectral relaxation provides no guarantee on how much the rounding step loses.
Sensitivity to Graph Construction: Results depend heavily on the similarity graph. Two reasonable constructions may give very different clusterings.
Number of Clusters: Spectral clustering requires specifying $k$. The eigengap heuristic helps but isn't reliable for all data.
Computational Cost: Eigendecomposition is $O(n^3)$ for dense graphs, limiting scalability; a sparse-solver sketch follows this list.
Cluster Shape Assumptions: While spectral methods handle non-convex shapes better than K-Means, they still assume clusters are connected components. Very diffuse clusters may not be found.
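On the cost point, sparse similarity graphs combined with iterative eigensolvers avoid the dense $O(n^3)$ decomposition. A sketch, assuming a precomputed sparse affinity matrix `W_sparse` (a hypothetical variable) is already available:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def smallest_eigenpairs(W_sparse, k):
    """k smallest eigenpairs of L_sym built from a sparse affinity matrix W_sparse."""
    d = np.asarray(W_sparse.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))  # guard isolated nodes
    L_sym = sp.eye(W_sparse.shape[0]) - d_inv_sqrt @ W_sparse @ d_inv_sqrt
    # Shift-invert around a small negative sigma targets the smallest eigenvalues
    # while keeping L_sym - sigma*I positive definite (λ₁ = 0 makes sigma = 0 singular).
    return eigsh(L_sym, k=k, sigma=-1e-5, which='LM')
```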
When Spectral Clustering Works Best: when the similarity graph is constructed carefully, the true clusters form well-connected subgraphs joined by relatively few or weak edges, a clear eigengap signals the number of clusters, and the dataset is small enough for the eigendecomposition to be affordable.
Spectral clustering exemplifies a general strategy in optimization: relax hard constraints, solve the relaxed problem, round the solution back. This works well when: (1) The relaxed problem is much easier (here: eigenvalue vs. NP-hard), (2) The rounding doesn't lose too much (here: Cheeger bounds), (3) The relaxed solution has useful structure (here: eigenvector smoothness). But it provides no worst-case guarantees—when it fails, it can fail badly.
We have developed a deep understanding of how Laplacian eigenstructure connects to graph partitioning and enables spectral clustering: the Rayleigh quotient links cut objectives to eigenvalues, the spectral relaxation turns NP-hard partitioning into eigendecomposition, the first $k$ eigenvectors supply the embedding for $k$-way clustering, and the spectral gap indicates how clean the cluster structure is.
What's Next:
With the theoretical foundation established, we next study the Normalized Cut (NCut) objective in detail. We'll see why normalizing by cluster volume often works better than RatioCut, explore the Shi-Malik algorithm, and understand the random walk perspective on spectral clustering.
You now understand the eigenstructure of graph Laplacians and how it enables spectral clustering. The key insight is that minimizing graph cuts, when relaxed to a continuous problem, becomes finding Laplacian eigenvectors. This elegant mathematical framework transforms an NP-hard combinatorial problem into tractable linear algebra.