If similarity graphs are the conceptual foundation of spectral clustering, then the graph Laplacian matrix is its computational engine. This matrix—seemingly just a simple difference between degree and adjacency matrices—possesses remarkable spectral properties that encode the graph's cluster structure in its eigenvalues and eigenvectors.
The Laplacian is not merely a convenient mathematical tool. It arises naturally in diverse contexts: heat diffusion on graphs, electrical resistance networks, random walks, and the continuous limit of differential operators on manifolds. These connections explain why the Laplacian's spectral properties reveal intrinsic geometric structure.
Mastering the Laplacian matrix is essential for understanding not just spectral clustering, but an entire family of graph-based machine learning methods including graph neural networks, Laplacian eigenmaps, and diffusion maps.
By the end of this page, you will understand: (1) The definition and construction of the unnormalized graph Laplacian, (2) Fundamental properties including positive semi-definiteness and the multiplicity of zero eigenvalues, (3) The normalized Laplacian variants and when to use each, (4) How the Laplacian connects to the continuous Laplace operator, and (5) Why the Laplacian's eigenvectors encode cluster structure.
Given an undirected weighted graph $G = (V, E, W)$ with $n$ vertices, we define three fundamental matrices:
Adjacency/Weight Matrix $W$: $$W_{ij} = \begin{cases} w_{ij} & \text{if edge } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$
For an undirected graph, $W = W^T$ (symmetric). Weights $w_{ij} \geq 0$ encode similarity.
Degree Matrix $D$: $$D = \text{diag}(d_1, d_2, \ldots, d_n)$$
where $d_i = \sum_{j=1}^{n} W_{ij}$ is the degree of vertex $i$—the total weight of edges incident to $i$.
Unnormalized Graph Laplacian $L$: $$L = D - W$$
This deceptively simple definition encodes profound structure. Let's examine what this matrix looks like and why it matters.
Matrix Structure:
The Laplacian has a specific pattern:
$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -W_{ij} & \text{if } i \neq j \text{ and } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$
Example: Simple Graph with 4 Nodes:
Consider an unweighted path graph: 1—2—3—4
$$W = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$L = D - W = \begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}$$
Notice: rows sum to zero, matrix is symmetric, diagonal entries are degrees.
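As a quick sanity check, a few lines of NumPy (a minimal illustrative sketch, separate from the fuller implementation below) reproduce this example and confirm the stated properties:

```python
import numpy as np

# Path graph 1-2-3-4 from the example above
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))   # degrees on the diagonal
L = D - W

print(L.sum(axis=1))                   # rows sum to zero: [0. 0. 0. 0.]
print(np.allclose(L, L.T))             # symmetric: True
print(np.linalg.eigvalsh(L).round(3))  # eigenvalues approx. [0, 0.586, 2, 3.414]
```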
```python
import numpy as np
from scipy.sparse import diags, csr_matrix
from scipy.sparse.linalg import eigsh


class GraphLaplacian:
    """
    Comprehensive implementation of graph Laplacian matrices and their variants.

    The graph Laplacian is the cornerstone of spectral graph theory and spectral
    clustering. This class implements multiple variants and demonstrates their
    properties through eigendecomposition.
    """

    def __init__(self, W):
        """
        Initialize with a symmetric weight matrix W.

        Parameters
        ----------
        W : ndarray, shape (n, n)
            Symmetric weight matrix. W[i,j] >= 0 represents the edge weight
            between nodes i and j. Self-loops (diagonal) should typically be 0.
        """
        self.W = np.asarray(W, dtype=np.float64)
        self.n = W.shape[0]

        # Verify symmetry
        if not np.allclose(self.W, self.W.T):
            raise ValueError("Weight matrix must be symmetric")

        # Compute degrees
        self.d = np.sum(self.W, axis=1)  # Row sums = degrees
        self.D = np.diag(self.d)         # Degree matrix

    def unnormalized(self):
        """
        Compute the unnormalized graph Laplacian: L = D - W

        Properties
        ----------
        - Symmetric, positive semi-definite
        - Smallest eigenvalue is 0 with eigenvector = [1, 1, ..., 1]^T
        - Number of zero eigenvalues = number of connected components
        - For vector f: f^T L f = (1/2) * sum_{i,j} W_{ij}(f_i - f_j)^2

        Returns
        -------
        L : ndarray, shape (n, n)
            Unnormalized Laplacian matrix
        """
        L = self.D - self.W
        return L

    def verify_quadratic_form(self, f=None):
        """
        Verify the Laplacian quadratic form identity:
        f^T L f = (1/2) * sum_{i,j} W_{ij} * (f_i - f_j)^2

        This identity shows L measures "smoothness" of f over the graph.
        A function f is smooth if it varies slowly along edges.
        """
        if f is None:
            f = np.random.randn(self.n)  # Random test vector

        L = self.unnormalized()

        # Left side: quadratic form
        left_side = f @ L @ f

        # Right side: sum of squared differences
        right_side = 0.0
        for i in range(self.n):
            for j in range(self.n):
                right_side += self.W[i, j] * (f[i] - f[j])**2
        right_side *= 0.5

        return {
            'quadratic_form': left_side,
            'sum_of_differences': right_side,
            'are_equal': np.isclose(left_side, right_side)
        }

    def normalized_symmetric(self):
        """
        Compute the symmetric normalized Laplacian: L_sym = D^{-1/2} L D^{-1/2}
        Also written as: L_sym = I - D^{-1/2} W D^{-1/2}

        Properties
        ----------
        - Symmetric, positive semi-definite
        - Eigenvalues in [0, 2]
        - Related to random walk normalized Laplacian
        - Better for graphs with varying degrees

        Returns
        -------
        L_sym : ndarray, shape (n, n)
            Symmetric normalized Laplacian
        """
        # Handle zero degrees carefully
        d_inv_sqrt = np.zeros(self.n)
        nonzero = self.d > 0
        d_inv_sqrt[nonzero] = 1.0 / np.sqrt(self.d[nonzero])
        D_inv_sqrt = np.diag(d_inv_sqrt)

        # L_sym = I - D^{-1/2} W D^{-1/2}
        L_sym = np.eye(self.n) - D_inv_sqrt @ self.W @ D_inv_sqrt
        return L_sym

    def normalized_random_walk(self):
        """
        Compute the random walk normalized Laplacian: L_rw = D^{-1} L = I - D^{-1} W

        Properties
        ----------
        - NOT symmetric (unless graph is regular)
        - Related to random walk transition matrix P = D^{-1} W
        - L_rw = I - P
        - Same eigenvalues as L_sym (similar matrices)
        - Eigenvectors related by: v_{L_sym} = D^{1/2} v_{L_rw}

        Returns
        -------
        L_rw : ndarray, shape (n, n)
            Random walk normalized Laplacian
        """
        # Handle zero degrees
        d_inv = np.zeros(self.n)
        nonzero = self.d > 0
        d_inv[nonzero] = 1.0 / self.d[nonzero]
        D_inv = np.diag(d_inv)

        L_rw = np.eye(self.n) - D_inv @ self.W
        return L_rw

    def transition_matrix(self):
        """
        Compute the random walk transition matrix: P = D^{-1} W
        P[i,j] = probability of walking from i to j = W[i,j] / d_i

        Returns
        -------
        P : ndarray, shape (n, n)
            Row-stochastic transition matrix
        """
        d_inv = np.zeros(self.n)
        nonzero = self.d > 0
        d_inv[nonzero] = 1.0 / self.d[nonzero]
        P = np.diag(d_inv) @ self.W
        return P

    def eigendecomposition(self, which='unnormalized', k=None):
        """
        Compute eigendecomposition of a Laplacian variant.

        Parameters
        ----------
        which : str
            'unnormalized', 'symmetric', or 'random_walk'
        k : int or None
            Number of smallest eigenvalues/vectors to compute.
            If None, compute all.

        Returns
        -------
        eigenvalues : ndarray
            Sorted eigenvalues (ascending)
        eigenvectors : ndarray
            Corresponding eigenvectors as columns
        """
        if which == 'unnormalized':
            L = self.unnormalized()
        elif which == 'symmetric':
            L = self.normalized_symmetric()
        elif which == 'random_walk':
            L = self.normalized_random_walk()
        else:
            raise ValueError(f"Unknown Laplacian type: {which}")

        if k is None:
            from scipy.linalg import eigh
            eigenvalues, eigenvectors = eigh(L)
        else:
            # For sparse matrices or when only few eigenvalues needed
            eigenvalues, eigenvectors = eigsh(
                csr_matrix(L), k=k, which='SM'  # Smallest Magnitude
            )

        # Sort by eigenvalue
        idx = np.argsort(eigenvalues)
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]
        return eigenvalues, eigenvectors


def demonstrate_laplacian_properties():
    """
    Demonstrate fundamental properties of the graph Laplacian.
    """
    # Create a simple graph: two connected components (two triangles)
    # Triangle 1: nodes 0-1-2
    # Triangle 2: nodes 3-4-5
    W = np.zeros((6, 6))
    # Triangle 1
    W[0, 1] = W[1, 0] = 1
    W[0, 2] = W[2, 0] = 1
    W[1, 2] = W[2, 1] = 1
    # Triangle 2
    W[3, 4] = W[4, 3] = 1
    W[3, 5] = W[5, 3] = 1
    W[4, 5] = W[5, 4] = 1

    gl = GraphLaplacian(W)

    print("Graph Laplacian Properties Demonstration")
    print("=" * 55)
    print("\nGraph: Two disconnected triangles (6 nodes)")
    print("\nWeight matrix W:")
    print(W)

    L = gl.unnormalized()
    print("\nUnnormalized Laplacian L = D - W:")
    print(L)

    # Verify row sums are zero
    print(f"\nRow sums: {L.sum(axis=1)}")

    # Eigendecomposition
    eigenvalues, eigenvectors = gl.eigendecomposition('unnormalized')
    print(f"\nEigenvalues: {eigenvalues}")
    print("\nNote: Two zero eigenvalues -> Two connected components!")

    # Eigenvectors corresponding to zero eigenvalue
    print("\nEigenvectors for λ=0 (span the nullspace):")
    print(eigenvectors[:, :2].round(3))
    print("Note: Each eigenvector is constant within a connected component")

    # Verify quadratic form
    result = gl.verify_quadratic_form()
    print(f"\nQuadratic form identity verified: {result['are_equal']}")

    # Add a weak connection between components
    print("\n" + "=" * 55)
    print("Adding weak edge between triangles (W[2,3] = 0.1)...")
    W[2, 3] = W[3, 2] = 0.1
    gl2 = GraphLaplacian(W)
    eigenvalues2, eigenvectors2 = gl2.eigendecomposition('unnormalized')
    print(f"\nNew eigenvalues: {eigenvalues2.round(4)}")
    print("Note: Now only ONE zero eigenvalue (graph is connected)")
    print(f"Second eigenvalue (algebraic connectivity): {eigenvalues2[1]:.6f}")
    print("This small value indicates weak connectivity between components")

    return gl, gl2


if __name__ == "__main__":
    demonstrate_laplacian_properties()
```

The graph Laplacian possesses several remarkable properties that make it central to spectral graph theory. Understanding these properties is essential for understanding why spectral clustering works.
Property 1: Symmetry and Positive Semi-Definiteness
The unnormalized Laplacian $L = D - W$ is:
- Symmetric: $L^T = (D - W)^T = D - W = L$, since both $D$ and $W$ are symmetric.
- Positive semi-definite: $f^T L f \geq 0$ for all $f \in \mathbb{R}^n$.

Positive semi-definiteness follows from the quadratic form (Property 2).
Property 2: The Quadratic Form Identity
For any vector $f \in \mathbb{R}^n$:
$$f^T L f = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}(f_i - f_j)^2$$
This identity is profound. The left side is a quadratic form—a standard object in linear algebra. The right side measures how much $f$ varies across graph edges, weighted by edge strength.
Interpretation: $f^T L f$ measures the smoothness of the function $f$ over the graph. If $f$ assigns similar values to connected nodes, the sum is small. If $f$ differs greatly across edges, the sum is large.
We derive this crucial identity. Start with $f^T L f = f^T (D - W) f = f^T D f - f^T W f$.
The first term: $f^T D f = \sum_i d_i f_i^2 = \sum_i f_i^2 \sum_j W_{ij} = \sum_{i,j} W_{ij} f_i^2$.
The second term: $f^T W f = \sum_{i,j} W_{ij} f_i f_j$.
Thus: $f^T L f = \sum_{i,j} W_{ij} f_i^2 - \sum_{i,j} W_{ij} f_i f_j = \sum_{i,j} W_{ij} (f_i^2 - f_i f_j)$.
By symmetry of $W$, this equals $(1/2) \sum_{i,j} W_{ij} [(f_i^2 - f_i f_j) + (f_j^2 - f_j f_i)] = (1/2) \sum_{i,j} W_{ij} (f_i - f_j)^2$. QED.
Property 3: Zero Eigenvalue and the Constant Eigenvector
The vector $\mathbf{1} = (1, 1, \ldots, 1)^T$ is always an eigenvector of $L$ with eigenvalue 0:
$$L \mathbf{1} = (D - W) \mathbf{1} = D\mathbf{1} - W\mathbf{1}$$
$D\mathbf{1} = (d_1, d_2, \ldots, d_n)^T$ and $W\mathbf{1} = (\sum_j W_{1j}, \ldots)^T = (d_1, \ldots, d_n)^T$
Thus $L\mathbf{1} = \mathbf{0}$.
Intuition: A constant function on the graph has zero "variation" across edges, so the quadratic form is zero. This corresponds to eigenvalue 0.
Property 4: Connected Components and Zero Eigenvalue Multiplicity
Theorem: The multiplicity of eigenvalue 0 equals the number of connected components in the graph.
This is fundamental: eigenvalue 0 detects disconnection. If we see two zero eigenvalues, we know the graph has (at least) two components, each of which is already a natural cluster.
| Zero Eigenvalues | Graph Structure | Clustering Implication |
|---|---|---|
| 1 | Graph is connected | Look at higher eigenvectors for structure |
| k (k > 1) | Graph has k connected components | Components are automatic clusters |
| Near-zero eigenvalues | Weak connections between subgraphs | Natural cluster structure exists |
Property 5: The Algebraic Connectivity (Fiedler Value)
The second smallest eigenvalue $\lambda_2$ is called the algebraic connectivity or Fiedler value. For a connected graph:
$$\lambda_2 > 0$$
$\lambda_2$ measures how well-connected the graph is:
- $\lambda_2 = 0$ exactly when the graph is disconnected.
- Small $\lambda_2$: the graph has a bottleneck and can be split by removing only a small amount of edge weight.
- Large $\lambda_2$: the graph is robustly connected, with no sparse cut.
The corresponding eigenvector $v_2$ (the Fiedler vector) is the real-valued relaxation of the optimal RatioCut bipartition: thresholding its entries yields the spectral split of the graph.
The Cheeger Inequality relates algebraic connectivity to the minimum cut:
$$\frac{h^2}{2} \leq \lambda_2 \leq 2h$$
where $h$ is the Cheeger constant (the minimum conductance, or normalized cut value); in this form the inequality refers to $\lambda_2$ of the normalized Laplacian. It guarantees that the cut obtained from the Fiedler vector has conductance provably close to optimal (within a quadratic factor), which is the theoretical backbone of spectral partitioning.
The unnormalized Laplacian has a bias: high-degree nodes contribute more to the quadratic form than low-degree nodes. For graphs with varying node degrees, normalization can lead to better clustering.
Symmetric Normalized Laplacian $L_{sym}$:
$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$$
where $D^{-1/2} = \text{diag}(d_1^{-1/2}, \ldots, d_n^{-1/2})$.
Properties of $L_{sym}$:
- Symmetric and positive semi-definite.
- Eigenvalues lie in $[0, 2]$.
- Eigenvalue 0 has eigenvector $D^{1/2}\mathbf{1}$ (not the constant vector).
- The multiplicity of eigenvalue 0 equals the number of connected components.
Random Walk Laplacian $L_{rw}$:
$$L_{rw} = D^{-1} L = I - D^{-1} W = I - P$$
where $P = D^{-1}W$ is the random walk transition matrix (row-stochastic).
Properties of $L_{rw}$:
- Not symmetric in general (only when the graph is regular).
- Its eigenvalues are real and identical to those of $L_{sym}$ (the two matrices are similar).
- Eigenvalue 0 has the constant eigenvector $\mathbf{1}$.
- Since $L_{rw} = I - P$, its spectrum directly describes the mixing behavior of the random walk on the graph.
Relationship Between Normalized Laplacians:
If $(\lambda, v)$ is an eigenpair of $L_{rw}$, then $(\lambda, D^{1/2}v)$ is an eigenpair of $L_{sym}$.
Conversely, if $(\lambda, u)$ is an eigenpair of $L_{sym}$, then $(\lambda, D^{-1/2}u)$ is an eigenpair of $L_{rw}$.
This relationship means we can work with the symmetric $L_{sym}$ (better numerical properties) and transform eigenvectors when needed.
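As a sketch of this relationship, the snippet below builds a small random weighted graph (the graph itself is an arbitrary illustration) and checks numerically that $L_{rw}$ and $L_{sym}$ share eigenvalues and that eigenvectors map via $D^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary symmetric weight matrix with positive degrees (illustrative graph)
W = rng.random((5, 5))
W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
D_inv = np.diag(1.0 / d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
D_sqrt = np.diag(np.sqrt(d))

L_rw = np.eye(5) - D_inv @ W
L_sym = np.eye(5) - D_inv_sqrt @ W @ D_inv_sqrt

# Same eigenvalues: the matrices are similar, L_sym = D^{1/2} L_rw D^{-1/2}
print(np.allclose(np.sort(np.linalg.eigvals(L_rw).real),
                  np.linalg.eigvalsh(L_sym)))

# If (lam, v) is an eigenpair of L_rw, then (lam, D^{1/2} v) is an eigenpair of L_sym
lam, V = np.linalg.eig(L_rw)
u = (D_sqrt @ V[:, 0]).real
print(np.allclose(L_sym @ u, lam[0].real * u))
```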
When to Use Which Laplacian:
| Laplacian | Use Case | Advantages | Disadvantages |
|---|---|---|---|
| Unnormalized L | Uniform degree graphs; RatioCut objective | Simple; direct interpretation | Biased toward high-degree nodes |
| Symmetric L_sym | General graphs; NCut objective | Symmetric; normalized by degree | Requires degree normalization of results |
| Random Walk L_rw | Random walk interpretation; NCut | Direct cluster indicator interpretation | Not symmetric (complicates computation) |
```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans


def compare_laplacians_for_clustering(X, n_clusters=2, n_neighbors=10):
    """
    Compare spectral clustering using different Laplacian variants.

    This demonstration shows how the choice of Laplacian affects the resulting
    clustering, particularly for graphs with varying node degrees.

    Parameters
    ----------
    X : ndarray, shape (n_samples, n_features)
        Data points to cluster
    n_clusters : int
        Number of clusters
    n_neighbors : int
        Number of neighbors for KNN graph construction

    Returns
    -------
    results : dict
        Clustering results for each Laplacian type
    """
    n = X.shape[0]

    # Construct KNN graph
    connectivity = kneighbors_graph(X, n_neighbors, mode='connectivity')
    W = 0.5 * (connectivity + connectivity.T).toarray()

    # Compute degree and degree matrix
    d = np.sum(W, axis=1)
    D = np.diag(d)

    results = {}

    # 1. Unnormalized Laplacian
    L = D - W
    eigenvalues_L, eigenvectors_L = eigh(L)
    embedding_L = eigenvectors_L[:, :n_clusters]

    # Cluster in embedding space
    kmeans_L = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels_L = kmeans_L.fit_predict(embedding_L)
    results['unnormalized'] = {
        'labels': labels_L,
        'eigenvalues': eigenvalues_L[:n_clusters+2],
        'embedding': embedding_L
    }

    # 2. Symmetric Normalized Laplacian
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
    eigenvalues_sym, eigenvectors_sym = eigh(L_sym)
    embedding_sym = eigenvectors_sym[:, :n_clusters]

    # Row-normalize embedding (important for normalized spectral clustering!)
    row_norms = np.linalg.norm(embedding_sym, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1
    embedding_sym_normalized = embedding_sym / row_norms

    kmeans_sym = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels_sym = kmeans_sym.fit_predict(embedding_sym_normalized)
    results['symmetric'] = {
        'labels': labels_sym,
        'eigenvalues': eigenvalues_sym[:n_clusters+2],
        'embedding': embedding_sym_normalized
    }

    # 3. Random Walk Laplacian (use symmetric for computation, transform)
    # For clustering, we use the same eigenvectors (they have same eigenvalues)
    # The difference is in interpretation and possible row normalization
    L_rw = np.eye(n) - np.diag(1.0/np.maximum(d, 1e-10)) @ W
    eigenvalues_rw, eigenvectors_rw = eigh(
        L_sym  # Use symmetric for computation (same eigenvalues)
    )
    # Transform eigenvectors: v_rw = D^{-1/2} v_sym
    eigenvectors_rw = D_inv_sqrt @ eigenvectors_sym
    embedding_rw = eigenvectors_rw[:, :n_clusters]

    kmeans_rw = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels_rw = kmeans_rw.fit_predict(embedding_rw)
    results['random_walk'] = {
        'labels': labels_rw,
        'eigenvalues': eigenvalues_rw[:n_clusters+2],
        'embedding': embedding_rw
    }

    return results, W, d


def analyze_degree_effect():
    """
    Analyze how node degree variation affects different Laplacians.
    """
    # Create two moons with varying noise (creates varying local density)
    X, y_true = make_moons(n_samples=200, noise=0.08, random_state=42)

    results, W, degrees = compare_laplacians_for_clustering(X)

    print("Spectral Clustering: Laplacian Comparison")
    print("=" * 55)
    print(f"\nDegree statistics: min={degrees.min():.1f}, max={degrees.max():.1f}, "
          f"mean={degrees.mean():.1f}")

    for name, res in results.items():
        acc1 = np.mean(res['labels'] == y_true)
        acc2 = np.mean(res['labels'] != y_true)
        accuracy = max(acc1, acc2)
        print(f"\n{name.upper()} Laplacian:")
        print(f"  Eigenvalues: {res['eigenvalues'][:4].round(4)}")
        print(f"  Accuracy: {accuracy:.1%}")

    print("\n" + "=" * 55)
    print("Note: For two moons with uniform density, results are similar.")
    print("Differences emerge with varying density or graph degree distribution.")

    return results, X, y_true


if __name__ == "__main__":
    analyze_degree_effect()
```

In practice, the symmetric normalized Laplacian (L_sym) combined with row-normalization of eigenvectors (the Ng-Jordan-Weiss algorithm) is most commonly used. It handles varying node degrees well and connects directly to the normalized cut objective. The unnormalized Laplacian is simpler but can be biased toward isolating high-degree regions.
The graph Laplacian is not merely a convenient construction—it is the discrete analog of the Laplace operator (or Laplacian) from calculus, which is fundamental in physics and mathematics.
The Continuous Laplacian:
For a function $f: \mathbb{R}^d \to \mathbb{R}$, the Laplace operator is:
$$\Delta f = \nabla^2 f = \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}$$
This operator appears in:
- The heat equation $\partial_t u = \Delta u$ (diffusion).
- The wave equation $\partial_t^2 u = \Delta u$.
- Laplace's and Poisson's equations in electrostatics and potential theory.
- The Schrödinger equation in quantum mechanics.
Discretization on a Grid:
Consider a 1D grid with spacing $h$. The second derivative is approximated by:
$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$$
In matrix form for $n$ grid points with periodic boundary:
$$L_{grid} = \frac{1}{h^2} \begin{pmatrix} 2 & -1 & 0 & \cdots & -1 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ -1 & 0 & \cdots & -1 & 2 \end{pmatrix}$$
This is exactly the graph Laplacian of a cycle graph (up to the $1/h^2$ scaling)! Note the sign convention: the graph Laplacian discretizes $-\Delta$ rather than $\Delta$, which is why it is positive semi-definite.
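To make the correspondence concrete, the following sketch builds the Laplacian of a cycle graph and compares its spectrum with the known eigenvalues $\lambda_k = 2 - 2\cos(2\pi k/n)$ of the discrete periodic second-difference operator (the graph size here is an arbitrary choice):

```python
import numpy as np

n = 16
# Cycle graph: each node i is connected to its neighbors (i-1) and (i+1), mod n
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1
L = np.diag(W.sum(axis=1)) - W

numerical = np.sort(np.linalg.eigvalsh(L))
analytic = np.sort(2 - 2 * np.cos(2 * np.pi * np.arange(n) / n))
# Same spectrum as the periodic finite-difference operator -d^2/dx^2 with h = 1
print(np.allclose(numerical, analytic))  # True
```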
Why This Connection Matters:
The continuous Laplacian has well-studied spectral properties:
On a circle (1D torus): Eigenfunctions are sines and cosines of increasing frequency. Lower eigenfunctions vary slowly (smooth); higher eigenfunctions vary rapidly (oscillatory).
On a rectangle with Dirichlet boundary: Eigenfunctions are products of sines, with eigenvalues depending on spatial frequencies.
On general manifolds: Eigenfunctions encode geometric structure. The first nontrivial eigenfunction often separates the manifold into two "halves."
The graph Laplacian inherits these properties in discrete form:
- Eigenvectors with small eigenvalues vary slowly across edges (smooth functions on the graph).
- Eigenvectors with large eigenvalues oscillate rapidly between neighboring nodes.
- The first nontrivial eigenvector (the Fiedler vector) plays the role of the first nontrivial eigenfunction, splitting the graph into two "halves."
Manifold Learning Connection:
When data lies on a manifold sampled with noise, the graph Laplacian (constructed with appropriate bandwidth) converges to the Laplace-Beltrami operator on the manifold as sample size grows. This justifies using spectral methods for manifold learning and explains why they discover intrinsic geometry.
Think of heat diffusion on a graph: each node has a temperature, and heat flows along each edge in proportion to the temperature difference, weighted by the edge weight. The heat equation on graphs is $df/dt = -Lf$, where $f(t)$ is the temperature vector at time $t$. The eigenvectors of $L$ are the "modes" of this diffusion: the first eigenvector (constant) is the equilibrium, and the component along eigenvector $v_k$ decays as $e^{-\lambda_k t}$, so high-eigenvalue modes vanish quickly while low-eigenvalue modes persist. This diffusion view underlies diffusion maps and connects to random walks.
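A minimal sketch of this diffusion view, using a small two-triangle graph as an illustrative example (the graph and time points are assumptions, not from the original): the solution of $df/dt = -Lf$ is expanded in the eigenbasis, where each mode decays as $e^{-\lambda_k t}$.

```python
import numpy as np
from scipy.linalg import eigh

# Two triangles joined by one weak edge (weights chosen for illustration)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

lam, V = eigh(L)
f0 = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # heat initially on the first triangle

for t in [0.0, 1.0, 10.0, 100.0]:
    # Solution of df/dt = -L f:  f(t) = V exp(-lam t) V^T f(0)
    f_t = V @ (np.exp(-lam * t) * (V.T @ f0))
    print(f"t={t:6.1f}  f(t)={f_t.round(3)}")
# Fast modes (large eigenvalues) die out quickly; the slow mode across the weak
# edge (small lambda_2) keeps the two triangles at different temperatures for a
# long time before everything settles to the constant equilibrium.
```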
We now connect the Laplacian's spectral properties directly to the clustering problem. This section explains why eigenvectors of $L$ provide cluster information.
The Rayleigh Quotient Perspective:
For the unnormalized Laplacian, the Rayleigh quotient of a vector $f$ is:
$$R(f) = \frac{f^T L f}{f^T f} = \frac{\sum_{i,j} W_{ij}(f_i - f_j)^2 / 2}{\sum_i f_i^2}$$
Minimizing $R(f)$ finds vectors that vary minimally across graph edges. By the Rayleigh-Ritz theorem:
- The minimum of $R(f)$ over all nonzero $f$ is $\lambda_1 = 0$, attained by the constant vector $\mathbf{1}$.
- The minimum over $f \perp \mathbf{1}$ is $\lambda_2$, attained by the Fiedler vector.
- More generally, minimizing over $f$ orthogonal to the first $k-1$ eigenvectors yields $\lambda_k$, attained by the $k$-th eigenvector.
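A short numerical check of these statements (reusing the illustrative weakly joined triangles; the specific graph is an assumption): the Rayleigh quotient of the Fiedler vector equals $\lambda_2$, and any other vector orthogonal to $\mathbf{1}$ scores at least as high.

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh(L, f):
    # Rayleigh quotient R(f) = f^T L f / f^T f
    return (f @ L @ f) / (f @ f)

# Weakly connected two-triangle graph (illustrative example)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
L = np.diag(W.sum(axis=1)) - W

lam, V = eigh(L)
fiedler = V[:, 1]

rng = np.random.default_rng(0)
f_rand = rng.standard_normal(6)
f_rand -= f_rand.mean()          # project out the constant vector, so f_rand is orthogonal to 1

print(f"lambda_2              = {lam[1]:.4f}")
print(f"R(Fiedler vector)     = {rayleigh(L, fiedler):.4f}")  # equals lambda_2
print(f"R(random f, f perp 1) = {rayleigh(L, f_rand):.4f}")   # always >= lambda_2
```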
Why Eigenvectors Reveal Clusters:
If the graph has $k$ well-separated clusters (clusters connected internally but weakly between each other), then:
- The first $k$ eigenvalues of $L$ are close to zero.
- There is a noticeable gap between $\lambda_k$ and $\lambda_{k+1}$ (the eigengap).
- The first $k$ eigenvectors are approximately piecewise constant on the clusters, i.e., close to the span of the cluster indicator vectors.
The Ideal Case: Disconnected Clusters
Consider a graph with exactly $k$ disconnected components. The Laplacian is block-diagonal:
$$L = \begin{pmatrix} L_1 & 0 & \cdots & 0 \\ 0 & L_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & L_k \end{pmatrix}$$
Each block $L_i$ is the Laplacian of component $i$, each having its own zero eigenvalue with constant eigenvector on that component.
The first $k$ eigenvectors of $L$ are indicator vectors: $$v_j^{(i)} = \begin{cases} 1 & \text{if node } j \text{ is in component } i \\ 0 & \text{otherwise} \end{cases}$$
(scaled appropriately for orthonormality)
Clustering in the space of these $k$ eigenvectors is trivial: all nodes of a cluster collapse to a single point along a distinct coordinate axis, so the $k$ clusters are perfectly separated in the embedding.
The Near-Ideal Case: Weakly Connected Clusters
In practice, clusters are connected by weak edges. Perturbation theory tells us:
- Eigenvalues move by at most the size of the perturbation (Weyl's inequality), so the first $k$ eigenvalues stay near zero when between-cluster weights are small.
- By Davis-Kahan-type bounds, the subspace spanned by the first $k$ eigenvectors rotates only slightly away from the ideal indicator subspace, provided the eigengap $\lambda_{k+1} - \lambda_k$ is large relative to the perturbation.
- The embedded points therefore form $k$ slightly blurred groups instead of $k$ exact points.
This is why spectral clustering uses K-Means on eigenvectors: even with perturbation, cluster structure is preserved in the embedding.
```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt


def analyze_eigenvector_clustering():
    """
    Demonstrate how Laplacian eigenvectors encode cluster structure.

    We construct a graph with clear cluster structure and analyze how the
    eigenvectors reflect this structure.
    """
    np.random.seed(42)

    # Create 3 well-separated Gaussian clusters
    n_per_cluster = 50
    cluster1 = np.random.randn(n_per_cluster, 2) * 0.5 + [0, 0]
    cluster2 = np.random.randn(n_per_cluster, 2) * 0.5 + [5, 0]
    cluster3 = np.random.randn(n_per_cluster, 2) * 0.5 + [2.5, 4]
    X = np.vstack([cluster1, cluster2, cluster3])
    true_labels = np.array([0]*n_per_cluster + [1]*n_per_cluster + [2]*n_per_cluster)
    n = X.shape[0]

    # Construct Gaussian similarity graph
    distances = cdist(X, X)
    sigma = 1.0  # Controls connectivity
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Compute unnormalized Laplacian
    d = np.sum(W, axis=1)
    D = np.diag(d)
    L = D - W

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(L)

    print("Eigenvector Cluster Analysis")
    print("=" * 55)
    print(f"\nNumber of points: {n} ({n_per_cluster} per cluster)")
    print(f"\nFirst 6 eigenvalues: {eigenvalues[:6].round(4)}")
    print("\nNote: ~3 near-zero eigenvalues suggest 3 clusters")
    print(f"Spectral gap (λ₄ - λ₃): {eigenvalues[3] - eigenvalues[2]:.4f}")

    # Analyze eigenvector values by cluster
    print("\nEigenvector 2 (Fiedler) statistics by cluster:")
    v2 = eigenvectors[:, 1]
    for c in range(3):
        mask = true_labels == c
        print(f"  Cluster {c}: mean={v2[mask].mean():.3f}, std={v2[mask].std():.3f}")

    print("\nEigenvector 3 statistics by cluster:")
    v3 = eigenvectors[:, 2]
    for c in range(3):
        mask = true_labels == c
        print(f"  Cluster {c}: mean={v3[mask].mean():.3f}, std={v3[mask].std():.3f}")

    # Spectral embedding
    k = 3
    embedding = eigenvectors[:, 1:k+1]  # Skip first (constant) eigenvector

    # Row normalize (for normalized spectral clustering)
    row_norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1
    embedding_normalized = embedding / row_norms

    print("\nSpectral Embedding Analysis:")
    print("After embedding in eigenvector space, clusters are linearly separable:")

    # Simple analysis: within-cluster vs between-cluster distances
    within_dists = []
    between_dists = []
    for i in range(n):
        for j in range(i+1, n):
            dist = np.linalg.norm(embedding_normalized[i] - embedding_normalized[j])
            if true_labels[i] == true_labels[j]:
                within_dists.append(dist)
            else:
                between_dists.append(dist)

    print(f"  Within-cluster mean distance: {np.mean(within_dists):.3f}")
    print(f"  Between-cluster mean distance: {np.mean(between_dists):.3f}")
    print(f"  Separation ratio: {np.mean(between_dists)/np.mean(within_dists):.2f}x")

    return X, eigenvalues, eigenvectors, true_labels


def visualize_spectral_gap():
    """
    Visualize how the spectral gap indicates cluster quality.

    We vary the separation between clusters and observe the eigenvalue spectrum.
    """
    print("\n" + "=" * 55)
    print("Spectral Gap Analysis: Varying Cluster Separation")
    print("=" * 55)

    separations = [1, 2, 4, 8]  # Distance between cluster centers

    for sep in separations:
        np.random.seed(42)
        n = 60
        # Two clusters with varying separation
        c1 = np.random.randn(n//2, 2) * 0.5
        c2 = np.random.randn(n//2, 2) * 0.5 + [sep, 0]
        X = np.vstack([c1, c2])

        # Build graph
        distances = cdist(X, X)
        sigma = 1.0
        W = np.exp(-distances**2 / (2 * sigma**2))
        np.fill_diagonal(W, 0)

        d = np.sum(W, axis=1)
        L = np.diag(d) - W
        eigenvalues = np.linalg.eigvalsh(L)
        gap = eigenvalues[2] - eigenvalues[1]

        print(f"\nSeparation = {sep}:")
        print(f"  λ₁={eigenvalues[0]:.4f}, λ₂={eigenvalues[1]:.4f}, λ₃={eigenvalues[2]:.4f}")
        print(f"  Spectral gap (λ₃-λ₂) = {gap:.4f}")
        print(f"  Interpretation: {'Clear clusters' if gap > 0.5 else 'Weak separation'}")


if __name__ == "__main__":
    analyze_eigenvector_clustering()
    visualize_spectral_gap()
```

A large gap between eigenvalue λₖ and λₖ₊₁ suggests the data has k natural clusters. This "eigengap heuristic" is widely used to choose the number of clusters: plot eigenvalues and look for the largest gap. However, this is a heuristic, not a guarantee—real data often has gradual eigenvalue decay without clear gaps.
Working with graph Laplacians at scale requires understanding their computational properties.
Matrix Properties:
- $L$ is symmetric, so specialized symmetric eigensolvers can be used.
- For sparse similarity graphs (e.g., KNN), $L$ has only $O(nk)$ nonzero entries; keeping it in sparse format is essential.

Eigendecomposition Complexity:
- A full dense eigendecomposition costs $O(n^3)$ time and is feasible only for a few thousand nodes.
- Spectral clustering needs only the $k$ smallest eigenpairs; iterative solvers (Lanczos/ARPACK, e.g. scipy.sparse.linalg.eigsh) exploit sparsity and scale far better.

Memory Considerations:
- A dense $n \times n$ Laplacian requires $O(n^2)$ storage, which becomes prohibitive around $n \approx 10^4$ to $10^5$.
- Sparse storage of a KNN-graph Laplacian needs only $O(nk)$.
Practical Limit: Direct spectral clustering is typically feasible for $n \lesssim 10^4-10^5$ with sparse graphs. Beyond this, approximations (Nyström, landmark methods) are necessary.
| Operation | Dense Graph | Sparse Graph (KNN with k neighbors) |
|---|---|---|
| Graph construction | O(n²d) for n points in d dimensions | O(n log n · d) with KD-tree |
| Laplacian storage | O(n²) | O(nk) |
| Full eigendecomp | O(n³) | Not applicable |
| Top-k eigendecomp | O(n²k) | O(nk² · iterations) |
| K-Means on embedding | O(n · k² · iters) | Same |
Computing the Laplacian requires care: (1) Ensure degree matrix is computed accurately—small errors accumulate. (2) For normalized Laplacians, handle zero-degree nodes (isolated nodes) explicitly. (3) Use established libraries (scipy.sparse.linalg, scikit-learn) rather than implementing from scratch. (4) The smallest eigenvalue should be exactly 0—numerical error may give small positive or negative values; threshold appropriately.
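For point (4), one possible way to apply that thresholding is sketched below; the helper name and tolerance value are assumptions, not a prescribed recipe:

```python
import numpy as np
from scipy.linalg import eigh

def count_components_spectrally(L, tol=1e-8):
    """Count connected components as the number of numerically-zero eigenvalues of L."""
    eigenvalues = eigh(L, eigvals_only=True)
    # Numerical error can make the zero eigenvalues slightly positive or negative,
    # so compare against a tolerance scaled by the largest eigenvalue.
    threshold = tol * max(eigenvalues[-1], 1.0)
    return int(np.sum(eigenvalues < threshold))

# Two disconnected triangles -> 2 components
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
print(count_components_spectrally(L))  # 2
```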
We have developed a deep understanding of the graph Laplacian matrix—the mathematical heart of spectral clustering. Let's consolidate the key insights:
- $L = D - W$ is symmetric and positive semi-definite, with $f^T L f = \frac{1}{2}\sum_{i,j} W_{ij}(f_i - f_j)^2$ measuring how smoothly $f$ varies over the graph.
- The multiplicity of eigenvalue 0 equals the number of connected components; near-zero eigenvalues signal weakly connected, cluster-like structure.
- The Fiedler value $\lambda_2$ quantifies connectivity, and the Fiedler vector supplies the basic bipartition direction.
- The normalized variants $L_{sym}$ and $L_{rw}$ correct for degree heterogeneity and connect to the normalized cut objective; their eigenpairs are related by $D^{1/2}$.
- The graph Laplacian is the discrete analog of the continuous Laplace operator, which explains why its eigenvectors capture intrinsic geometry.
What's Next:
With the Laplacian understood, we next examine the eigenstructure in detail: how eigenvalues and eigenvectors together reveal graph and cluster properties. We'll prove the key theorems connecting spectra to cuts and develop intuition for the spectral embedding that makes clustering work.
You now understand the graph Laplacian matrix—its construction, properties, variants, and connection to continuous mathematics. This matrix is the computational engine of spectral clustering: its eigenvectors encode the cluster structure we seek. Next, we explore how to interpret and use this eigenstructure for clustering.