Traditional clustering algorithms like K-Means operate in a geometric paradigm: they assume clusters are compact, convex regions in feature space that can be characterized by centroids or prototypes. This assumption works beautifully for well-separated, spherical clusters—but catastrophically fails when clusters have complex shapes, manifold structures, or non-convex boundaries.
Spectral clustering represents a fundamental paradigm shift. Instead of treating data points as coordinates in Euclidean space, we view them as nodes in a graph where edges encode pairwise similarities. Clustering then becomes graph partitioning: finding cuts that separate the graph into coherent subgraphs while minimizing the connections severed.
This reformulation unlocks remarkable capabilities. Spectral methods can discover clusters that wrap around each other, separate interlocking rings, and detect communities in complex networks—all tasks where geometric methods fail completely.
By the end of this page, you will understand: (1) Why clustering can be reformulated as a graph partitioning problem, (2) How similarity graphs encode the structure of data, (3) The different types of similarity graphs and their construction, (4) How graph cuts provide a mathematical objective for clustering, and (5) The intuition behind why graph-based methods succeed where geometric methods fail.
To understand why spectral clustering exists, we must first confront the fundamental limitations of geometric clustering approaches. Consider the K-Means objective:
$$\min_{\mu_1, \ldots, \mu_k} \sum_{i=1}^{n} \min_{j} \|x_i - \mu_j\|^2$$
This objective minimizes the sum of squared distances from each point to its nearest centroid. The implicit assumption is devastating: clusters are Voronoi cells—convex polytopes defined by proximity to centroids. Any cluster that cannot be expressed as a Voronoi cell is fundamentally invisible to K-Means.
The Two Moons Problem illustrates this limitation dramatically. Two crescent-shaped clusters interlock like parentheses. Geometrically, points from different clusters may be closer to each other than points within the same cluster. K-Means fails spectacularly, typically splitting each moon in half rather than separating them.
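To see this concretely, here is a minimal sketch (using scikit-learn, which the code examples later on this page also rely on) that runs K-Means on a two-moons sample and scores it against the true labels:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# K-Means with k=2: each centroid typically sits near the middle of one half
# of the plane, so its Voronoi cell slices through both moons.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Adjusted Rand Index: 1.0 would mean perfect agreement with the true moons.
print(f"ARI of K-Means on two moons: {adjusted_rand_score(y_true, labels):.2f}")
# Typically far below 1.0 -- each moon gets split roughly in half.
```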
The insight behind spectral clustering is profound: what matters for clustering is not absolute position, but connectivity patterns. If we construct a graph where similar points are connected, the cluster structure becomes visible as graph communities—densely connected subgraphs with sparse connections between them.
| Aspect | Geometric Paradigm (K-Means) | Graph Paradigm (Spectral) |
|---|---|---|
| Data representation | Points in ℝᵈ | Nodes in a graph G = (V, E) |
| Similarity concept | Distance to centroids | Edge weights between nodes |
| Cluster assumption | Convex, compact regions | Densely connected subgraphs |
| Objective function | Minimize within-cluster variance | Minimize cut weight between clusters |
| Handles non-convex shapes | No | Yes |
| Computational approach | Iterative centroid updates | Eigendecomposition of Laplacian |
| Parameter sensitivity | Sensitive to initialization | Sensitive to similarity metric |
The graph perspective reveals hidden structure. When we translate the two moons problem into a graph where each point connects to its nearest neighbors, a clear picture emerges: points within each moon form densely connected chains, while only a few edges bridge between moons. The "correct" clustering is now obvious: cut those few bridge edges.
This intuition generalizes powerfully. Any clustering problem can be reframed as: Given a similarity graph, find a partition that keeps similar points together while separating dissimilar ones. The mathematical machinery of spectral graph theory then provides elegant tools to find optimal partitions.
In spectral clustering, we never cluster directly in the original feature space. Instead, we: (1) build a similarity graph that encodes local neighborhood relationships, (2) analyze the graph's spectral properties (the eigenvalues and eigenvectors of its Laplacian), and (3) cluster in the resulting spectral embedding space. The spectral embedding "unfolds" complex structures into spaces where simple methods like K-Means work perfectly.
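As a sketch of this pipeline end to end, scikit-learn's `SpectralClustering` estimator (assumed available here, as in the other examples on this page) performs these steps internally: it builds a KNN similarity graph, computes the spectral embedding, and runs K-Means in that embedding:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# affinity='nearest_neighbors' builds the similarity graph internally;
# assign_labels='kmeans' clusters the rows of the spectral embedding.
sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                        n_neighbors=10, assign_labels='kmeans', random_state=42)
labels = sc.fit_predict(X)

print(f"ARI of spectral clustering on two moons: {adjusted_rand_score(y_true, labels):.2f}")
# Typically 1.00 -- the two moons are recovered exactly.
```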
The first step in spectral clustering is constructing a similarity graph $G = (V, E, W)$, where the vertices $V$ are the data points, the edges $E$ connect pairs of similar points, and the weights $W_{ij} \geq 0$ quantify how similar $x_i$ and $x_j$ are.
The choice of how to construct this graph profoundly influences clustering results. Three main approaches dominate practice, each with distinct characteristics and trade-offs.
Understanding the Gaussian Kernel Weight Function:
The Gaussian (RBF) similarity is the most widely used:
$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
This function has elegant properties: weights lie strictly between 0 and 1 for distinct points, decay smoothly with distance, and are governed by a single scale parameter $\sigma$. When $\|x_i - x_j\| \ll \sigma$, the weight is close to 1 (highly similar); when $\|x_i - x_j\| \gg \sigma$, the weight approaches 0 (effectively disconnected). A useful heuristic is to set $\sigma$ based on the distribution of pairwise distances—often the median or a percentile of the distance distribution.
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import lil_matrix
from sklearn.neighbors import NearestNeighbors


class SimilarityGraph:
    """
    Construct different types of similarity graphs for spectral clustering.

    This class implements three fundamental graph construction strategies,
    each encoding different notions of local neighborhood structure.
    """

    def __init__(self, X):
        """Initialize with data matrix X of shape (n_samples, n_features)."""
        self.X = X
        self.n = X.shape[0]
        self.distances = None  # Cached pairwise distances

    def _compute_distances(self):
        """Compute and cache pairwise Euclidean distances."""
        if self.distances is None:
            self.distances = cdist(self.X, self.X, metric='euclidean')
        return self.distances

    def epsilon_neighborhood(self, epsilon):
        """
        Construct ε-neighborhood graph.

        Connect points i and j if ||x_i - x_j|| < epsilon.
        Returns an unweighted (binary) adjacency matrix.

        Properties:
        - Simple and intuitive
        - May create disconnected components if epsilon is too small
        - All neighborhoods share the same radius (may not suit varying density)
        """
        D = self._compute_distances()
        W = (D < epsilon).astype(float)
        np.fill_diagonal(W, 0)  # No self-loops
        return W

    def knn_graph(self, k, mode='connectivity', mutual=False):
        """
        Construct K-nearest neighbor graph.

        Parameters:
        -----------
        k : int
            Number of nearest neighbors
        mode : str
            'connectivity' for binary edges, 'distance' for weighted
        mutual : bool
            If True, connect i and j only if each is among the other's KNN.
            If False, connect if either is among the other's KNN (symmetric KNN).

        Properties:
        - Adapts to local density naturally
        - Guarantees each point has at least k neighbors
        - Produces sparse graphs (efficient for large n)
        """
        nn = NearestNeighbors(n_neighbors=k + 1)  # +1 so k true neighbors remain even if a point matches itself
        nn.fit(self.X)

        if mode == 'connectivity':
            A = nn.kneighbors_graph(mode='connectivity')  # Binary adjacency
        else:
            A = nn.kneighbors_graph(mode='distance')      # Distance-weighted

        # Remove any self-connections
        A = lil_matrix(A)
        A.setdiag(0)
        A = A.tocsr()

        if mutual:
            # Mutual KNN: keep an edge only if both directions are present
            W = A.minimum(A.T)
        else:
            # Symmetric KNN: keep an edge if either direction is present
            W = A.maximum(A.T)

        return W.toarray()

    def gaussian_kernel(self, sigma=None, sigma_percentile=50):
        """
        Construct fully connected graph with Gaussian (RBF) kernel weights.

        W_ij = exp(-||x_i - x_j||^2 / (2*sigma^2))

        Parameters:
        -----------
        sigma : float or None
            Kernel bandwidth. If None, computed from the distance distribution.
        sigma_percentile : int
            If sigma is None, use this percentile of distances as sigma.

        Properties:
        - Soft notion of similarity (no hard cutoffs)
        - Dense matrix (O(n^2) storage)
        - Smooth, differentiable with respect to the data
        """
        D = self._compute_distances()

        if sigma is None:
            # Heuristic: use a percentile of the non-zero distances
            nonzero_dists = D[D > 0]
            sigma = np.percentile(nonzero_dists, sigma_percentile)

        # Gaussian kernel
        W = np.exp(-D**2 / (2 * sigma**2))
        np.fill_diagonal(W, 0)  # No self-loops
        return W, sigma

    def local_scaling_kernel(self, k=7):
        """
        Construct graph with local scaling (Zelnik-Manor & Perona, 2004).

        Uses local statistics to define point-specific scales:
        W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))
        where sigma_i is the distance to the k-th nearest neighbor of x_i.

        Properties:
        - Automatically adapts to local data density
        - No global bandwidth parameter to tune
        - Handles clusters of different scales
        """
        D = self._compute_distances()

        # Compute local scales (distance to the k-th neighbor)
        nn = NearestNeighbors(n_neighbors=k + 1)
        nn.fit(self.X)
        distances, _ = nn.kneighbors(self.X)
        sigma_local = distances[:, k]  # k-th neighbor distance

        # Outer product of local scales
        sigma_matrix = np.outer(sigma_local, sigma_local)
        sigma_matrix[sigma_matrix == 0] = 1e-10  # Avoid division by zero

        # Local scaling kernel
        W = np.exp(-D**2 / sigma_matrix)
        np.fill_diagonal(W, 0)
        return W, sigma_local


def demonstrate_graph_construction():
    """Demonstrate different graph constructions on the two moons dataset."""
    from sklearn.datasets import make_moons

    # Generate two moons data
    X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)
    sg = SimilarityGraph(X)

    print("Graph Construction Comparison")
    print("=" * 50)

    # Epsilon neighborhood
    for eps in [0.2, 0.4, 0.6]:
        W = sg.epsilon_neighborhood(eps)
        n_edges = np.sum(W > 0) // 2
        no_isolated = np.all(np.sum(W, axis=1) > 0)
        print(f"ε-neighborhood (ε={eps}): {n_edges} edges, no isolated points={no_isolated}")
    print()

    # KNN graph
    for k in [5, 10, 15]:
        W = sg.knn_graph(k, mutual=False)
        n_edges = np.sum(W > 0) // 2
        print(f"KNN graph (k={k}): {n_edges} edges")
    print()

    # Gaussian kernel
    for sigma in [0.1, 0.3, 0.5]:
        W, _ = sg.gaussian_kernel(sigma=sigma)
        effective_edges = np.sum(W > 0.01) // 2  # Count "significant" edges
        print(f"Gaussian (σ={sigma}): {effective_edges} edges with W>0.01")

    return X, y_true, sg


if __name__ == "__main__":
    demonstrate_graph_construction()
```

The quality of spectral clustering depends heavily on the similarity graph. Poor graph construction can obscure cluster structure entirely. Key guidelines: (1) The graph should be connected—at minimum, contain a path between any two nodes; disconnected components become automatic clusters. (2) Edge weights should reflect a meaningful notion of similarity for your domain. (3) When in doubt, KNN graphs are often the most robust choice, adapting naturally to varying density.
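To check the first guideline programmatically, count the connected components of the graph you just built. A short sketch using SciPy and the `SimilarityGraph` class above (the specific k values are illustrative):

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)
sg = SimilarityGraph(X)  # class defined in the code above

for k in [2, 5, 10]:
    W = sg.knn_graph(k)
    n_components, _ = connected_components(csr_matrix(W), directed=False)
    print(f"k={k}: {n_components} connected component(s)")

# If the count exceeds the number of clusters you expect, increase k (or ε / σ),
# or accept that each extra component will necessarily form its own cluster.
```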
With a similarity graph constructed, clustering becomes graph partitioning: finding a division of vertices into groups such that within-group edges have high weight and between-group edges have low weight.
The Minimum Cut Problem:
Given a partition of vertices into disjoint sets $A$ and $B$ (where $A \cup B = V$ and $A \cap B = \emptyset$), the cut is the total weight of edges crossing between them:
$$\text{cut}(A, B) = \sum_{i \in A, j \in B} W_{ij}$$
A natural objective is to minimize this cut—sever as few connections as possible. For multiple clusters $C_1, \ldots, C_k$, we minimize:
$$\text{cut}(C_1, \ldots, C_k) = \frac{1}{2} \sum_{i=1}^{k} W(C_i, \bar{C_i})$$
where $\bar{C_i}$ denotes the complement of cluster $C_i$ and $W(A, B) = \sum_{i \in A, j \in B} W_{ij}$.
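To make the definition concrete, the following short sketch evaluates the cut weight of a two-way partition directly from the weight matrix (the toy graph and partition are illustrative):

```python
import numpy as np

def cut_weight(W, in_A):
    """Total weight of edges crossing from A to its complement.

    W    : symmetric (n, n) weight matrix
    in_A : boolean array of length n, True for vertices in A
    """
    return W[np.ix_(in_A, ~in_A)].sum()

# Tiny example: two triangles (vertices 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1  # the bridge

in_A = np.array([True, True, True, False, False, False])
print(cut_weight(W, in_A))  # 0.1 -- only the bridge edge is severed
```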
The Problem with Minimum Cut:
Unfortunately, naive minimum cut produces degenerate solutions. Consider a graph with 100 nodes and one outlier loosely connected to the main mass. The minimum cut would isolate that single outlier—technically minimizing cut weight but producing a useless "cluster" of size 1.
The minimum cut objective is biased toward cutting off small, isolated portions of the graph rather than finding balanced partitions. We need objective functions that account for cluster sizes.
Ratio Cut and Normalized Cut:
Two classical solutions normalize the cut by cluster size:
Ratio Cut (RatioCut): $$\text{RatioCut}(C_1, \ldots, C_k) = \sum_{i=1}^{k} \frac{\text{cut}(C_i, \bar{C_i})}{|C_i|}$$
Divides by the number of vertices in each cluster. Penalizes small clusters because $|C_i|$ appears in the denominator.
Normalized Cut (NCut): $$\text{NCut}(C_1, \ldots, C_k) = \sum_{i=1}^{k} \frac{\text{cut}(C_i, \bar{C_i})}{\text{vol}(C_i)}$$
where $\text{vol}(C) = \sum_{i \in C} d_i$ and $d_i = \sum_j W_{ij}$ is the degree of node $i$.
NCut normalizes by the total edge weight (volume) incident to each cluster, accounting for the connectivity of nodes rather than just their count. This is generally preferred and connects more naturally to random walk interpretations.
| Objective | Formula | Normalization | Properties |
|---|---|---|---|
| Minimum Cut | Σ cut(Cᵢ, C̄ᵢ) | None | Favors isolating small groups; often degenerate |
| Ratio Cut | Σ cut(Cᵢ, C̄ᵢ) / \|Cᵢ\| | Cluster size (# vertices) | Balanced in vertex count; ignores vertex degrees |
| Normalized Cut | Σ cut(Cᵢ, C̄ᵢ)/vol(Cᵢ) | Cluster volume (sum of degrees) | Balanced in connectivity; relates to random walks |
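To see these normalizations in action, the sketch below scores two candidate partitions of a toy graph: one that cuts a weak bridge between two dense groups, and one that isolates a loosely attached outlier (the graph and weights are illustrative):

```python
import numpy as np

def partition_objectives(W, labels):
    """Compute MinCut, RatioCut, and NCut for a partition given by integer labels."""
    degrees = W.sum(axis=1)
    mincut = ratiocut = ncut = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        cut_c = W[np.ix_(in_c, ~in_c)].sum()   # cut(C, C-bar)
        mincut += 0.5 * cut_c                  # the 1/2 avoids double-counting edges
        ratiocut += cut_c / in_c.sum()         # normalize by |C|
        ncut += cut_c / degrees[in_c].sum()    # normalize by vol(C)
    return mincut, ratiocut, ncut

# Toy graph: two 5-node cliques joined by a weak bridge, plus one loose outlier.
n = 11
W = np.zeros((n, n))
W[:5, :5] = 1.0
W[5:10, 5:10] = 1.0                            # dense intra-cluster weights
np.fill_diagonal(W, 0)
W[4, 5] = W[5, 4] = 0.1                        # bridge between the cliques
W[0, 10] = W[10, 0] = 0.05                     # outlier attached to one clique

balanced = np.array([0]*5 + [1]*5 + [0])       # split along the bridge
isolate  = np.array([0]*10 + [1])              # split off the outlier

for name, labels in [("bridge split", balanced), ("isolate outlier", isolate)]:
    print(name, np.round(partition_objectives(W, labels), 4))
# The raw cut is smaller for isolating the outlier (0.05 < 0.1),
# but RatioCut and NCut both favor the bridge split.
```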
Computational Complexity:
Here's the critical challenge: finding the exact minimum of RatioCut or NCut is NP-hard. We cannot efficiently solve these problems by exhaustive search over all possible partitions—the number of partitions grows exponentially with $n$.
This is where spectral methods enter. By relaxing the discrete optimization problem (partitions) to a continuous one (real-valued vectors), we transform an intractable combinatorial problem into an eigenvalue problem solvable in polynomial time. The eigenvectors of graph Laplacian matrices provide approximate solutions to RatioCut and NCut that are, in practice, remarkably effective.
This connection—between graph cuts and eigenvalue problems—is the mathematical heart of spectral clustering and the subject of the following pages.
The key insight enabling spectral clustering: (1) Express the cut objective in matrix form using indicator vectors, (2) Show this equals a Rayleigh quotient involving the graph Laplacian, (3) Relax discrete indicators to continuous vectors, (4) Solve the relaxed problem via eigendecomposition, (5) Discretize the continuous solution back to cluster assignments. This elegant strategy converts NP-hard combinatorics into tractable linear algebra.
To build deep intuition, let's examine the simplest case: partitioning a graph into exactly two clusters. This reveals the core mathematics before generalizing to k clusters.
Indicator Vector Representation:
We encode a partition $\{A, B\}$ using an indicator vector $f \in \mathbb{R}^n$:
$$f_i = \begin{cases} \sqrt{|B|/|A|} & \text{if } i \in A \\ -\sqrt{|A|/|B|} & \text{if } i \in B \end{cases}$$
This specific scaling ensures that $f$ is orthogonal to the constant vector and has squared norm $\|f\|^2 = n$. For the special case of equal-sized clusters it reduces to $f_i = +1$ if $i \in A$ and $f_i = -1$ if $i \in B$.
Connection to RatioCut:
A remarkable fact emerges: the RatioCut objective can be written as:
$$\text{RatioCut}(A, B) = \frac{f^T L f}{f^T f}$$
where $L$ is the graph Laplacian matrix (to be developed in the next page). This is a Rayleigh quotient—a ratio whose minimization over vectors is a classic eigenvalue problem!
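This identity is easy to verify numerically. The sketch below builds the scaled indicator vector for a two-way partition of a small graph and compares the Rayleigh quotient (using $L = D - W$, formally developed on the next page) with RatioCut computed directly:

```python
import numpy as np

# Small graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.2

L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian

in_A = np.array([True, True, True, False, False, False])
nA, nB = in_A.sum(), (~in_A).sum()

# Scaled indicator vector from the definition above
f = np.where(in_A, np.sqrt(nB / nA), -np.sqrt(nA / nB))

rayleigh = f @ L @ f / (f @ f)
ratiocut = W[np.ix_(in_A, ~in_A)].sum() * (1 / nA + 1 / nB)

print(f"Rayleigh quotient: {rayleigh:.4f}")   # both print the same value
print(f"RatioCut:          {ratiocut:.4f}")
```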
Why This Matters:
Minimizing the Rayleigh quotient $f^T L f / f^T f$ over all vectors $f$ orthogonal to the constant vector yields the eigenvector associated with the second-smallest eigenvalue of $L$—called the Fiedler vector.
The Fiedler vector provides a real-valued "ordering" of vertices that respects cluster structure:
For two moons data, the Fiedler vector might assign positive values to all points in one moon and negative values to all points in the other—even though the moons geometrically interlock!
The Spectral Clustering Recipe (Two Clusters): (1) construct a similarity graph and its weight matrix $W$, (2) form the unnormalized Laplacian $L = D - W$ from the degree matrix $D$, (3) compute the Fiedler vector (the eigenvector for the second-smallest eigenvalue of $L$), and (4) assign each point to a cluster according to the sign of its Fiedler-vector entry.
This elegant procedure transforms a combinatorial nightmare into straightforward linear algebra.
```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt


def spectral_two_clusters(W):
    """
    Spectral clustering for exactly two clusters using the Fiedler vector.

    This function demonstrates the core spectral clustering idea:
    the second eigenvector of the graph Laplacian encodes cluster structure.

    Parameters:
    -----------
    W : ndarray, shape (n, n)
        Symmetric weight/adjacency matrix with non-negative entries.
        W[i, j] represents the similarity between points i and j.

    Returns:
    --------
    labels : ndarray, shape (n,)
        Binary cluster assignments (0 or 1)
    fiedler_vector : ndarray, shape (n,)
        The second eigenvector (Fiedler vector)
    eigenvalues : ndarray
        First few eigenvalues of the Laplacian
    """
    n = W.shape[0]

    # Step 1: Compute the degree matrix
    # d_i = sum of all edge weights incident to node i
    d = np.sum(W, axis=1)
    D = np.diag(d)

    # Step 2: Compute the unnormalized graph Laplacian L = D - W
    # Key properties of L:
    # - Symmetric, positive semi-definite
    # - Smallest eigenvalue is 0, with a constant eigenvector
    # - Second-smallest eigenvalue is the algebraic connectivity
    L = D - W

    # Step 3: Eigendecomposition
    # We only need the smallest eigenvalues/eigenvectors;
    # scipy.sparse.linalg.eigsh would be more efficient for large sparse matrices
    eigenvalues, eigenvectors = eigh(L)

    # Step 4: Extract the Fiedler vector (2nd eigenvector)
    # Note: eigenvalues are sorted in ascending order
    fiedler_vector = eigenvectors[:, 1]

    # Step 5: Cluster assignment based on sign
    # Non-negative entries -> cluster 1, negative entries -> cluster 0
    labels = (fiedler_vector >= 0).astype(int)

    return labels, fiedler_vector, eigenvalues[:5]


def visualize_fiedler_clustering():
    """
    Demonstrate spectral clustering on the two moons dataset.

    This visualization shows:
    1. The original data with true labels
    2. The Fiedler vector values for each point
    3. The spectral clustering result
    """
    # Generate two moons dataset
    X, y_true = make_moons(n_samples=200, noise=0.06, random_state=42)

    # Construct similarity graph using a Gaussian kernel
    from scipy.spatial.distance import cdist
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Apply spectral clustering
    labels, fiedler, eigenvalues = spectral_two_clusters(W)

    # Calculate accuracy (accounting for a possible label flip)
    acc1 = np.mean(labels == y_true)
    acc2 = np.mean(labels != y_true)
    accuracy = max(acc1, acc2)

    print("Spectral Clustering Results (Two Moons)")
    print("=" * 50)
    print(f"First 5 eigenvalues: {eigenvalues}")
    print(f"Fiedler vector range: [{fiedler.min():.3f}, {fiedler.max():.3f}]")
    print(f"Clustering accuracy: {accuracy:.1%}")

    # The eigenvalue gap indicates cluster separation quality
    spectral_gap = eigenvalues[2] - eigenvalues[1]
    print(f"Spectral gap (λ₃ - λ₂): {spectral_gap:.4f}")

    # Create visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Plot 1: True labels
    axes[0].scatter(X[:, 0], X[:, 1], c=y_true, cmap='coolwarm', s=50, alpha=0.7)
    axes[0].set_title("True Labels")
    axes[0].set_aspect('equal')

    # Plot 2: Fiedler vector values
    scatter = axes[1].scatter(X[:, 0], X[:, 1], c=fiedler, cmap='RdBu', s=50, alpha=0.7)
    plt.colorbar(scatter, ax=axes[1], label='Fiedler value')
    axes[1].set_title("Fiedler Vector Values")
    axes[1].set_aspect('equal')

    # Plot 3: Spectral clustering result
    axes[2].scatter(X[:, 0], X[:, 1], c=labels, cmap='coolwarm', s=50, alpha=0.7)
    axes[2].set_title(f"Spectral Clustering (Acc: {accuracy:.1%})")
    axes[2].set_aspect('equal')

    plt.tight_layout()
    plt.savefig('fiedler_clustering.png', dpi=150)
    plt.close()

    return X, y_true, labels, fiedler


if __name__ == "__main__":
    visualize_fiedler_clustering()
```

Notice what happened: the Fiedler vector transformed a problem where K-Means fails completely (non-convex, interlocking clusters) into a problem where simple sign-based thresholding achieves perfect separation. The spectral embedding has "unfolded" the data structure. This is the magic of spectral clustering—it finds representations where cluster structure becomes linearly separable.
The success of graph-based methods isn't accidental—it reflects deep mathematical principles about how graphs encode geometric and topological information.
Local Connectivity Preserves Global Structure:
Consider a point $x$ on one moon of the two moons dataset. Its K-nearest neighbors are all on the same moon (assuming noise is moderate). This point connects to its neighbors, those neighbors connect to their neighbors, and so on—forming a chain of connectivity tracing along the moon's shape.
The graph doesn't "know" about the moon's crescent shape in any explicit way. But the pattern of local connections implicitly encodes the manifold structure. When we analyze the graph spectrally, this structure is revealed and exploited.
Random Walk Interpretation:
A beautiful alternative view of spectral clustering comes from random walks. Imagine a random walker hopping between nodes, choosing the next node with probability proportional to edge weights.
The key insight: a random walker tends to stay within clusters if between-cluster edges are sparse. The walker will bounce around within densely connected regions before occasionally escaping to another.
The normalized cut objective exactly measures how quickly a random walk escapes from a cluster. Minimizing NCut finds partitions where random walks are "trapped" within clusters the longest—precisely what we want!
In probabilistic terms (NCut as an escape probability):
$$\frac{\text{cut}(A, \bar{A})}{\text{vol}(A)} = P(\text{transition out of } A \mid \text{currently in } A)$$
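This identity can be checked numerically: with transition matrix $P = D^{-1}W$ and stationary distribution $\pi_i = d_i / \text{vol}(V)$, the one-step probability of leaving $A$, conditioned on currently being in $A$, equals $\text{cut}(A, \bar{A})/\text{vol}(A)$. A small sketch on a random weighted graph:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric weight matrix on 8 nodes (weights in [0, 1))
W = rng.random((8, 8))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
P = W / d[:, None]              # row-stochastic transition matrix D^{-1} W
pi = d / d.sum()                # stationary distribution of the random walk

in_A = np.array([True] * 4 + [False] * 4)

# Left-hand side: cut(A, A-bar) / vol(A)
lhs = W[np.ix_(in_A, ~in_A)].sum() / d[in_A].sum()

# Right-hand side: P(next step leaves A | walk is currently in A) at stationarity
rhs = (pi[in_A, None] * P[np.ix_(in_A, ~in_A)]).sum() / pi[in_A].sum()

print(f"{lhs:.6f}  {rhs:.6f}")  # identical up to floating-point error
```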
Manifold Structure and Geodesic Distance:
Graph-based clustering also connects to manifold learning. When data lies on a curved manifold embedded in high-dimensional space, Euclidean distance can be misleading—two points might be close in ambient space but far apart along the manifold.
Similarity graphs with local connectivity (like KNN graphs) approximate the manifold's topology. The geodesic distance (shortest path along the manifold) is approximated by the graph distance (shortest path along edges). Spectral clustering thus respects manifold structure in ways Euclidean methods cannot.
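A small sketch of this effect on the two-moons data: find the Euclidean-closest pair of points lying on different moons, then compare their Euclidean distance with their shortest-path distance through a KNN graph (the neighborhood size is illustrative):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, y = make_moons(n_samples=300, noise=0.05, random_state=42)

euclid = cdist(X, X)
knn = kneighbors_graph(X, n_neighbors=10, mode='distance')
graph_dist = shortest_path(knn, directed=False)    # approximates geodesic distance

# Closest cross-moon pair in Euclidean terms
cross = np.where(y[:, None] != y[None, :], euclid, np.inf)
i, j = np.unravel_index(np.argmin(cross), cross.shape)

print(f"Closest cross-moon pair: Euclidean {euclid[i, j]:.2f}, "
      f"graph distance {graph_dist[i, j]:.2f}")
# Typically the Euclidean distance is small (the moons interlock) while the
# graph distance is much larger, or infinite if no cross-moon edges exist.
```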
Multi-Scale Information:
Eigenvectors of the graph Laplacian capture structure at different scales: the eigenvectors associated with the smallest eigenvalues encode coarse, global partitions (the Fiedler vector gives the coarsest two-way split), while eigenvectors for larger eigenvalues capture progressively finer subdivisions and local detail.
Using multiple eigenvectors for clustering ($k$ eigenvectors for $k$ clusters) leverages multi-scale information simultaneously, finding partitions that are optimal at both global and local levels.
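A sketch of that general recipe, using the unnormalized Laplacian $L = D - W$ for simplicity: take the eigenvectors for the $k$ smallest eigenvalues, treat their rows as new coordinates, and run K-Means in that embedding:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

def spectral_embedding_clustering(W, k):
    """Cluster a similarity graph into k groups via its Laplacian eigenvectors."""
    L = np.diag(W.sum(axis=1)) - W          # unnormalized Laplacian L = D - W
    _, vecs = eigh(L)                       # eigenvalues in ascending order
    embedding = vecs[:, :k]                 # row i = spectral coordinates of point i
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)

# Example: two moons with a Gaussian-kernel graph, as constructed earlier on this page
X, y = make_moons(n_samples=200, noise=0.06, random_state=42)
D = cdist(X, X)
sigma = np.median(D[D > 0])
W = np.exp(-D**2 / (2 * sigma**2))
np.fill_diagonal(W, 0)
labels = spectral_embedding_clustering(W, k=2)
```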
Spectral clustering works because it asks the right question: 'Which points belong together based on their pattern of connections?' rather than 'Which points are geometrically close?' This shift in perspective—from absolute position to relative connectivity—unlocks structure invisible to geometric methods.
Graph-based clustering extends far beyond standard data clustering, connecting to diverse fields and applications:
Image Segmentation:
Treat pixels as graph nodes, with edges connecting neighboring pixels weighted by color/texture similarity. Spectral clustering finds segments—contiguous regions of similar appearance. The normalized cut formulation was originally proposed for this application (Shi & Malik, 2000).
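A minimal sketch of this pipeline with scikit-learn's image utilities (`img_to_graph` and `spectral_clustering`, loosely following the pattern of the library's documented coin-segmentation example; the synthetic image and β value are illustrative):

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction.image import img_to_graph

# Tiny synthetic grayscale image: a bright square on a dark background
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
img += 0.05 * np.random.default_rng(0).standard_normal(img.shape)

# Nodes = pixels; edge weights derived from intensity gradients between neighbors
graph = img_to_graph(img)
beta = 10.0                                   # sharpness of the similarity
graph.data = np.exp(-beta * graph.data / graph.data.std())

labels = spectral_clustering(graph, n_clusters=2, eigen_solver='arpack', random_state=0)
segmentation = labels.reshape(img.shape)      # each pixel assigned to a segment
```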
Community Detection in Networks:
Social networks, citation networks, and biological networks are naturally graphs. Spectral clustering discovers communities—groups of nodes more connected internally than externally. This is the same mathematical problem as data clustering!
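As a sketch, the classic Zachary karate-club network (available through the `networkx` package, assumed installed) can be partitioned by passing its adjacency matrix to spectral clustering as a precomputed affinity:

```python
import networkx as nx
import numpy as np
from sklearn.cluster import SpectralClustering

G = nx.karate_club_graph()                     # 34 members, edges = social ties
A = nx.to_numpy_array(G)                       # adjacency matrix = similarity graph

labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(A)

# Compare against the club split recorded in the node attributes
true_split = np.array([G.nodes[i]['club'] == 'Mr. Hi' for i in G.nodes])
agreement = max(np.mean(labels == true_split), np.mean(labels != true_split))
print(f"Agreement with the real-world split: {agreement:.0%}")
```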
Natural Language Processing:
Document similarity graphs enable document clustering. Word co-occurrence graphs support semantic analysis. Spectral methods find topics and categories without explicit labels.
| Domain | Nodes | Edges (Similarity) | Application |
|---|---|---|---|
| Computer Vision | Pixels/superpixels | Color/texture similarity | Image segmentation, object detection |
| Social Networks | Users | Friendships/interactions | Community detection, influencer identification |
| Bioinformatics | Genes/proteins | Co-expression/interaction | Gene module discovery, protein complex detection |
| NLP | Documents/words | Semantic similarity | Topic modeling, document organization |
| Recommendation | Users/items | Interaction patterns | User segmentation, item categorization |
| Neuroscience | Brain regions | Functional connectivity | Brain network parcellation |
Connections to Other Methods:
Kernel K-Means: Spectral clustering is equivalent to kernel K-Means with specific kernels derived from the graph. The spectral embedding is the implicit feature space.
Laplacian Eigenmaps: A dimensionality reduction technique that uses the same eigenvector computation. Spectral clustering adds the final clustering step after embedding.
Diffusion Maps: Uses powers of the graph's transition matrix. Spectral clustering is related to the limit of diffusion as time goes to infinity.
Graph Neural Networks: Modern GNNs can be seen as learnable versions of spectral methods, with message passing replacing fixed Laplacian eigenvectors.
Understanding spectral clustering provides deep insight into this entire family of graph-based machine learning methods.
We have established the conceptual foundation for spectral clustering—the shift from geometric to graph-theoretic thinking about cluster structure. The key insights to carry forward: (1) geometric methods like K-Means assume convex, centroid-shaped clusters and fail on manifold-structured data; (2) similarity graphs (ε-neighborhood, KNN, Gaussian kernel) re-express clustering as graph partitioning; (3) the raw minimum cut degenerates into isolating outliers, so RatioCut and NCut normalize by cluster size or volume; (4) these normalized objectives are NP-hard, but relaxing them leads to an eigenvalue problem on the graph Laplacian; and (5) for two clusters, the sign pattern of the Fiedler vector already recovers structure invisible to geometric methods.
What's Next:
This page established the why and what of graph-based clustering. The next page dives into the how: the graph Laplacian matrix. We will formally define the Laplacian, derive its fundamental properties, and understand why its eigenvectors encode cluster structure. This mathematical foundation is essential for truly mastering spectral clustering.
You now understand the conceptual shift from geometric to graph-based clustering. The key insight is that cluster structure can be defined by connectivity patterns rather than geometric proximity. This perspective enables discovery of clusters invisible to methods like K-Means. Next, we formalize this intuition through the mathematical machinery of graph Laplacians.