The Normalized Cut (NCut) objective, introduced by Shi and Malik in their seminal 2000 paper, represents a fundamental advancement in graph partitioning for clustering. While RatioCut normalizes by cluster size (number of vertices), NCut normalizes by cluster volume (sum of edge weights incident to each cluster)—a distinction with profound theoretical and practical implications.
NCut has become the standard objective in spectral clustering for several compelling reasons: it balances clusters by their total connectivity (volume) rather than by raw vertex counts, it admits a natural random walk interpretation, and its relaxation leads directly to the eigenstructure of the normalized Laplacian.
This page develops the NCut formulation completely, from definition through to practical computation.
By the end of this page, you will understand: (1) The mathematical definition of NCut and its relationship to RatioCut, (2) Why volume normalization is often superior to cardinality normalization, (3) The random walk and Markov chain interpretation of NCut, (4) How NCut connects to the normalized Laplacian's eigenstructure, and (5) Practical algorithms for NCut-based spectral clustering (Shi-Malik, Ng-Jordan-Weiss).
Recall: The Cut and Association Measures
For a partition of vertices into sets $A$ and $B = \bar{A}$:
Cut: The total edge weight crossing between clusters $$\text{cut}(A, B) = \sum_{i \in A, j \in B} W_{ij}$$
Association (Volume): The total edge weight incident to each cluster $$\text{assoc}(A, V) = \text{vol}(A) = \sum_{i \in A} d_i = \sum_{i \in A, j \in V} W_{ij}$$
where $d_i = \sum_j W_{ij}$ is the degree of node $i$.
Note: $\text{vol}(A) = \text{cut}(A, B) + \text{assoc}(A, A)$, where $\text{assoc}(A, A) = \sum_{i, j \in A} W_{ij}$ is the total edge weight internal to $A$ (each undirected edge counted in both directions).
The Normalized Cut:
$$\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$$
This is the sum of two ratios: the cut weight relative to the volume of $A$, plus the same cut weight relative to the volume of $B$.
Alternative Formulations:
NCut can be rewritten in several equivalent forms:
$$\text{NCut}(A, B) = \text{cut}(A, B) \cdot \left( \frac{1}{\text{vol}(A)} + \frac{1}{\text{vol}(B)} \right)$$
$$\text{NCut}(A, B) = \frac{\text{cut}(A, B) \cdot \text{vol}(V)}{\text{vol}(A) \cdot \text{vol}(B)}$$
(using $\text{vol}(A) + \text{vol}(B) = \text{vol}(V)$)
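As a quick numerical sanity check of these equivalences, the following sketch evaluates all three expressions on a small random graph with an arbitrary split (the graph, split, and variable names are purely illustrative):

```python
import numpy as np

# Illustrative random symmetric weight matrix and an arbitrary bipartition
rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
in_A = np.arange(n) < 3          # nodes 0-2 in A, the rest in B

cut_AB = W[in_A][:, ~in_A].sum()
vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
vol_V = vol_A + vol_B

form1 = cut_AB / vol_A + cut_AB / vol_B      # definition
form2 = cut_AB * (1 / vol_A + 1 / vol_B)     # factored form
form3 = cut_AB * vol_V / (vol_A * vol_B)     # product form
print(form1, form2, form3)                   # all three agree up to floating point
```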
Relationship to Normalized Association (NAssoc):
Shi and Malik also defined: $$\text{NAssoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{vol}(A)} + \frac{\text{assoc}(B, B)}{\text{vol}(B)}$$
where $\text{assoc}(A, A) = \sum_{i,j \in A} W_{ij}$ is the internal connectivity of $A$.
Key Identity: $$\text{NCut}(A, B) + \text{NAssoc}(A, B) = 2$$
This means minimizing NCut is equivalent to maximizing NAssoc—finding clusters with high internal connectivity relative to their total connectivity.
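The identity is easy to verify numerically; here is a minimal sketch in the same illustrative setup (small random graph, arbitrary split, hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
in_A = np.arange(n) < 4                      # arbitrary split into two halves

cut_AB = W[in_A][:, ~in_A].sum()
vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
assoc_AA = W[in_A][:, in_A].sum()            # internal weight of A (ordered pairs)
assoc_BB = W[~in_A][:, ~in_A].sum()

ncut = cut_AB / vol_A + cut_AB / vol_B
nassoc = assoc_AA / vol_A + assoc_BB / vol_B
print(ncut + nassoc)                         # prints 2.0 (up to floating point)
```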
| Aspect | RatioCut | NCut |
|---|---|---|
| Formula (2-way) | $\text{cut}(A,B)/\lvert A\rvert + \text{cut}(A,B)/\lvert B\rvert$ | $\text{cut}(A,B)/\text{vol}(A) + \text{cut}(A,B)/\text{vol}(B)$ |
| Normalization | Number of vertices | Sum of vertex degrees (volume) |
| Bias toward | Equal-sized clusters | Equal-volume clusters |
| Handles varying degrees | Poorly | Well |
| Related Laplacian | Unnormalized L | Normalized L_sym or L_rw |
| Interpretation | Number-balanced partition | Connectivity-balanced partition |
Consider a social network where some users are highly connected (influencers) and others have few connections. RatioCut balances only the number of vertices on each side, so it can produce partitions whose two sides differ drastically in total connectivity, for example grouping a handful of influencers against a large mass of peripheral users. NCut prevents this by normalizing by volume: a cluster whose total connectivity is small relative to its boundary is penalized heavily (in the extreme, isolating a single node makes that cluster's term cut/vol equal to 1, the worst possible value). NCut therefore finds partitions balanced in terms of total connectivity, not just node count.
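To make the difference in normalization concrete, here is a small sketch on a hypothetical hub-and-pair graph (weights and partition chosen purely for illustration): the same cut is divided by vertex counts for RatioCut and by volumes for NCut, giving very different scores.

```python
import numpy as np

# Hypothetical graph: a hub (node 0) strongly tied to nodes 1-4,
# plus a loosely attached pair {5, 6}
n = 7
W = np.zeros((n, n))
for j in range(1, 5):
    W[0, j] = W[j, 0] = 3.0      # hub edges
W[5, 6] = W[6, 5] = 1.0          # the small pair
W[4, 5] = W[5, 4] = 0.2          # weak bridge between the groups
d = W.sum(axis=1)

def ratiocut_and_ncut(in_A):
    cut = W[in_A][:, ~in_A].sum()
    ratiocut = cut / in_A.sum() + cut / (~in_A).sum()     # divide by cluster sizes
    ncut = cut / d[in_A].sum() + cut / d[~in_A].sum()     # divide by cluster volumes
    return ratiocut, ncut

split = np.array([True, True, True, True, True, False, False])  # hub group vs pair
print("RatioCut = %.4f, NCut = %.4f" % ratiocut_and_ncut(split))
```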
One of NCut's most elegant features is its interpretation in terms of random walks on the graph. This provides deep intuition for why NCut-based clustering works.
Random Walk on a Graph:
Define a random walk where, at each step, a walker at node $i$ moves to node $j$ with probability proportional to the edge weight: $P_{ij} = W_{ij} / d_i$.
The matrix $P = D^{-1}W$ is the transition matrix—row-stochastic with $\sum_j P_{ij} = 1$.
Stationary Distribution:
For a connected, non-bipartite graph, the random walk has a unique stationary distribution: $$\pi_i = \frac{d_i}{\text{vol}(V)}$$
Nodes are visited in proportion to their degree. This is intuitive: a random walker spends more time at well-connected nodes.
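A quick numerical check of this claim (a sketch on a small graph with strictly positive weights so that it is connected; names are illustrative): build $P = D^{-1}W$, set $\pi_i = d_i / \text{vol}(V)$, and confirm $\pi P = \pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)) + 0.1     # strictly positive weights -> connected graph
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
P = W / d[:, None]               # transition matrix D^{-1} W (row-stochastic)
pi = d / d.sum()                 # claimed stationary distribution d_i / vol(V)

print(np.allclose(P.sum(axis=1), 1.0))   # rows sum to 1
print(np.allclose(pi @ P, pi))           # True: pi is stationary
```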
NCut as Escape Probability:
The NCut has a beautiful random walk interpretation:
$$\frac{\text{cut}(A, B)}{\text{vol}(A)} = P(\text{transition from } A \text{ to } B | \text{currently in } A)$$
This is the probability that a random walker currently in cluster $A$ (with its location within $A$ drawn from the stationary distribution) transitions to cluster $B$ in one step.
Proof of the Escape Probability Interpretation:
The probability of being at node $i \in A$ (under stationary distribution) is $\pi_i = d_i / \text{vol}(V)$.
Conditional on being in $A$, the probability of being at $i$ is: $$P(\text{at } i | \text{in } A) = \frac{d_i / \text{vol}(V)}{\sum_{k \in A} d_k / \text{vol}(V)} = \frac{d_i}{\text{vol}(A)}$$
The probability of transitioning from $i$ to $B$ is $\sum_{j \in B} P_{ij} = \sum_{j \in B} W_{ij}/d_i$.
Thus: $$P(\text{transition to } B | \text{in } A) = \sum_{i \in A} \frac{d_i}{\text{vol}(A)} \cdot \frac{\sum_{j \in B} W_{ij}}{d_i} = \frac{\text{cut}(A, B)}{\text{vol}(A)}$$
NCut Interpretation:
Since $\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$:
$$\text{NCut}(A, B) = P(A \to B) + P(B \to A)$$
Minimizing NCut means finding partitions where random walks are unlikely to escape from either cluster!
This is precisely what we want: a good cluster traps random walkers, keeping them bouncing within the cluster rather than wandering to other clusters.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons

def compute_transition_matrix(W):
    """
    Compute the random walk transition matrix P = D^{-1} W.

    P[i,j] = probability of walking from node i to node j
           = W[i,j] / sum_k W[i,k]
    """
    d = np.sum(W, axis=1)
    # Handle isolated nodes
    d_inv = np.zeros_like(d)
    d_inv[d > 0] = 1.0 / d[d > 0]
    P = np.diag(d_inv) @ W
    return P, d

def verify_ncut_random_walk_interpretation(W, cluster_labels):
    """
    Verify that NCut equals the sum of transition probabilities
    between clusters under the random walk model.

    Parameters:
    -----------
    W : ndarray, shape (n, n)
        Symmetric weight matrix
    cluster_labels : ndarray, shape (n,)
        Binary cluster assignments (0 or 1)

    Returns:
    --------
    ncut : float
        Normalized cut value
    escape_probs : dict
        Escape probabilities P(A→B) and P(B→A)
    """
    P, d = compute_transition_matrix(W)
    n = len(cluster_labels)

    # Identify clusters
    A_mask = cluster_labels == 0
    B_mask = cluster_labels == 1
    A_indices = np.where(A_mask)[0]
    B_indices = np.where(B_mask)[0]

    # Compute volumes
    vol_A = np.sum(d[A_mask])
    vol_B = np.sum(d[B_mask])

    # Compute cut(A, B)
    cut_AB = 0.0
    for i in A_indices:
        for j in B_indices:
            cut_AB += W[i, j]

    # NCut formula
    ncut = cut_AB / vol_A + cut_AB / vol_B

    # Random walk interpretation:
    # P(A → B) = sum_i [P(at i | in A) * P(transition to B | at i)]
    #          = sum_i [(d_i/vol_A) * (sum_{j in B} W[i,j]/d_i)]
    #          = sum_i (sum_{j in B} W[i,j]) / vol_A
    #          = cut(A,B) / vol_A
    P_A_to_B = cut_AB / vol_A
    P_B_to_A = cut_AB / vol_B

    return {
        'ncut': ncut,
        'cut_AB': cut_AB,
        'vol_A': vol_A,
        'vol_B': vol_B,
        'P_A_to_B': P_A_to_B,
        'P_B_to_A': P_B_to_A,
        'ncut_equals_sum': np.isclose(ncut, P_A_to_B + P_B_to_A)
    }

def simulate_random_walk(P, start_cluster, cluster_labels, n_steps=10000):
    """
    Simulate a random walk and estimate escape probability empirically.

    Parameters:
    -----------
    P : ndarray
        Transition matrix
    start_cluster : int
        Starting cluster (0 or 1)
    cluster_labels : ndarray
        Cluster assignments
    n_steps : int
        Number of random walk steps

    Returns:
    --------
    escape_prob : float
        Empirical probability of escaping to the other cluster
    """
    n = len(cluster_labels)

    # Start at a random node in the starting cluster
    start_nodes = np.where(cluster_labels == start_cluster)[0]

    escapes = 0
    in_cluster_steps = 0

    # Multiple random walks
    n_walks = 100
    steps_per_walk = n_steps // n_walks

    for _ in range(n_walks):
        current = np.random.choice(start_nodes)
        for step in range(steps_per_walk):
            if cluster_labels[current] == start_cluster:
                in_cluster_steps += 1
                # Take a step
                probs = P[current]
                if probs.sum() > 0:
                    next_node = np.random.choice(n, p=probs)
                    if cluster_labels[next_node] != start_cluster:
                        escapes += 1
                    current = next_node

    escape_prob = escapes / in_cluster_steps if in_cluster_steps > 0 else 0
    return escape_prob

def demonstrate_random_walk_ncut():
    """
    Demonstrate the random walk interpretation of NCut.
    """
    np.random.seed(42)

    # Generate two moons data
    X, y_true = make_moons(n_samples=100, noise=0.05, random_state=42)

    # Build similarity graph
    distances = cdist(X, X)
    sigma = 0.3
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    print("Random Walk Interpretation of NCut")
    print("=" * 55)

    # Compute NCut for true clustering
    result = verify_ncut_random_walk_interpretation(W, y_true)

    print(f"\nTrue clustering NCut analysis:")
    print(f"  cut(A, B) = {result['cut_AB']:.4f}")
    print(f"  vol(A) = {result['vol_A']:.4f}")
    print(f"  vol(B) = {result['vol_B']:.4f}")
    print(f"  NCut = {result['ncut']:.6f}")

    print(f"\nRandom walk escape probabilities:")
    print(f"  P(A → B) = cut/vol(A) = {result['P_A_to_B']:.6f}")
    print(f"  P(B → A) = cut/vol(B) = {result['P_B_to_A']:.6f}")
    print(f"  NCut = P(A→B) + P(B→A): {result['ncut_equals_sum']}")

    # Simulate random walks to verify empirically
    P, _ = compute_transition_matrix(W)

    print(f"\nEmpirical verification (10000 random walk steps):")
    emp_P_A_to_B = simulate_random_walk(P, 0, y_true)
    emp_P_B_to_A = simulate_random_walk(P, 1, y_true)
    print(f"  Simulated P(A → B) ≈ {emp_P_A_to_B:.6f}")
    print(f"  Simulated P(B → A) ≈ {emp_P_B_to_A:.6f}")
    print(f"  Theoretical: {result['P_A_to_B']:.6f}, {result['P_B_to_A']:.6f}")

    # Compare with a bad clustering
    bad_labels = np.random.randint(0, 2, size=len(y_true))
    result_bad = verify_ncut_random_walk_interpretation(W, bad_labels)

    print(f"\nComparison with random clustering:")
    print(f"  True clustering NCut:   {result['ncut']:.6f}")
    print(f"  Random clustering NCut: {result_bad['ncut']:.6f}")
    print(f"  → True clustering has lower NCut (random walks trapped better)")

    return result

if __name__ == "__main__":
    demonstrate_random_walk_ncut()
```

Think of each cluster as a room and edges between clusters as doors. NCut measures how easily a random walker can wander from one room to another. Small NCut means: (1) Doors between rooms are narrow (low cut), (2) Each room is internally well-connected (high volume), (3) Random walkers stay trapped in rooms for a long time. This is exactly the intuition of a 'good cluster': cohesive internally, well-separated externally.
The connection between NCut and the normalized Laplacian parallels the RatioCut-unnormalized Laplacian connection, but requires careful handling of the volume weighting.
NCut Matrix Formulation:
For a partition into $A$ and $B$, define:
$$f_i = \begin{cases} \sqrt{\text{vol}(B) / \text{vol}(A)} & \text{if } i \in A \\ -\sqrt{\text{vol}(A) / \text{vol}(B)} & \text{if } i \in B \end{cases}$$
Unlike RatioCut, this scaling uses volume instead of cardinality.
Key Properties of This Encoding: (1) $f$ is $D$-orthogonal to the constant vector, i.e., $\sum_i d_i f_i = 0$; (2) $f^T D f = \text{vol}(A)\cdot\tfrac{\text{vol}(B)}{\text{vol}(A)} + \text{vol}(B)\cdot\tfrac{\text{vol}(A)}{\text{vol}(B)} = \text{vol}(V)$.
The Connection:
$$f^T L f = \text{vol}(V) \cdot \text{NCut}(A, B)$$
where $L = D - W$ is the unnormalized Laplacian.
To convert to a standard eigenvalue problem, substitute $g = D^{1/2} f$:
$$g^T L_{sym} g = \text{vol}(V) \cdot \text{NCut}(A, B)$$
where $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$.
The Optimization Problem:
Minimizing NCut becomes:
$$\min_{g \in \mathbb{R}^n} g^T L_{sym} g \quad \text{subject to } g \perp D^{1/2}\mathbf{1}, \quad g^T g = \text{vol}(V), \quad g = D^{1/2} f \text{ for a two-valued } f \text{ as above}$$
Spectral Relaxation:
Dropping the discrete constraint:
$$\min_{g \in \mathbb{R}^n} \frac{g^T L_{sym} g}{g^T g} \quad \text{subject to } g \perp D^{1/2}\mathbf{1}$$
The solution is the second eigenvector of $L_{sym}$ (by the Rayleigh-Ritz characterization: the constraint $g \perp D^{1/2}\mathbf{1}$ excludes the first eigenvector $D^{1/2}\mathbf{1}$, which has eigenvalue 0).
Back-Transformation:
The eigenvector $g_2$ of $L_{sym}$ relates to a vector $f = D^{-1/2} g_2$, which is the second eigenvector of the random walk Laplacian $L_{rw} = I - D^{-1}W$.
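This relationship is easy to verify numerically. The sketch below (small random graph, illustrative names) checks that if $g$ is an eigenvector of $L_{sym}$ with eigenvalue $\lambda$, then $f = D^{-1/2} g$ satisfies $L_{rw} f = \lambda f$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)) + 0.1
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
L_rw = np.eye(n) - W / d[:, None]            # I - D^{-1} W

lam, G = eigh(L_sym)                          # eigenpairs of L_sym
g2 = G[:, 1]                                  # second eigenvector of L_sym
f2 = D_inv_sqrt @ g2                          # back-transformed vector

# f2 is an eigenvector of L_rw with the same eigenvalue
print(np.allclose(L_rw @ f2, lam[1] * f2))    # True
```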
Summary of Connections:
| Objective | Laplacian | Eigenvector gives |
|---|---|---|
| RatioCut | $L = D - W$ | Indicator scaled by $\sqrt{\lvert B\rvert / \lvert A\rvert}$ |
| NCut | $L_{sym} = I - D^{-1/2}WD^{-1/2}$ | Indicator scaled by $\sqrt{\text{vol}(B)/\text{vol}(A)}$, transformed by $D^{1/2}$ |
```python
import numpy as np
from scipy.linalg import eigh

def demonstrate_ncut_spectral_connection():
    """
    Demonstrate the connection between NCut and
    normalized Laplacian eigenvectors.
    """
    np.random.seed(42)

    # Create a simple graph: two cliques weakly connected
    # Clique 1: nodes 0-4, Clique 2: nodes 5-9
    n = 10
    W = np.zeros((n, n))

    # Clique 1 (fully connected)
    for i in range(5):
        for j in range(i+1, 5):
            W[i, j] = W[j, i] = 1.0

    # Clique 2 (fully connected)
    for i in range(5, 10):
        for j in range(i+1, 10):
            W[i, j] = W[j, i] = 1.0

    # Weak connection between cliques
    W[4, 5] = W[5, 4] = 0.1

    # Compute degrees and Laplacians
    d = np.sum(W, axis=1)
    D = np.diag(d)
    L = D - W

    # Normalized Laplacian L_sym
    d_inv_sqrt = 1.0 / np.sqrt(d)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigendecomposition of L_sym
    eigenvalues, eigenvectors = eigh(L_sym)

    print("NCut and Normalized Laplacian Connection")
    print("=" * 55)
    print("\nGraph: Two 5-node cliques connected by edge weight 0.1")
    print(f"Degrees: {d}")
    print(f"\nFirst 4 eigenvalues of L_sym: {eigenvalues[:4].round(6)}")

    # Define true partition
    A = np.array([0, 1, 2, 3, 4])
    B = np.array([5, 6, 7, 8, 9])

    # Compute NCut directly
    cut_AB = W[4, 5]  # Only edge between cliques
    vol_A = np.sum(d[:5])
    vol_B = np.sum(d[5:])
    ncut_direct = cut_AB / vol_A + cut_AB / vol_B

    print(f"\nDirect NCut computation:")
    print(f"  cut(A, B) = {cut_AB}")
    print(f"  vol(A) = {vol_A}, vol(B) = {vol_B}")
    print(f"  NCut = {ncut_direct:.6f}")

    # Verify via quadratic form
    # Create the indicator vector f with volume weighting
    f = np.zeros(n)
    f[:5] = np.sqrt(vol_B / vol_A)
    f[5:] = -np.sqrt(vol_A / vol_B)

    # Check f^T L f = vol(V) * NCut
    vol_V = vol_A + vol_B
    quadratic_form = f @ L @ f
    ncut_from_qf = quadratic_form / vol_V

    print(f"\nQuadratic form verification:")
    print(f"  f^T L f = {quadratic_form:.6f}")
    print(f"  vol(V) = {vol_V}")
    print(f"  NCut from quadratic form = {ncut_from_qf:.6f}")
    print(f"  Match: {np.isclose(ncut_direct, ncut_from_qf)}")

    # Analyze second eigenvector
    g2 = eigenvectors[:, 1]
    print(f"\nSecond eigenvector of L_sym (Fiedler vector):")
    print(f"  Values: {g2.round(4)}")
    print(f"  Signs: {np.sign(g2).astype(int)}")
    print("  → Clear separation: positive for clique 1, negative for clique 2")

    # Transform back to get f = D^{-1/2} g
    f_from_eig = D_inv_sqrt @ g2
    print(f"\nTransformed vector D^{{-1/2}} g2:")
    print(f"  Values: {f_from_eig.round(4)}")

    return eigenvalues, eigenvectors, W

if __name__ == "__main__":
    demonstrate_ncut_spectral_connection()
```

Two major algorithmic variants implement spectral clustering for NCut: the Shi-Malik algorithm and the Ng-Jordan-Weiss (NJW) algorithm. Both use the normalized Laplacian but differ in eigenvector handling.
Shi-Malik Algorithm (2000): (1) Build the weight matrix $W$, degree matrix $D$, and unnormalized Laplacian $L = D - W$; (2) Solve the generalized eigenvalue problem $Lv = \lambda D v$ (equivalently, compute eigenvectors of $L_{rw} = D^{-1}L$); (3) Embed the data using the eigenvectors with the smallest eigenvalues and partition (originally by thresholding the second eigenvector and splitting recursively; in the $k$-way variant, by running K-Means on the first $k$ eigenvectors).
Notes on Shi-Malik: solving $Lv = \lambda Dv$ avoids forming $D^{-1}$ explicitly; its eigenvectors are exactly those of $L_{rw}$, which equal $D^{-1/2}$ times the eigenvectors of $L_{sym}$; and while the original paper emphasized recursive two-way splitting, the $k$-eigenvector variant shown in the code below is now the common formulation.
Ng-Jordan-Weiss (NJW) Algorithm (2001): (1) Form the symmetric normalized Laplacian $L_{sym} = I - D^{-1/2} W D^{-1/2}$; (2) Compute its first $k$ eigenvectors and stack them as the columns of $U \in \mathbb{R}^{n \times k}$; (3) Normalize each row of $U$ to unit length to obtain $T$; (4) Run K-Means on the rows of $T$ and assign each data point the cluster of its row.
The Critical Row-Normalization Step:
Row normalization projects each embedded point onto the unit sphere. This step is crucial for NJW's success: in the idealized case of $k$ well-separated components, rows of $U$ belonging to the same cluster point in (approximately) the same direction but have lengths that depend on node degree; dividing each row by its norm removes that degree-dependent scaling, so the clusters concentrate around $k$ nearly orthogonal points on the sphere, which K-Means separates easily.
```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

class SpectralClusteringAlgorithms:
    """
    Implementation of major spectral clustering algorithms.
    """

    def __init__(self, W):
        """
        Initialize with weight matrix W.

        Parameters:
        -----------
        W : ndarray, shape (n, n)
            Symmetric non-negative weight matrix
        """
        self.W = W
        self.n = W.shape[0]
        self.d = np.sum(W, axis=1)
        self.D = np.diag(self.d)

    def shi_malik(self, n_clusters=2):
        """
        Shi-Malik algorithm using generalized eigenvalue problem.

        Solves: Lv = λDv
        Equivalent to finding eigenvectors of L_rw = D^{-1}L

        Returns:
        --------
        labels : ndarray
            Cluster assignments
        """
        # Unnormalized Laplacian
        L = self.D - self.W

        # Solve generalized eigenvalue problem Lv = λDv
        # This is equivalent to eigenvectors of D^{-1}L
        # scipy.linalg.eigh can handle generalized problems
        eigenvalues, eigenvectors = eigh(L, self.D)

        # Take first k eigenvectors (smallest eigenvalues)
        U = eigenvectors[:, :n_clusters]

        # K-Means on embedding
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(U)

        return labels, eigenvalues[:n_clusters+2], U

    def ng_jordan_weiss(self, n_clusters=2):
        """
        Ng-Jordan-Weiss algorithm with row normalization.

        Uses symmetric normalized Laplacian L_sym with
        row-normalized eigenvector embedding.

        Returns:
        --------
        labels : ndarray
            Cluster assignments
        """
        # Compute D^{-1/2}
        d_inv_sqrt = np.zeros(self.n)
        nonzero = self.d > 0
        d_inv_sqrt[nonzero] = 1.0 / np.sqrt(self.d[nonzero])
        D_inv_sqrt = np.diag(d_inv_sqrt)

        # Normalized Laplacian L_sym
        L_sym = np.eye(self.n) - D_inv_sqrt @ self.W @ D_inv_sqrt

        # Eigendecomposition
        eigenvalues, eigenvectors = eigh(L_sym)

        # Take first k eigenvectors
        U = eigenvectors[:, :n_clusters]

        # Row normalization (critical step!)
        row_norms = np.linalg.norm(U, axis=1, keepdims=True)
        row_norms[row_norms == 0] = 1  # Avoid division by zero
        T = U / row_norms

        # K-Means on normalized embedding
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(T)

        return labels, eigenvalues[:n_clusters+2], T

    def unnormalized_spectral(self, n_clusters=2):
        """
        Unnormalized spectral clustering (for RatioCut).

        Uses unnormalized Laplacian without row normalization.
        """
        L = self.D - self.W
        eigenvalues, eigenvectors = eigh(L)
        U = eigenvectors[:, :n_clusters]

        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(U)

        return labels, eigenvalues[:n_clusters+2], U

def compare_algorithms():
    """
    Compare spectral clustering algorithms on various datasets.
    """
    from sklearn.datasets import make_moons, make_blobs
    from sklearn.metrics import adjusted_rand_score

    print("Comparison of Spectral Clustering Algorithms")
    print("=" * 60)

    # Dataset 1: Two moons (tests non-convex capability)
    X_moons, y_moons = make_moons(n_samples=200, noise=0.05, random_state=42)

    # Dataset 2: Blobs with varying sizes
    X_blobs, y_blobs = make_blobs(
        n_samples=[100, 50, 150],
        centers=[[0, 0], [5, 0], [2.5, 4]],
        cluster_std=[0.5, 0.3, 0.8],
        random_state=42
    )

    datasets = [
        ("Two Moons", X_moons, y_moons, 2),
        ("Unbalanced Blobs", X_blobs, y_blobs, 3),
    ]

    for name, X, y_true, k in datasets:
        print(f"\n--- {name} ---")

        # Build similarity graph
        distances = cdist(X, X)
        sigma = np.median(distances[distances > 0]) * 0.5
        W = np.exp(-distances**2 / (2 * sigma**2))
        np.fill_diagonal(W, 0)

        alg = SpectralClusteringAlgorithms(W)

        # Shi-Malik
        labels_sm, _, _ = alg.shi_malik(k)
        ari_sm = adjusted_rand_score(y_true, labels_sm)

        # Ng-Jordan-Weiss
        labels_njw, _, _ = alg.ng_jordan_weiss(k)
        ari_njw = adjusted_rand_score(y_true, labels_njw)

        # Unnormalized
        labels_un, _, _ = alg.unnormalized_spectral(k)
        ari_un = adjusted_rand_score(y_true, labels_un)

        print(f"  Shi-Malik ARI:       {ari_sm:.4f}")
        print(f"  Ng-Jordan-Weiss ARI: {ari_njw:.4f}")
        print(f"  Unnormalized ARI:    {ari_un:.4f}")

    print("\n" + "=" * 60)
    print("Summary:")
    print("- NJW with row normalization is often most robust")
    print("- Shi-Malik and NJW both minimize NCut (different formulations)")
    print("- Unnormalized can be biased for varying-degree graphs")

def visualize_row_normalization_effect():
    """
    Show why row normalization matters in NJW algorithm.
    """
    np.random.seed(42)

    # Create data with 3 clusters of different sizes/densities
    n1, n2, n3 = 80, 40, 30
    c1 = np.random.randn(n1, 2) * 0.5
    c2 = np.random.randn(n2, 2) * 0.3 + [4, 0]
    c3 = np.random.randn(n3, 2) * 0.4 + [2, 3]

    X = np.vstack([c1, c2, c3])
    y_true = np.array([0]*n1 + [1]*n2 + [2]*n3)

    # Build graph
    distances = cdist(X, X)
    sigma = 0.8
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    d = np.sum(W, axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-10))
    L_sym = np.eye(len(X)) - np.diag(d_inv_sqrt) @ W @ np.diag(d_inv_sqrt)

    eigenvalues, eigenvectors = eigh(L_sym)
    U = eigenvectors[:, :3]

    # Without row normalization
    kmeans_no_norm = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels_no_norm = kmeans_no_norm.fit_predict(U)

    # With row normalization
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1
    T = U / row_norms
    kmeans_norm = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels_norm = kmeans_norm.fit_predict(T)

    from sklearn.metrics import adjusted_rand_score
    print("\nRow Normalization Effect")
    print("=" * 55)
    print(f"Without row normalization ARI: {adjusted_rand_score(y_true, labels_no_norm):.4f}")
    print(f"With row normalization ARI:    {adjusted_rand_score(y_true, labels_norm):.4f}")
    print("\n→ Row normalization often improves results, especially")
    print("  for clusters of different sizes/densities")

if __name__ == "__main__":
    compare_algorithms()
    visualize_row_normalization_effect()
```

| Algorithm | Laplacian Used | Key Step | Best For |
|---|---|---|---|
| Unnormalized | L = D - W | Direct eigenvectors | Uniform degree graphs |
| Shi-Malik | L_rw = D⁻¹L or generalized L, D | Generalized eigenvalue problem | NCut objective; varying degrees |
| Ng-Jordan-Weiss | L_sym = D⁻¹/²LD⁻¹/² | Row normalization of embedding | Most robust; standard choice |
For most applications, use the Ng-Jordan-Weiss algorithm with the symmetric normalized Laplacian and row normalization. This combination is: (1) Robust to varying node degrees, (2) Computationally convenient (symmetric matrix), (3) Widely implemented in libraries (scikit-learn uses this), (4) Theoretically sound (minimizes NCut). The row normalization step is often critical for good results.
The NCut formulation extends naturally to $k$ clusters. The multi-way NCut objective is:
$$\text{NCut}_k(C_1, \ldots, C_k) = \sum_{j=1}^{k} \frac{\text{cut}(C_j, \bar{C_j})}{\text{vol}(C_j)}$$
Matrix Formulation:
Using indicator vectors $h^{(j)}$ for each cluster with appropriate volume-based scaling, and collecting into matrix $H$:
$$\text{NCut}_k = \text{Tr}((H^T D H)^{-1} H^T L H)$$
With the transformation $U = D^{1/2} H (H^T D H)^{-1/2}$, this becomes:
$$\text{NCut}_k = \text{Tr}(U^T L_{sym} U)$$
subject to $U^T U = I$.
The relaxed minimizer is again the matrix of first $k$ eigenvectors of $L_{sym}$.
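The trace identity can be checked directly. Below is a minimal sketch (random weight matrix, an arbitrary 3-way partition, and a plain 0/1 indicator matrix $H$; all names are illustrative) confirming that the matrix expression reproduces the sum of cut/volume terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
D = np.diag(d)
L = D - W

labels = np.repeat(np.arange(k), n // k)      # arbitrary 3-way partition

# Direct multi-way NCut: sum_j cut(C_j, complement) / vol(C_j)
ncut_direct = sum(
    W[labels == j][:, labels != j].sum() / d[labels == j].sum()
    for j in range(k)
)

# Matrix form: Tr((H^T D H)^{-1} H^T L H) with 0/1 indicator matrix H
H = np.eye(k)[labels]                          # shape (n, k), one-hot rows
ncut_trace = np.trace(np.linalg.inv(H.T @ D @ H) @ (H.T @ L @ H))

print(np.isclose(ncut_direct, ncut_trace))     # True
```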
The Full NCut Spectral Clustering Pipeline: (1) Construct the similarity graph and weight matrix $W$, (2) Form the degree matrix $D$ and normalized Laplacian $L_{sym}$ (or $L_{rw}$), (3) Compute the first $k$ eigenvectors, (4) Row-normalize the resulting embedding (NJW variant), (5) Run K-Means on the embedded points, and (6) Map the cluster labels back to the original data points.
The final K-Means step introduces variability due to random initialization. Best practices: (1) Use multiple initializations (n_init ≥ 10), (2) Use K-Means++ initialization, (3) For critical applications, consider K-Means variants like spherical K-Means or K-Medoids. The spectral embedding does the heavy lifting; K-Means just needs to identify well-separated clusters in the embedding space.
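As a minimal illustration of these recommendations, the call below shows the suggested K-Means settings; `U_embed` is a stand-in for the row-normalized spectral embedding computed earlier, filled here with placeholder data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embedding; in practice use the row-normalized matrix of the
# first k eigenvectors of L_sym produced by the pipeline above.
U_embed = np.random.default_rng(0).random((100, 3))

kmeans = KMeans(
    n_clusters=3,
    init='k-means++',    # robust seeding
    n_init=20,           # multiple restarts; keep the best inertia
    random_state=0,
)
labels = kmeans.fit_predict(U_embed)
```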
scikit-learn provides a robust implementation of spectral clustering that handles many practical details automatically.
Key Parameters:
- `n_clusters`: Number of clusters $k$
- `affinity`: How to construct the similarity graph
  - `'rbf'` (default): Gaussian kernel
  - `'nearest_neighbors'`: KNN graph
  - `'precomputed'`: Provide your own similarity matrix
- `gamma`: Kernel coefficient for RBF ($= 1/(2\sigma^2)$)
- `n_neighbors`: Number of neighbors for KNN affinity
- `eigen_solver`: Algorithm for eigendecomposition
- `assign_labels`: Method for cluster assignment (`'kmeans'` or `'discretize'`)
```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons, make_circles, make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler
import warnings

def spectral_clustering_tutorial():
    """
    Comprehensive tutorial on using scikit-learn's SpectralClustering.
    """
    print("scikit-learn Spectral Clustering Tutorial")
    print("=" * 60)

    # Generate challenging datasets
    np.random.seed(42)

    # Two moons - classic spectral clustering showcase
    X_moons, y_moons = make_moons(n_samples=300, noise=0.05)

    # Two circles (concentric)
    X_circles, y_circles = make_circles(n_samples=300, noise=0.05, factor=0.5)

    # Standard Gaussian blobs (for comparison)
    X_blobs, y_blobs = make_blobs(n_samples=300, centers=3, cluster_std=0.5)

    datasets = [
        ("Two Moons", X_moons, y_moons, 2),
        ("Concentric Circles", X_circles, y_circles, 2),
        ("Gaussian Blobs", X_blobs, y_blobs, 3),
    ]

    for name, X, y_true, n_clusters in datasets:
        print(f"\n--- {name} (k={n_clusters}) ---")

        # Basic usage with RBF kernel
        sc_rbf = SpectralClustering(
            n_clusters=n_clusters,
            affinity='rbf',
            gamma=1.0,  # Adjust based on data scale
            random_state=42,
            n_init=10
        )
        labels_rbf = sc_rbf.fit_predict(X)
        ari_rbf = adjusted_rand_score(y_true, labels_rbf)

        # With nearest neighbors affinity
        sc_knn = SpectralClustering(
            n_clusters=n_clusters,
            affinity='nearest_neighbors',
            n_neighbors=10,
            random_state=42,
            n_init=10
        )
        labels_knn = sc_knn.fit_predict(X)
        ari_knn = adjusted_rand_score(y_true, labels_knn)

        print(f"  RBF kernel (gamma=1.0): ARI = {ari_rbf:.4f}")
        print(f"  KNN (n_neighbors=10):   ARI = {ari_knn:.4f}")

def compare_affinity_types():
    """
    Compare different affinity/similarity graph constructions.
    """
    print("\n" + "=" * 60)
    print("Comparing Affinity Types on Two Moons")
    print("=" * 60)

    X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

    # Test different gamma values for RBF
    print("\nRBF kernel with different gamma values:")
    for gamma in [0.1, 0.5, 1.0, 5.0, 10.0]:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            sc = SpectralClustering(
                n_clusters=2, affinity='rbf', gamma=gamma,
                random_state=42, n_init=10
            )
            labels = sc.fit_predict(X)
        ari = adjusted_rand_score(y_true, labels)
        print(f"  gamma={gamma:5.1f}: ARI = {ari:.4f}")

    # Test different n_neighbors for KNN
    print("\nKNN with different n_neighbors:")
    for k in [3, 5, 10, 20, 50]:
        sc = SpectralClustering(
            n_clusters=2, affinity='nearest_neighbors', n_neighbors=k,
            random_state=42, n_init=10
        )
        labels = sc.fit_predict(X)
        ari = adjusted_rand_score(y_true, labels)
        print(f"  k={k:3d}: ARI = {ari:.4f}")

def precomputed_affinity_example():
    """
    Example using a precomputed similarity matrix.

    Useful when:
    - You have a custom similarity measure
    - Data is already a distance/similarity matrix
    - You want full control over graph construction
    """
    print("\n" + "=" * 60)
    print("Using Precomputed Affinity Matrix")
    print("=" * 60)

    from scipy.spatial.distance import cdist

    # Generate data
    X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

    # Method 1: Gaussian kernel manually
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0]) * 0.5  # Heuristic
    W_gaussian = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W_gaussian, 0)

    sc_precomputed = SpectralClustering(
        n_clusters=2, affinity='precomputed',
        random_state=42, n_init=10
    )
    labels_gauss = sc_precomputed.fit_predict(W_gaussian)
    ari_gauss = adjusted_rand_score(y_true, labels_gauss)

    # Method 2: Local scaling kernel (Zelnik-Manor & Perona)
    from sklearn.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors=8)
    nn.fit(X)
    dist_to_knn, _ = nn.kneighbors()
    sigma_local = dist_to_knn[:, -1]  # Distance to the farthest returned neighbor (local scale)
    sigma_matrix = np.outer(sigma_local, sigma_local)
    W_local = np.exp(-distances**2 / np.maximum(sigma_matrix, 1e-10))
    np.fill_diagonal(W_local, 0)

    labels_local = sc_precomputed.fit_predict(W_local)
    ari_local = adjusted_rand_score(y_true, labels_local)

    print(f"\nGaussian kernel (σ={sigma:.3f}): ARI = {ari_gauss:.4f}")
    print(f"Local scaling kernel:          ARI = {ari_local:.4f}")

def choosing_number_of_clusters():
    """
    Using eigenvalue analysis to help choose k.
    """
    print("\n" + "=" * 60)
    print("Choosing Number of Clusters via Eigengap")
    print("=" * 60)

    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    # Generate data with 4 clusters
    X, y_true = make_blobs(
        n_samples=200, centers=4, cluster_std=0.5, random_state=42
    )

    # Build affinity matrix
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Compute Laplacian
    d = np.sum(W, axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-10))
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigendecomposition
    eigenvalues, _ = eigh(L_sym)

    print("\nFirst 10 eigenvalues:")
    for i in range(10):
        gap = eigenvalues[i+1] - eigenvalues[i] if i < 9 else 0
        print(f"  λ_{i+1} = {eigenvalues[i]:.6f} (gap to next: {gap:.6f})")

    # Find largest gap
    gaps = np.diff(eigenvalues[:10])
    suggested_k = np.argmax(gaps[1:]) + 2  # Skip first gap (always large)
    print(f"\nLargest gap after λ₁ suggests k = {suggested_k}")
    print(f"True number of clusters: 4")

    # Verify by clustering
    for k in [2, 3, 4, 5]:
        sc = SpectralClustering(n_clusters=k, affinity='precomputed',
                                random_state=42, n_init=10)
        labels = sc.fit_predict(W)
        sil = silhouette_score(X, labels) if k > 1 else 0
        ari = adjusted_rand_score(y_true, labels)
        print(f"  k={k}: Silhouette={sil:.3f}, ARI={ari:.3f}")

if __name__ == "__main__":
    spectral_clustering_tutorial()
    compare_affinity_types()
    precomputed_affinity_example()
    choosing_number_of_clusters()
```

We have developed a comprehensive understanding of the Normalized Cut objective and its role in spectral clustering: its definition and volume-based normalization, the random walk interpretation, the connection to the normalized Laplacian's eigenstructure, the Shi-Malik and Ng-Jordan-Weiss algorithms, and practical usage through scikit-learn.
What's Next:
With the theoretical foundation complete, we conclude with the complete spectral clustering algorithm and its limitations. We'll discuss computational considerations, failure modes, parameter sensitivity, and when to use (or not use) spectral clustering in practice.
You now understand the Normalized Cut objective—its definition, random walk interpretation, connection to normalized Laplacians, and practical implementation. NCut-based spectral clustering is the standard approach because it balances cluster connectivity rather than just cluster size, leading to more meaningful partitions in real applications.