The Normalized Cut (NCut) objective, introduced by Shi and Malik in their seminal 2000 paper, represents a fundamental advancement in graph partitioning for clustering. While RatioCut normalizes by cluster size (number of vertices), NCut normalizes by cluster volume (sum of edge weights incident to each cluster)—a distinction with profound theoretical and practical implications.
NCut has become the standard objective in spectral clustering for several compelling reasons: it balances clusters by their total connectivity (volume) rather than by raw vertex counts, it admits a natural random walk interpretation, and its relaxation leads directly to the eigenstructure of the normalized Laplacian.
This page develops the NCut formulation completely, from definition through to practical computation.
By the end of this page, you will understand: (1) The mathematical definition of NCut and its relationship to RatioCut, (2) Why volume normalization is often superior to cardinality normalization, (3) The random walk and Markov chain interpretation of NCut, (4) How NCut connects to the normalized Laplacian's eigenstructure, and (5) Practical algorithms for NCut-based spectral clustering (Shi-Malik, Ng-Jordan-Weiss).
Recall: The Cut and Association Measures
For a partition of vertices into sets $A$ and $B = \bar{A}$:
Cut: The total edge weight crossing between clusters $$\text{cut}(A, B) = \sum_{i \in A, j \in B} W_{ij}$$
Association (Volume): The total edge weight incident to each cluster $$\text{assoc}(A, V) = \text{vol}(A) = \sum_{i \in A} d_i = \sum_{i \in A, j \in V} W_{ij}$$
where $d_i = \sum_j W_{ij}$ is the degree of node $i$.
Note: $\text{vol}(A) = \text{cut}(A, B) + \text{assoc}(A, A)$, where $\text{assoc}(A, A) = \sum_{i, j \in A} W_{ij}$ is the total edge weight internal to $A$ (each undirected edge counted in both directions).
The Normalized Cut:
$$\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$$
This is the sum of two ratios: the cut weight relative to the volume of $A$, plus the same cut weight relative to the volume of $B$.
Alternative Formulations:
NCut can be rewritten in several equivalent forms:
$$\text{NCut}(A, B) = \text{cut}(A, B) \cdot \left( \frac{1}{\text{vol}(A)} + \frac{1}{\text{vol}(B)} \right)$$
$$\text{NCut}(A, B) = \frac{\text{cut}(A, B) \cdot \text{vol}(V)}{\text{vol}(A) \cdot \text{vol}(B)}$$
(using $\text{vol}(A) + \text{vol}(B) = \text{vol}(V)$)
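As a quick numerical sanity check of these equivalences, the following sketch evaluates all three expressions on a small random graph with an arbitrary split (the graph, split, and variable names are purely illustrative):

```python
import numpy as np

# Illustrative random symmetric weight matrix and an arbitrary bipartition
rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
in_A = np.arange(n) < 3          # nodes 0-2 in A, the rest in B

cut_AB = W[in_A][:, ~in_A].sum()
vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
vol_V = vol_A + vol_B

form1 = cut_AB / vol_A + cut_AB / vol_B      # definition
form2 = cut_AB * (1 / vol_A + 1 / vol_B)     # factored form
form3 = cut_AB * vol_V / (vol_A * vol_B)     # product form
print(form1, form2, form3)                   # all three agree up to floating point
```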
Relationship to Normalized Association (NAssoc):
Shi and Malik also defined: $$\text{NAssoc}(A, B) = \frac{\text{assoc}(A, A)}{\text{vol}(A)} + \frac{\text{assoc}(B, B)}{\text{vol}(B)}$$
where $\text{assoc}(A, A) = \sum_{i,j \in A} W_{ij}$ is the internal connectivity of $A$.
Key Identity: $$\text{NCut}(A, B) + \text{NAssoc}(A, B) = 2$$
This means minimizing NCut is equivalent to maximizing NAssoc—finding clusters with high internal connectivity relative to their total connectivity.
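The identity is easy to verify numerically; here is a minimal sketch in the same illustrative setup (small random graph, arbitrary split, hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
in_A = np.arange(n) < 4                      # arbitrary split into two halves

cut_AB = W[in_A][:, ~in_A].sum()
vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
assoc_AA = W[in_A][:, in_A].sum()            # internal weight of A (ordered pairs)
assoc_BB = W[~in_A][:, ~in_A].sum()

ncut = cut_AB / vol_A + cut_AB / vol_B
nassoc = assoc_AA / vol_A + assoc_BB / vol_B
print(ncut + nassoc)                         # prints 2.0 (up to floating point)
```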
| Aspect | RatioCut | NCut |
|---|---|---|
| Formula (2-way) | $\text{cut}(A,B)/\lvert A\rvert + \text{cut}(A,B)/\lvert B\rvert$ | $\text{cut}(A,B)/\text{vol}(A) + \text{cut}(A,B)/\text{vol}(B)$ |
| Normalization | Number of vertices | Sum of vertex degrees (volume) |
| Bias toward | Equal-sized clusters | Equal-volume clusters |
| Handles varying degrees | Poorly | Well |
| Related Laplacian | Unnormalized L | Normalized L_sym or L_rw |
| Interpretation | Number-balanced partition | Connectivity-balanced partition |
Consider a social network where some users are highly connected (influencers) and others have few connections. RatioCut balances only the number of vertices on each side, so it can produce partitions whose two sides differ drastically in total connectivity, for example grouping a handful of influencers against a large mass of peripheral users. NCut prevents this by normalizing by volume: a cluster whose total connectivity is small relative to its boundary is penalized heavily (in the extreme, isolating a single node makes that cluster's term cut/vol equal to 1, the worst possible value). NCut therefore finds partitions balanced in terms of total connectivity, not just node count.
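To make the difference in normalization concrete, here is a small sketch on a hypothetical hub-and-pair graph (weights and partition chosen purely for illustration): the same cut is divided by vertex counts for RatioCut and by volumes for NCut, giving very different scores.

```python
import numpy as np

# Hypothetical graph: a hub (node 0) strongly tied to nodes 1-4,
# plus a loosely attached pair {5, 6}
n = 7
W = np.zeros((n, n))
for j in range(1, 5):
    W[0, j] = W[j, 0] = 3.0      # hub edges
W[5, 6] = W[6, 5] = 1.0          # the small pair
W[4, 5] = W[5, 4] = 0.2          # weak bridge between the groups
d = W.sum(axis=1)

def ratiocut_and_ncut(in_A):
    cut = W[in_A][:, ~in_A].sum()
    ratiocut = cut / in_A.sum() + cut / (~in_A).sum()     # divide by cluster sizes
    ncut = cut / d[in_A].sum() + cut / d[~in_A].sum()     # divide by cluster volumes
    return ratiocut, ncut

split = np.array([True, True, True, True, True, False, False])  # hub group vs pair
print("RatioCut = %.4f, NCut = %.4f" % ratiocut_and_ncut(split))
```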
One of NCut's most elegant features is its interpretation in terms of random walks on the graph. This provides deep intuition for why NCut-based clustering works.
Random Walk on a Graph:
Define a random walk where, at each step, a walker at node $i$ moves to node $j$ with probability proportional to the edge weight: $P_{ij} = W_{ij} / d_i$.
The matrix $P = D^{-1}W$ is the transition matrix—row-stochastic with $\sum_j P_{ij} = 1$.
Stationary Distribution:
For a connected, non-bipartite graph, the random walk has a unique stationary distribution: $$\pi_i = \frac{d_i}{\text{vol}(V)}$$
Nodes are visited in proportion to their degree. This is intuitive: a random walker spends more time at well-connected nodes.
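A quick numerical check of this claim (a sketch on a small graph with strictly positive weights so that it is connected; names are illustrative): build $P = D^{-1}W$, set $\pi_i = d_i / \text{vol}(V)$, and confirm $\pi P = \pi$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)) + 0.1     # strictly positive weights -> connected graph
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
P = W / d[:, None]               # transition matrix D^{-1} W (row-stochastic)
pi = d / d.sum()                 # claimed stationary distribution d_i / vol(V)

print(np.allclose(P.sum(axis=1), 1.0))   # rows sum to 1
print(np.allclose(pi @ P, pi))           # True: pi is stationary
```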
NCut as Escape Probability:
The NCut has a beautiful random walk interpretation:
$$\frac{\text{cut}(A, B)}{\text{vol}(A)} = P(\text{transition from } A \text{ to } B | \text{currently in } A)$$
This is the probability that a random walker currently in cluster $A$ (with its location within $A$ drawn from the stationary distribution) transitions to cluster $B$ in one step.
Proof of the Escape Probability Interpretation:
The probability of being at node $i \in A$ (under stationary distribution) is $\pi_i = d_i / \text{vol}(V)$.
Conditional on being in $A$, the probability of being at $i$ is: $$P(\text{at } i | \text{in } A) = \frac{d_i / \text{vol}(V)}{\sum_{k \in A} d_k / \text{vol}(V)} = \frac{d_i}{\text{vol}(A)}$$
The probability of transitioning from $i$ to $B$ is $\sum_{j \in B} P_{ij} = \sum_{j \in B} W_{ij}/d_i$.
Thus: $$P(\text{transition to } B | \text{in } A) = \sum_{i \in A} \frac{d_i}{\text{vol}(A)} \cdot \frac{\sum_{j \in B} W_{ij}}{d_i} = \frac{\text{cut}(A, B)}{\text{vol}(A)}$$
NCut Interpretation:
Since $\text{NCut}(A, B) = \frac{\text{cut}(A, B)}{\text{vol}(A)} + \frac{\text{cut}(A, B)}{\text{vol}(B)}$:
$$\text{NCut}(A, B) = P(A \to B) + P(B \to A)$$
Minimizing NCut means finding partitions where random walks are unlikely to escape from either cluster!
This is precisely what we want: a good cluster traps random walkers, keeping them bouncing within the cluster rather than wandering to other clusters.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons

def compute_transition_matrix(W):
    """
    Compute the random walk transition matrix P = D^{-1} W.

    P[i,j] = probability of walking from node i to node j
           = W[i,j] / sum_k W[i,k]
    """
    d = np.sum(W, axis=1)
    # Handle isolated nodes
    d_inv = np.zeros_like(d)
    d_inv[d > 0] = 1.0 / d[d > 0]
    P = np.diag(d_inv) @ W
    return P, d

def verify_ncut_random_walk_interpretation(W, cluster_labels):
    """
    Verify that NCut equals the sum of transition probabilities
    between clusters under the random walk model.

    Parameters:
    -----------
    W : ndarray, shape (n, n)
        Symmetric weight matrix
    cluster_labels : ndarray, shape (n,)
        Binary cluster assignments (0 or 1)

    Returns:
    --------
    ncut : float
        Normalized cut value
    escape_probs : dict
        Escape probabilities P(A→B) and P(B→A)
    """
    P, d = compute_transition_matrix(W)
    n = len(cluster_labels)

    # Identify clusters
    A_mask = cluster_labels == 0
    B_mask = cluster_labels == 1
    A_indices = np.where(A_mask)[0]
    B_indices = np.where(B_mask)[0]

    # Compute volumes
    vol_A = np.sum(d[A_mask])
    vol_B = np.sum(d[B_mask])

    # Compute cut(A, B)
    cut_AB = 0.0
    for i in A_indices:
        for j in B_indices:
            cut_AB += W[i, j]

    # NCut formula
    ncut = cut_AB / vol_A + cut_AB / vol_B

    # Random walk interpretation:
    # P(A → B) = sum_i [P(at i | in A) * P(transition to B | at i)]
    #          = sum_i [(d_i/vol_A) * (sum_{j in B} W[i,j]/d_i)]
    #          = sum_i (sum_{j in B} W[i,j]) / vol_A
    #          = cut(A,B) / vol_A
    P_A_to_B = cut_AB / vol_A
    P_B_to_A = cut_AB / vol_B

    return {
        'ncut': ncut,
        'cut_AB': cut_AB,
        'vol_A': vol_A,
        'vol_B': vol_B,
        'P_A_to_B': P_A_to_B,
        'P_B_to_A': P_B_to_A,
        'ncut_equals_sum': np.isclose(ncut, P_A_to_B + P_B_to_A)
    }

def simulate_random_walk(P, start_cluster, cluster_labels, n_steps=10000):
    """
    Simulate a random walk and estimate escape probability empirically.

    Parameters:
    -----------
    P : ndarray
        Transition matrix
    start_cluster : int
        Starting cluster (0 or 1)
    cluster_labels : ndarray
        Cluster assignments
    n_steps : int
        Number of random walk steps

    Returns:
    --------
    escape_prob : float
        Empirical probability of escaping to the other cluster
    """
    n = len(cluster_labels)

    # Start at a random node in the starting cluster
    start_nodes = np.where(cluster_labels == start_cluster)[0]

    escapes = 0
    in_cluster_steps = 0

    # Multiple random walks
    n_walks = 100
    steps_per_walk = n_steps // n_walks

    for _ in range(n_walks):
        current = np.random.choice(start_nodes)
        for step in range(steps_per_walk):
            if cluster_labels[current] == start_cluster:
                in_cluster_steps += 1
                # Take a step
                probs = P[current]
                if probs.sum() > 0:
                    next_node = np.random.choice(n, p=probs)
                    if cluster_labels[next_node] != start_cluster:
                        escapes += 1
                    current = next_node

    escape_prob = escapes / in_cluster_steps if in_cluster_steps > 0 else 0
    return escape_prob

def demonstrate_random_walk_ncut():
    """
    Demonstrate the random walk interpretation of NCut.
    """
    np.random.seed(42)

    # Generate two moons data
    X, y_true = make_moons(n_samples=100, noise=0.05, random_state=42)

    # Build similarity graph
    distances = cdist(X, X)
    sigma = 0.3
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    print("Random Walk Interpretation of NCut")
    print("=" * 55)

    # Compute NCut for true clustering
    result = verify_ncut_random_walk_interpretation(W, y_true)

    print(f"\nTrue clustering NCut analysis:")
    print(f"  cut(A, B) = {result['cut_AB']:.4f}")
    print(f"  vol(A) = {result['vol_A']:.4f}")
    print(f"  vol(B) = {result['vol_B']:.4f}")
    print(f"  NCut = {result['ncut']:.6f}")

    print(f"\nRandom walk escape probabilities:")
    print(f"  P(A → B) = cut/vol(A) = {result['P_A_to_B']:.6f}")
    print(f"  P(B → A) = cut/vol(B) = {result['P_B_to_A']:.6f}")
    print(f"  NCut = P(A→B) + P(B→A): {result['ncut_equals_sum']}")

    # Simulate random walks to verify empirically
    P, _ = compute_transition_matrix(W)

    print(f"\nEmpirical verification (10000 random walk steps):")
    emp_P_A_to_B = simulate_random_walk(P, 0, y_true)
    emp_P_B_to_A = simulate_random_walk(P, 1, y_true)
    print(f"  Simulated P(A → B) ≈ {emp_P_A_to_B:.6f}")
    print(f"  Simulated P(B → A) ≈ {emp_P_B_to_A:.6f}")
    print(f"  Theoretical: {result['P_A_to_B']:.6f}, {result['P_B_to_A']:.6f}")

    # Compare with a bad clustering
    bad_labels = np.random.randint(0, 2, size=len(y_true))
    result_bad = verify_ncut_random_walk_interpretation(W, bad_labels)

    print(f"\nComparison with random clustering:")
    print(f"  True clustering NCut:   {result['ncut']:.6f}")
    print(f"  Random clustering NCut: {result_bad['ncut']:.6f}")
    print(f"  → True clustering has lower NCut (random walks trapped better)")

    return result

if __name__ == "__main__":
    demonstrate_random_walk_ncut()
```

Think of each cluster as a room and edges between clusters as doors. NCut measures how easily a random walker can wander from one room to another. Small NCut means: (1) Doors between rooms are narrow (low cut), (2) Each room is internally well-connected (high volume), (3) Random walkers stay trapped in rooms for a long time. This is exactly the intuition of a 'good cluster': cohesive internally, well-separated externally.
The connection between NCut and the normalized Laplacian parallels the RatioCut-unnormalized Laplacian connection, but requires careful handling of the volume weighting.
NCut Matrix Formulation:
For a partition into $A$ and $B$, define:
$$f_i = \begin{cases} \sqrt{\text{vol}(B) / \text{vol}(A)} & \text{if } i \in A \\ -\sqrt{\text{vol}(A) / \text{vol}(B)} & \text{if } i \in B \end{cases}$$
Unlike RatioCut, this scaling uses volume instead of cardinality.
Key Properties of This Encoding: (1) $f$ is $D$-orthogonal to the constant vector, i.e., $\sum_i d_i f_i = 0$; (2) $f^T D f = \text{vol}(A)\cdot\tfrac{\text{vol}(B)}{\text{vol}(A)} + \text{vol}(B)\cdot\tfrac{\text{vol}(A)}{\text{vol}(B)} = \text{vol}(V)$.
The Connection:
$$f^T L f = \text{vol}(V) \cdot \text{NCut}(A, B)$$
where $L = D - W$ is the unnormalized Laplacian.
To convert to a standard eigenvalue problem, substitute $g = D^{1/2} f$:
$$g^T L_{sym} g = \text{vol}(V) \cdot \text{NCut}(A, B)$$
where $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$.
The Optimization Problem:
Minimizing NCut becomes:
$$\min_{g \in \mathbb{R}^n} g^T L_{sym} g \quad \text{subject to } g \perp D^{1/2}\mathbf{1}, \quad g^T g = \text{vol}(V), \quad g = D^{1/2} f \text{ for a two-valued } f \text{ as above}$$
Spectral Relaxation:
Dropping the discrete constraint:
$$\min_{g \in \mathbb{R}^n} \frac{g^T L_{sym} g}{g^T g} \quad \text{subject to } g \perp D^{1/2}\mathbf{1}$$
The solution is the second eigenvector of $L_{sym}$ (by the Rayleigh-Ritz characterization: the constraint $g \perp D^{1/2}\mathbf{1}$ excludes the first eigenvector $D^{1/2}\mathbf{1}$, which has eigenvalue 0).
Back-Transformation:
The eigenvector $g_2$ of $L_{sym}$ relates to a vector $f = D^{-1/2} g_2$, which is the second eigenvector of the random walk Laplacian $L_{rw} = I - D^{-1}W$.
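This relationship is easy to verify numerically. The sketch below (small random graph, illustrative names) checks that if $g$ is an eigenvector of $L_{sym}$ with eigenvalue $\lambda$, then $f = D^{-1/2} g$ satisfies $L_{rw} f = \lambda f$.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)) + 0.1
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
L_rw = np.eye(n) - W / d[:, None]            # I - D^{-1} W

lam, G = eigh(L_sym)                          # eigenpairs of L_sym
g2 = G[:, 1]                                  # second eigenvector of L_sym
f2 = D_inv_sqrt @ g2                          # back-transformed vector

# f2 is an eigenvector of L_rw with the same eigenvalue
print(np.allclose(L_rw @ f2, lam[1] * f2))    # True
```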
Summary of Connections:
| Objective | Laplacian | Eigenvector gives |
|---|---|---|
| RatioCut | $L = D - W$ | Indicator scaled by $\sqrt{\lvert B\rvert / \lvert A\rvert}$ |
| NCut | $L_{sym} = I - D^{-1/2}WD^{-1/2}$ | Indicator scaled by $\sqrt{\text{vol}(B)/\text{vol}(A)}$, transformed by $D^{1/2}$ |
```python
import numpy as np
from scipy.linalg import eigh

def demonstrate_ncut_spectral_connection():
    """
    Demonstrate the connection between NCut and
    normalized Laplacian eigenvectors.
    """
    np.random.seed(42)

    # Create a simple graph: two cliques weakly connected
    # Clique 1: nodes 0-4, Clique 2: nodes 5-9
    n = 10
    W = np.zeros((n, n))

    # Clique 1 (fully connected)
    for i in range(5):
        for j in range(i+1, 5):
            W[i, j] = W[j, i] = 1.0

    # Clique 2 (fully connected)
    for i in range(5, 10):
        for j in range(i+1, 10):
            W[i, j] = W[j, i] = 1.0

    # Weak connection between cliques
    W[4, 5] = W[5, 4] = 0.1

    # Compute degrees and Laplacians
    d = np.sum(W, axis=1)
    D = np.diag(d)
    L = D - W

    # Normalized Laplacian L_sym
    d_inv_sqrt = 1.0 / np.sqrt(d)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigendecomposition of L_sym
    eigenvalues, eigenvectors = eigh(L_sym)

    print("NCut and Normalized Laplacian Connection")
    print("=" * 55)
    print("\nGraph: Two 5-node cliques connected by edge weight 0.1")
    print(f"Degrees: {d}")
    print(f"\nFirst 4 eigenvalues of L_sym: {eigenvalues[:4].round(6)}")

    # Define true partition
    A = np.array([0, 1, 2, 3, 4])
    B = np.array([5, 6, 7, 8, 9])

    # Compute NCut directly
    cut_AB = W[4, 5]  # Only edge between cliques
    vol_A = np.sum(d[:5])
    vol_B = np.sum(d[5:])
    ncut_direct = cut_AB / vol_A + cut_AB / vol_B

    print(f"\nDirect NCut computation:")
    print(f"  cut(A, B) = {cut_AB}")
    print(f"  vol(A) = {vol_A}, vol(B) = {vol_B}")
    print(f"  NCut = {ncut_direct:.6f}")

    # Verify via quadratic form
    # Create the indicator vector f with volume weighting
    f = np.zeros(n)
    f[:5] = np.sqrt(vol_B / vol_A)
    f[5:] = -np.sqrt(vol_A / vol_B)

    # Check f^T L f = vol(V) * NCut
    vol_V = vol_A + vol_B
    quadratic_form = f @ L @ f
    ncut_from_qf = quadratic_form / vol_V

    print(f"\nQuadratic form verification:")
    print(f"  f^T L f = {quadratic_form:.6f}")
    print(f"  vol(V) = {vol_V}")
    print(f"  NCut from quadratic form = {ncut_from_qf:.6f}")
    print(f"  Match: {np.isclose(ncut_direct, ncut_from_qf)}")

    # Analyze second eigenvector
    g2 = eigenvectors[:, 1]
    print(f"\nSecond eigenvector of L_sym (Fiedler vector):")
    print(f"  Values: {g2.round(4)}")
    print(f"  Signs: {np.sign(g2).astype(int)}")
    print("  → Clear separation: positive for clique 1, negative for clique 2")

    # Transform back to get f = D^{-1/2} g
    f_from_eig = D_inv_sqrt @ g2
    print(f"\nTransformed vector D^{{-1/2}} g2:")
    print(f"  Values: {f_from_eig.round(4)}")

    return eigenvalues, eigenvectors, W

if __name__ == "__main__":
    demonstrate_ncut_spectral_connection()
```

Two major algorithmic variants implement spectral clustering for NCut: the Shi-Malik algorithm and the Ng-Jordan-Weiss (NJW) algorithm. Both use the normalized Laplacian but differ in eigenvector handling.
Shi-Malik Algorithm (2000): (1) Build the weight matrix $W$, degree matrix $D$, and unnormalized Laplacian $L = D - W$; (2) Solve the generalized eigenvalue problem $Lv = \lambda D v$ (equivalently, compute eigenvectors of $L_{rw} = D^{-1}L$); (3) Embed the data using the eigenvectors with the smallest eigenvalues and partition (originally by thresholding the second eigenvector and splitting recursively; in the $k$-way variant, by running K-Means on the first $k$ eigenvectors).
Notes on Shi-Malik: solving $Lv = \lambda Dv$ avoids forming $D^{-1}$ explicitly; its eigenvectors are exactly those of $L_{rw}$, which equal $D^{-1/2}$ times the eigenvectors of $L_{sym}$; and while the original paper emphasized recursive two-way splitting, the $k$-eigenvector variant shown in the code below is now the common formulation.
Ng-Jordan-Weiss (NJW) Algorithm (2001): (1) Form the symmetric normalized Laplacian $L_{sym} = I - D^{-1/2} W D^{-1/2}$; (2) Compute its first $k$ eigenvectors and stack them as the columns of $U \in \mathbb{R}^{n \times k}$; (3) Normalize each row of $U$ to unit length to obtain $T$; (4) Run K-Means on the rows of $T$ and assign each data point the cluster of its row.
The Critical Row-Normalization Step:
Row normalization projects each embedded point onto the unit sphere. This step is crucial for NJW's success: in the idealized case of $k$ well-separated components, rows of $U$ belonging to the same cluster point in (approximately) the same direction but have lengths that depend on node degree; dividing each row by its norm removes that degree-dependent scaling, so the clusters concentrate around $k$ nearly orthogonal points on the sphere, which K-Means separates easily.
```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

class SpectralClusteringAlgorithms:
    """
    Implementation of major spectral clustering algorithms.
    """

    def __init__(self, W):
        """
        Initialize with weight matrix W.

        Parameters:
        -----------
        W : ndarray, shape (n, n)
            Symmetric non-negative weight matrix
        """
        self.W = W
        self.n = W.shape[0]
        self.d = np.sum(W, axis=1)
        self.D = np.diag(self.d)

    def shi_malik(self, n_clusters=2):
        """
        Shi-Malik algorithm using generalized eigenvalue problem.

        Solves: Lv = λDv
        Equivalent to finding eigenvectors of L_rw = D^{-1}L

        Returns:
        --------
        labels : ndarray
            Cluster assignments
        """
        # Unnormalized Laplacian
        L = self.D - self.W

        # Solve generalized eigenvalue problem Lv = λDv
        # This is equivalent to eigenvectors of D^{-1}L
        # scipy.linalg.eigh can handle generalized problems
        eigenvalues, eigenvectors = eigh(L, self.D)

        # Take first k eigenvectors (smallest eigenvalues)
        U = eigenvectors[:, :n_clusters]

        # K-Means on embedding
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(U)

        return labels, eigenvalues[:n_clusters+2], U

    def ng_jordan_weiss(self, n_clusters=2):
        """
        Ng-Jordan-Weiss algorithm with row normalization.

        Uses symmetric normalized Laplacian L_sym with
        row-normalized eigenvector embedding.

        Returns:
        --------
        labels : ndarray
            Cluster assignments
        """
        # Compute D^{-1/2}
        d_inv_sqrt = np.zeros(self.n)
        nonzero = self.d > 0
        d_inv_sqrt[nonzero] = 1.0 / np.sqrt(self.d[nonzero])
        D_inv_sqrt = np.diag(d_inv_sqrt)

        # Normalized Laplacian L_sym
        L_sym = np.eye(self.n) - D_inv_sqrt @ self.W @ D_inv_sqrt

        # Eigendecomposition
        eigenvalues, eigenvectors = eigh(L_sym)

        # Take first k eigenvectors
        U = eigenvectors[:, :n_clusters]

        # Row normalization (critical step!)
        row_norms = np.linalg.norm(U, axis=1, keepdims=True)
        row_norms[row_norms == 0] = 1  # Avoid division by zero
        T = U / row_norms

        # K-Means on normalized embedding
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(T)

        return labels, eigenvalues[:n_clusters+2], T

    def unnormalized_spectral(self, n_clusters=2):
        """
        Unnormalized spectral clustering (for RatioCut).

        Uses unnormalized Laplacian without row normalization.
        """
        L = self.D - self.W
        eigenvalues, eigenvectors = eigh(L)
        U = eigenvectors[:, :n_clusters]

        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = kmeans.fit_predict(U)

        return labels, eigenvalues[:n_clusters+2], U

def compare_algorithms():
    """
    Compare spectral clustering algorithms on various datasets.
    """
    from sklearn.datasets import make_moons, make_blobs
    from sklearn.metrics import adjusted_rand_score

    print("Comparison of Spectral Clustering Algorithms")
    print("=" * 60)

    # Dataset 1: Two moons (tests non-convex capability)
    X_moons, y_moons = make_moons(n_samples=200, noise=0.05, random_state=42)

    # Dataset 2: Blobs with varying sizes
    X_blobs, y_blobs = make_blobs(
        n_samples=[100, 50, 150],
        centers=[[0, 0], [5, 0], [2.5, 4]],
        cluster_std=[0.5, 0.3, 0.8],
        random_state=42
    )

    datasets = [
        ("Two Moons", X_moons, y_moons, 2),
        ("Unbalanced Blobs", X_blobs, y_blobs, 3),
    ]

    for name, X, y_true, k in datasets:
        print(f"\n--- {name} ---")

        # Build similarity graph
        distances = cdist(X, X)
        sigma = np.median(distances[distances > 0]) * 0.5
        W = np.exp(-distances**2 / (2 * sigma**2))
        np.fill_diagonal(W, 0)

        alg = SpectralClusteringAlgorithms(W)

        # Shi-Malik
        labels_sm, _, _ = alg.shi_malik(k)
        ari_sm = adjusted_rand_score(y_true, labels_sm)

        # Ng-Jordan-Weiss
        labels_njw, _, _ = alg.ng_jordan_weiss(k)
        ari_njw = adjusted_rand_score(y_true, labels_njw)

        # Unnormalized
        labels_un, _, _ = alg.unnormalized_spectral(k)
        ari_un = adjusted_rand_score(y_true, labels_un)

        print(f"  Shi-Malik ARI:       {ari_sm:.4f}")
        print(f"  Ng-Jordan-Weiss ARI: {ari_njw:.4f}")
        print(f"  Unnormalized ARI:    {ari_un:.4f}")

    print("\n" + "=" * 60)
    print("Summary:")
    print("- NJW with row normalization is often most robust")
    print("- Shi-Malik and NJW both minimize NCut (different formulations)")
    print("- Unnormalized can be biased for varying-degree graphs")

def visualize_row_normalization_effect():
    """
    Show why row normalization matters in NJW algorithm.
    """
    np.random.seed(42)

    # Create data with 3 clusters of different sizes/densities
    n1, n2, n3 = 80, 40, 30
    c1 = np.random.randn(n1, 2) * 0.5
    c2 = np.random.randn(n2, 2) * 0.3 + [4, 0]
    c3 = np.random.randn(n3, 2) * 0.4 + [2, 3]

    X = np.vstack([c1, c2, c3])
    y_true = np.array([0]*n1 + [1]*n2 + [2]*n3)

    # Build graph
    distances = cdist(X, X)
    sigma = 0.8
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    d = np.sum(W, axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-10))
    L_sym = np.eye(len(X)) - np.diag(d_inv_sqrt) @ W @ np.diag(d_inv_sqrt)

    eigenvalues, eigenvectors = eigh(L_sym)
    U = eigenvectors[:, :3]

    # Without row normalization
    kmeans_no_norm = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels_no_norm = kmeans_no_norm.fit_predict(U)

    # With row normalization
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1
    T = U / row_norms
    kmeans_norm = KMeans(n_clusters=3, random_state=42, n_init=10)
    labels_norm = kmeans_norm.fit_predict(T)

    from sklearn.metrics import adjusted_rand_score
    print("\nRow Normalization Effect")
    print("=" * 55)
    print(f"Without row normalization ARI: {adjusted_rand_score(y_true, labels_no_norm):.4f}")
    print(f"With row normalization ARI:    {adjusted_rand_score(y_true, labels_norm):.4f}")
    print("\n→ Row normalization often improves results, especially")
    print("  for clusters of different sizes/densities")

if __name__ == "__main__":
    compare_algorithms()
    visualize_row_normalization_effect()
```

| Algorithm | Laplacian Used | Key Step | Best For |
|---|---|---|---|
| Unnormalized | L = D - W | Direct eigenvectors | Uniform degree graphs |
| Shi-Malik | L_rw = D⁻¹L or generalized L, D | Generalized eigenvalue problem | NCut objective; varying degrees |
| Ng-Jordan-Weiss | L_sym = D⁻¹/²LD⁻¹/² | Row normalization of embedding | Most robust; standard choice |
For most applications, use the Ng-Jordan-Weiss algorithm with the symmetric normalized Laplacian and row normalization. This combination is: (1) Robust to varying node degrees, (2) Computationally convenient (symmetric matrix), (3) Widely implemented in libraries (scikit-learn uses this), (4) Theoretically sound (minimizes NCut). The row normalization step is often critical for good results.
The NCut formulation extends naturally to $k$ clusters. The multi-way NCut objective is:
$$\text{NCut}_k(C_1, \ldots, C_k) = \sum_{j=1}^{k} \frac{\text{cut}(C_j, \bar{C_j})}{\text{vol}(C_j)}$$
Matrix Formulation:
Using indicator vectors $h^{(j)}$ for each cluster with appropriate volume-based scaling, and collecting into matrix $H$:
$$\text{NCut}_k = \text{Tr}((H^T D H)^{-1} H^T L H)$$
With the transformation $U = D^{1/2} H (H^T D H)^{-1/2}$, this becomes:
$$\text{NCut}_k = \text{Tr}(U^T L_{sym} U)$$
subject to $U^T U = I$.
The relaxed minimizer is again the matrix of first $k$ eigenvectors of $L_{sym}$.
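The trace identity can be checked directly. Below is a minimal sketch (random weight matrix, an arbitrary 3-way partition, and a plain 0/1 indicator matrix $H$; all names are illustrative) confirming that the matrix expression reproduces the sum of cut/volume terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
d = W.sum(axis=1)
D = np.diag(d)
L = D - W

labels = np.repeat(np.arange(k), n // k)      # arbitrary 3-way partition

# Direct multi-way NCut: sum_j cut(C_j, complement) / vol(C_j)
ncut_direct = sum(
    W[labels == j][:, labels != j].sum() / d[labels == j].sum()
    for j in range(k)
)

# Matrix form: Tr((H^T D H)^{-1} H^T L H) with 0/1 indicator matrix H
H = np.eye(k)[labels]                          # shape (n, k), one-hot rows
ncut_trace = np.trace(np.linalg.inv(H.T @ D @ H) @ (H.T @ L @ H))

print(np.isclose(ncut_direct, ncut_trace))     # True
```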
The Full NCut Spectral Clustering Pipeline: (1) Construct the similarity graph and weight matrix $W$, (2) Form the degree matrix $D$ and normalized Laplacian $L_{sym}$ (or $L_{rw}$), (3) Compute the first $k$ eigenvectors, (4) Row-normalize the resulting embedding (NJW variant), (5) Run K-Means on the embedded points, and (6) Map the cluster labels back to the original data points.
The final K-Means step introduces variability due to random initialization. Best practices: (1) Use multiple initializations (n_init ≥ 10), (2) Use K-Means++ initialization, (3) For critical applications, consider K-Means variants like spherical K-Means or K-Medoids. The spectral embedding does the heavy lifting; K-Means just needs to identify well-separated clusters in the embedding space.
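As a minimal illustration of these recommendations, the call below shows the suggested K-Means settings; `U_embed` is a stand-in for the row-normalized spectral embedding computed earlier, filled here with placeholder data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embedding; in practice use the row-normalized matrix of the
# first k eigenvectors of L_sym produced by the pipeline above.
U_embed = np.random.default_rng(0).random((100, 3))

kmeans = KMeans(
    n_clusters=3,
    init='k-means++',    # robust seeding
    n_init=20,           # multiple restarts; keep the best inertia
    random_state=0,
)
labels = kmeans.fit_predict(U_embed)
```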
scikit-learn provides a robust implementation of spectral clustering that handles many practical details automatically.
Key Parameters:
- `n_clusters`: Number of clusters $k$
- `affinity`: How to construct the similarity graph
  - `'rbf'` (default): Gaussian kernel
  - `'nearest_neighbors'`: KNN graph
  - `'precomputed'`: Provide your own similarity matrix
- `gamma`: Kernel coefficient for RBF ($= 1/(2\sigma^2)$)
- `n_neighbors`: Number of neighbors for KNN affinity
- `eigen_solver`: Algorithm for eigendecomposition
- `assign_labels`: Method for cluster assignment (`'kmeans'` or `'discretize'`)
```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons, make_circles, make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler
import warnings

def spectral_clustering_tutorial():
    """
    Comprehensive tutorial on using scikit-learn's SpectralClustering.
    """
    print("scikit-learn Spectral Clustering Tutorial")
    print("=" * 60)

    # Generate challenging datasets
    np.random.seed(42)

    # Two moons - classic spectral clustering showcase
    X_moons, y_moons = make_moons(n_samples=300, noise=0.05)

    # Two circles (concentric)
    X_circles, y_circles = make_circles(n_samples=300, noise=0.05, factor=0.5)

    # Standard Gaussian blobs (for comparison)
    X_blobs, y_blobs = make_blobs(n_samples=300, centers=3, cluster_std=0.5)

    datasets = [
        ("Two Moons", X_moons, y_moons, 2),
        ("Concentric Circles", X_circles, y_circles, 2),
        ("Gaussian Blobs", X_blobs, y_blobs, 3),
    ]

    for name, X, y_true, n_clusters in datasets:
        print(f"\n--- {name} (k={n_clusters}) ---")

        # Basic usage with RBF kernel
        sc_rbf = SpectralClustering(
            n_clusters=n_clusters,
            affinity='rbf',
            gamma=1.0,  # Adjust based on data scale
            random_state=42,
            n_init=10
        )
        labels_rbf = sc_rbf.fit_predict(X)
        ari_rbf = adjusted_rand_score(y_true, labels_rbf)

        # With nearest neighbors affinity
        sc_knn = SpectralClustering(
            n_clusters=n_clusters,
            affinity='nearest_neighbors',
            n_neighbors=10,
            random_state=42,
            n_init=10
        )
        labels_knn = sc_knn.fit_predict(X)
        ari_knn = adjusted_rand_score(y_true, labels_knn)

        print(f"  RBF kernel (gamma=1.0): ARI = {ari_rbf:.4f}")
        print(f"  KNN (n_neighbors=10):   ARI = {ari_knn:.4f}")

def compare_affinity_types():
    """
    Compare different affinity/similarity graph constructions.
    """
    print("\n" + "=" * 60)
    print("Comparing Affinity Types on Two Moons")
    print("=" * 60)

    X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

    # Test different gamma values for RBF
    print("\nRBF kernel with different gamma values:")
    for gamma in [0.1, 0.5, 1.0, 5.0, 10.0]:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            sc = SpectralClustering(
                n_clusters=2, affinity='rbf', gamma=gamma,
                random_state=42, n_init=10
            )
            labels = sc.fit_predict(X)
        ari = adjusted_rand_score(y_true, labels)
        print(f"  gamma={gamma:5.1f}: ARI = {ari:.4f}")

    # Test different n_neighbors for KNN
    print("\nKNN with different n_neighbors:")
    for k in [3, 5, 10, 20, 50]:
        sc = SpectralClustering(
            n_clusters=2, affinity='nearest_neighbors', n_neighbors=k,
            random_state=42, n_init=10
        )
        labels = sc.fit_predict(X)
        ari = adjusted_rand_score(y_true, labels)
        print(f"  k={k:3d}: ARI = {ari:.4f}")

def precomputed_affinity_example():
    """
    Example using a precomputed similarity matrix.

    Useful when:
    - You have a custom similarity measure
    - Data is already a distance/similarity matrix
    - You want full control over graph construction
    """
    print("\n" + "=" * 60)
    print("Using Precomputed Affinity Matrix")
    print("=" * 60)

    from scipy.spatial.distance import cdist

    # Generate data
    X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

    # Method 1: Gaussian kernel manually
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0]) * 0.5  # Heuristic
    W_gaussian = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W_gaussian, 0)

    sc_precomputed = SpectralClustering(
        n_clusters=2, affinity='precomputed',
        random_state=42, n_init=10
    )
    labels_gauss = sc_precomputed.fit_predict(W_gaussian)
    ari_gauss = adjusted_rand_score(y_true, labels_gauss)

    # Method 2: Local scaling kernel (Zelnik-Manor & Perona)
    from sklearn.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors=8)
    nn.fit(X)
    dist_to_knn, _ = nn.kneighbors()
    sigma_local = dist_to_knn[:, -1]  # Distance to the farthest returned neighbor (local scale)
    sigma_matrix = np.outer(sigma_local, sigma_local)
    W_local = np.exp(-distances**2 / np.maximum(sigma_matrix, 1e-10))
    np.fill_diagonal(W_local, 0)

    labels_local = sc_precomputed.fit_predict(W_local)
    ari_local = adjusted_rand_score(y_true, labels_local)

    print(f"\nGaussian kernel (σ={sigma:.3f}): ARI = {ari_gauss:.4f}")
    print(f"Local scaling kernel:          ARI = {ari_local:.4f}")

def choosing_number_of_clusters():
    """
    Using eigenvalue analysis to help choose k.
    """
    print("\n" + "=" * 60)
    print("Choosing Number of Clusters via Eigengap")
    print("=" * 60)

    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    # Generate data with 4 clusters
    X, y_true = make_blobs(
        n_samples=200, centers=4, cluster_std=0.5, random_state=42
    )

    # Build affinity matrix
    distances = cdist(X, X)
    sigma = np.median(distances[distances > 0])
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)

    # Compute Laplacian
    d = np.sum(W, axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-10))
    D_inv_sqrt = np.diag(d_inv_sqrt)
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigendecomposition
    eigenvalues, _ = eigh(L_sym)

    print("\nFirst 10 eigenvalues:")
    for i in range(10):
        gap = eigenvalues[i+1] - eigenvalues[i] if i < 9 else 0
        print(f"  λ_{i+1} = {eigenvalues[i]:.6f} (gap to next: {gap:.6f})")

    # Find largest gap
    gaps = np.diff(eigenvalues[:10])
    suggested_k = np.argmax(gaps[1:]) + 2  # Skip first gap (always large)
    print(f"\nLargest gap after λ₁ suggests k = {suggested_k}")
    print(f"True number of clusters: 4")

    # Verify by clustering
    for k in [2, 3, 4, 5]:
        sc = SpectralClustering(n_clusters=k, affinity='precomputed',
                                random_state=42, n_init=10)
        labels = sc.fit_predict(W)
        sil = silhouette_score(X, labels) if k > 1 else 0
        ari = adjusted_rand_score(y_true, labels)
        print(f"  k={k}: Silhouette={sil:.3f}, ARI={ari:.3f}")

if __name__ == "__main__":
    spectral_clustering_tutorial()
    compare_affinity_types()
    precomputed_affinity_example()
    choosing_number_of_clusters()
```

We have developed a comprehensive understanding of the Normalized Cut objective and its role in spectral clustering: its definition and volume-based normalization, the random walk interpretation, the connection to the normalized Laplacian's eigenstructure, the Shi-Malik and Ng-Jordan-Weiss algorithms, and practical usage through scikit-learn.
What's Next:
With the theoretical foundation complete, we conclude with the complete spectral clustering algorithm and its limitations. We'll discuss computational considerations, failure modes, parameter sensitivity, and when to use (or not use) spectral clustering in practice.
You now understand the Normalized Cut objective—its definition, random walk interpretation, connection to normalized Laplacians, and practical implementation. NCut-based spectral clustering is the standard approach because it balances cluster connectivity rather than just cluster size, leading to more meaningful partitions in real applications.