Hierarchical clustering produces a dendrogram—a complete record of how data points merge at every distance. But most downstream applications require flat clusters: a single partition where each point belongs to exactly one group. The final step in hierarchical clustering is therefore cluster extraction: cutting the dendrogram to produce flat cluster assignments.
This is both an opportunity and a challenge. The opportunity: unlike K-Means where you must choose k upfront, you can explore multiple cut points without reclustering. The challenge: with infinitely many possible cuts (any height produces a different clustering), how do you choose the "right" one?
This page develops a systematic approach to cluster extraction, from simple fixed cuts to sophisticated dynamic tree methods, along with validation techniques to assess whether your chosen cut is meaningful.
By the end of this page, you will master: distance-based cutting for domain-interpretable thresholds; k-based cutting when cluster count is known; inconsistency-based cutting for adaptive structure discovery; dynamic tree cutting for complex hierarchies; the gap statistic and silhouette analysis for optimal k selection; and practical strategies for validating and justifying your cut choice.
Formal Definition:
Given a dendrogram D over n data points and a cut criterion, cluster extraction partitions the data into k non-overlapping clusters C₁, C₂, ..., Cₖ where 1 ≤ k ≤ n.
Every horizontal cut at height h intersects some number of vertical branches. Each intersection defines a cluster: all data points below that intersection (within that subtree) are assigned to the same cluster.
The Fundamental Trade-off:
Cut low and you get many small, tight clusters; cut high and you get a few large, loose ones. Every cut trades within-cluster homogeneity against the number of clusters.
Three Perspectives on Cutting:
Distance-based: "I want all points within a cluster to be within distance d of each other" (meaningful when d has domain interpretation)
Count-based: "I need exactly k clusters for my downstream task" (imposed by application requirements)
Structure-based: "Let the data tell me where natural breaks occur" (using statistical criteria to detect gaps)
| Approach | Input | When to Use |
|---|---|---|
| Fixed distance | Height threshold h | Distance h has domain meaning (e.g., max within-cluster diameter) |
| Fixed count | Target k clusters | Downstream task requires specific k (e.g., k market segments) |
| Inconsistency | Inconsistency threshold | Want adaptive cuts based on local structure |
| Gap statistic | None (automatic) | Want statistical validation of optimal k |
| Dynamic tree cut | None (automatic) | Complex hierarchies with variable branch heights |
Unlike partition-based clustering where one solution exists for each k, hierarchical clustering admits multiple perspectives on the same data. There's often no single 'correct' cut—different cuts reveal different aspects of the data structure. This flexibility is a strength when used thoughtfully and a source of confusion when not.
Distance-based cutting is the most direct approach: draw a horizontal line at height h and count intersections with the dendrogram branches.
Mathematical Definition:
For a cut at height h:
The resulting number of clusters k is: n minus the number of merges at distance ≤ h.
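This count can be checked directly against scipy's fcluster; a minimal sketch on arbitrary random data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))           # 20 arbitrary 2-D points
Z = linkage(X, method='complete')

h = 1.5                                # an arbitrary cut height
labels = fcluster(Z, t=h, criterion='distance')
k = len(np.unique(labels))

# k equals n minus the number of merges at height <= h
merges_below = int(np.sum(Z[:, 2] <= h))
print(k, len(X) - merges_below)        # the two numbers match
```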
Interpretation by Linkage:
The meaning of the cut height depends on the linkage:
Complete linkage: h is the maximum within-cluster diameter. All points in a cluster are within distance h of each other.
Single linkage: h is the minimum gap between clusters. Any two points in different clusters are at least distance h apart.
Ward linkage: h relates to the cumulative variance increase. Harder to interpret directly.
Average linkage: h is approximately the average within-cluster pairwise distance.
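The complete-linkage guarantee is easy to verify numerically; the sketch below (on arbitrary random data) confirms that no flat cluster from a cut at height h exceeds diameter h:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
Z = linkage(X, method='complete')
D = squareform(pdist(X))               # full pairwise distance matrix

h = 2.0
labels = fcluster(Z, t=h, criterion='distance')

diameters = []
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    diameters.append(D[np.ix_(idx, idx)].max())

print(max(diameters) <= h)             # no cluster exceeds diameter h
```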
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
X = np.vstack([
    np.random.randn(40, 2) * 0.5 + [0, 0],
    np.random.randn(40, 2) * 0.5 + [4, 0],
    np.random.randn(40, 2) * 0.5 + [2, 4]
])

Z = linkage(X, method='complete')

# Explore multiple distance thresholds
thresholds = [1.0, 2.0, 3.0, 4.0, 5.0]

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Dendrogram with thresholds
ax = axes[0, 0]
dendrogram(Z, ax=ax, truncate_mode='lastp', p=30, no_labels=True)
ax.set_title('Dendrogram with Distance Thresholds')
ax.set_ylabel('Distance')

colors = ['red', 'orange', 'green', 'blue', 'purple']
for h, c in zip(thresholds, colors):
    ax.axhline(y=h, color=c, linestyle='--', alpha=0.7, linewidth=2)
    ax.text(ax.get_xlim()[1], h, f' h={h}', color=c, va='center')

# Number of clusters vs threshold
heights = np.linspace(0.1, 8.0, 100)
n_clusters = []
for h in heights:
    labels = fcluster(Z, t=h, criterion='distance')
    n_clusters.append(len(np.unique(labels)))

ax = axes[0, 1]
ax.plot(heights, n_clusters, 'b-', linewidth=2)
ax.set_xlabel('Cut Height (h)')
ax.set_ylabel('Number of Clusters')
ax.set_title('Clusters vs. Distance Threshold')
ax.grid(True, alpha=0.3)

# Highlight thresholds
for h, c in zip(thresholds, colors):
    labels = fcluster(Z, t=h, criterion='distance')
    k = len(np.unique(labels))
    ax.scatter([h], [k], c=c, s=100, zorder=5)
    ax.annotate(f'k={k}', (h, k), xytext=(5, 5), textcoords='offset points')

# Show cluster assignments for different thresholds
for idx, h in enumerate([1.0, 2.0, 4.0]):
    ax = axes[1, idx]
    labels = fcluster(Z, t=h, criterion='distance')
    k = len(np.unique(labels))
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
    ax.set_title(f'Distance threshold h={h} → {k} clusters')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

# Unused subplot - use for summary
ax = axes[0, 2]
ax.axis('off')
summary = """Distance-Based Cutting:
criterion='distance' in fcluster()

Advantages:
• Interpretable threshold
• Works with any linkage
• Domain-meaningful cutoff

Disadvantages:
• Must know meaningful h
• No automatic k selection
• Different k for different h"""
ax.text(0.1, 0.9, summary, transform=ax.transAxes, fontsize=11,
        verticalalignment='top', fontfamily='monospace',
        bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))

plt.tight_layout()
plt.savefig('distance_based_cutting.png', dpi=150)
print("Distance-based cutting saved to distance_based_cutting.png")
```

With complete linkage, a cut at height h guarantees that every cluster has diameter at most h. This is extremely useful in applications where you need a bound on within-cluster distances—for example, ensuring that all customers in a segment have similarity above a threshold, or that all locations in a cluster are within a maximum travel time.
When you know exactly how many clusters you need—often dictated by business requirements or downstream algorithms—count-based cutting extracts precisely k clusters.
Implementation:
To get k clusters, cut just below the (k-1)th highest merge height: this undoes the k-1 highest merges and leaves exactly k clusters. Equivalently, use fcluster(Z, t=k, criterion='maxclust') in scipy.
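A quick sketch (on arbitrary random data) confirming that criterion='maxclust' and an equivalent distance cut, placed between the kth and (k-1)th highest merge heights, produce the same partition:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 2))
Z = linkage(X, method='ward')

k = 4
labels_maxclust = fcluster(Z, t=k, criterion='maxclust')

# Distance cut between the kth and (k-1)th highest merge heights
hs = np.sort(Z[:, 2])
cut = (hs[-k] + hs[-(k - 1)]) / 2
labels_distance = fcluster(Z, t=cut, criterion='distance')

def as_partition(labels):
    """Represent a labeling as a set of frozensets of point indices."""
    return {frozenset(np.where(labels == c)[0]) for c in np.unique(labels)}

print(as_partition(labels_maxclust) == as_partition(labels_distance))  # True
```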
Why Use Fixed k:
Application requirement: Marketing wants exactly 5 customer segments; recommender system needs 10 topic clusters
Resource constraints: Can only afford k focus groups or k product lines
Comparison: Testing multiple k values and choosing the best via validation metrics
Prior knowledge: Domain expertise suggests approximately k natural groups
Choosing k:
If k is not externally specified, several methods can help choose:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Generate data with 4 clusters
np.random.seed(42)
X = np.vstack([
    np.random.randn(40, 2) * 0.5 + [0, 0],
    np.random.randn(40, 2) * 0.5 + [4, 0],
    np.random.randn(40, 2) * 0.5 + [0, 4],
    np.random.randn(40, 2) * 0.5 + [4, 4]
])

Z = linkage(X, method='ward')

# Evaluate different k values
k_values = range(2, 11)
silhouette_scores = []
inertias = []  # Within-cluster sum of squares

for k in k_values:
    labels = fcluster(Z, t=k, criterion='maxclust')

    # Silhouette score
    sil = silhouette_score(X, labels)
    silhouette_scores.append(sil)

    # Within-cluster variance (inertia)
    inertia = 0
    for cluster_id in np.unique(labels):
        cluster_points = X[labels == cluster_id]
        centroid = cluster_points.mean(axis=0)
        inertia += np.sum((cluster_points - centroid) ** 2)
    inertias.append(inertia)

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Silhouette scores
ax = axes[0, 0]
ax.plot(k_values, silhouette_scores, 'bo-', linewidth=2, markersize=8)
ax.set_xlabel('Number of Clusters (k)')
ax.set_ylabel('Silhouette Score')
ax.set_title('Silhouette Score vs. k')
ax.grid(True, alpha=0.3)

best_k_sil = k_values[np.argmax(silhouette_scores)]
ax.axvline(x=best_k_sil, color='red', linestyle='--', label=f'Best k={best_k_sil}')
ax.legend()

# Elbow plot (inertia)
ax = axes[0, 1]
ax.plot(k_values, inertias, 'go-', linewidth=2, markersize=8)
ax.set_xlabel('Number of Clusters (k)')
ax.set_ylabel('Within-Cluster Variance')
ax.set_title('Elbow Method')
ax.grid(True, alpha=0.3)

# Show cluster assignments for k=2,3,4,5 in the remaining subplots
subplot_order = [axes[0, 2], axes[1, 0], axes[1, 1], axes[1, 2]]
for ax, k in zip(subplot_order, [2, 3, 4, 5]):
    labels = fcluster(Z, t=k, criterion='maxclust')
    sil = silhouette_score(X, labels)
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
    ax.set_title(f'k={k}, Silhouette={sil:.3f}')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.savefig('count_based_cutting.png', dpi=150)
print("Count-based cutting saved to count_based_cutting.png")

# Print summary
print("=== Count-Based Cutting Analysis ===")
print(f"{'k':<5} {'Silhouette':>12} {'Inertia':>15}")
print("-" * 35)
for k, sil, inert in zip(k_values, silhouette_scores, inertias):
    marker = ' <-- best' if k == best_k_sil else ''
    print(f"{k:<5} {sil:>12.4f} {inert:>15.2f}{marker}")
```

Inconsistency-based cutting provides an adaptive approach that detects natural cluster boundaries by comparing each merge to its local context.
The Inconsistency Coefficient:
For each merge at height h, we compute how "inconsistent" this merge is compared to previous merges in its subtree:
$$I = \frac{h - \bar{h}_{\text{below}}}{\sigma_{\text{below}}}$$
where $\bar{h}_{\text{below}}$ is the mean height of merges below this point (up to depth d), and $\sigma_{\text{below}}$ is their standard deviation.
Interpretation:
A coefficient near 0 means the merge height is similar to the merges below it, so the merge simply continues an existing cluster. A large coefficient (commonly above 1) means the merge happens at an unusually great height relative to its subtree, which marks a candidate cluster boundary.
How the Cut Works:
Cut all branches where the inconsistency coefficient exceeds threshold t. Points below unchanged merges stay together; points above severed merges split apart.
Depth Parameter:
The depth d controls how far down the tree to look when computing the baseline statistics. Larger d considers more history; smaller d is more local. Typical values: d = 2 or 3.
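A minimal sketch of how the depth parameter changes the computed coefficients, using scipy's inconsistent on arbitrary random data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, inconsistent

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
Z = linkage(X, method='average')

# Column 3 holds the inconsistency coefficient for each merge
R2 = inconsistent(Z, d=2)   # baseline from merges up to 2 levels down
R3 = inconsistent(Z, d=3)   # deeper lookback -> different baseline statistics

print(R2[-3:, 3])  # coefficients of the last three merges, depth 2
print(R3[-3:, 3])  # typically differ from the d=2 values
```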
Imagine climbing a staircase where some steps are much taller than others. Inconsistency detects the 'unusually tall steps'—merges that happen at much greater distance than the merges just below them. These tall steps often correspond to crossing from tight clusters into the gaps between clusters.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, inconsistent, dendrogram
import matplotlib.pyplot as plt

# Generate data with varying cluster tightness
np.random.seed(42)

# Tight cluster
tight = np.random.randn(30, 2) * 0.3 + [0, 0]
# Moderate cluster
moderate = np.random.randn(30, 2) * 0.6 + [4, 0]
# Loose cluster
loose = np.random.randn(30, 2) * 1.0 + [2, 5]

X = np.vstack([tight, moderate, loose])
Z = linkage(X, method='average')

# Compute inconsistency coefficients
R = inconsistent(Z, d=2)
# R has 4 columns:
# 0: mean of heights below
# 1: std of heights below
# 2: number of links below
# 3: inconsistency coefficient

inconsistency_coeffs = R[:, 3]

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Dendrogram
dendrogram(Z, ax=axes[0, 0], truncate_mode='lastp', p=25, no_labels=True)
axes[0, 0].set_title('Dendrogram')
axes[0, 0].set_ylabel('Distance')

# Inconsistency coefficients
ax = axes[0, 1]
merge_indices = np.arange(1, len(inconsistency_coeffs) + 1)
ax.plot(merge_indices, inconsistency_coeffs, 'b-', linewidth=1)
ax.scatter(merge_indices, inconsistency_coeffs, c='blue', s=15)
ax.set_xlabel('Merge Index')
ax.set_ylabel('Inconsistency Coefficient')
ax.set_title('Inconsistency Coefficients per Merge')
ax.grid(True, alpha=0.3)

# Show thresholds
for t, c in [(0.7, 'green'), (1.0, 'orange'), (1.5, 'red')]:
    ax.axhline(y=t, color=c, linestyle='--', alpha=0.7, label=f't={t}')
ax.legend()

# Inconsistency histogram
ax = axes[0, 2]
ax.hist(inconsistency_coeffs, bins=20, color='steelblue', edgecolor='black', alpha=0.7)
ax.set_xlabel('Inconsistency Coefficient')
ax.set_ylabel('Count')
ax.set_title('Distribution of Inconsistency Coefficients')
ax.axvline(x=1.0, color='red', linestyle='--', linewidth=2, label='t=1.0')
ax.legend()

# Show cuts at different thresholds
thresholds = [0.7, 1.0, 1.5]
for idx, t in enumerate(thresholds):
    labels = fcluster(Z, t=t, criterion='inconsistent', depth=2)
    k = len(np.unique(labels))
    ax = axes[1, idx]
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
    ax.set_title(f'Inconsistency t={t} → {k} clusters')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.savefig('inconsistency_cutting.png', dpi=150)
print("Inconsistency cutting saved to inconsistency_cutting.png")

# Analyze the inconsistency structure
print("=== Inconsistency Analysis ===")
print(f"Total merges: {len(inconsistency_coeffs)}")
print(f"Inconsistency range: [{inconsistency_coeffs.min():.3f}, {inconsistency_coeffs.max():.3f}]")
print(f"Mean inconsistency: {inconsistency_coeffs.mean():.3f}")
print(f"Std inconsistency: {inconsistency_coeffs.std():.3f}")

print("Merges with high inconsistency (>1.0):")
high_inconsistency = np.where(inconsistency_coeffs > 1.0)[0]
for idx in high_inconsistency[-5:]:  # Last 5 high-inconsistency merges
    print(f"  Merge {idx+1}: I={inconsistency_coeffs[idx]:.3f}, height={Z[idx, 2]:.3f}")
```

Advantages of Inconsistency-Based Cutting:
Adaptive: different branches can effectively be cut at different heights, rather than at one global threshold
No need to prespecify k or an absolute distance cutoff
Detects natural breaks where a merge jumps well above its local context
Limitations:
The threshold t and depth d must still be chosen
The coefficient is unstable for nodes with few merges below them, since the baseline standard deviation is estimated from very few values
Less directly interpretable than a distance threshold
The gap statistic (Tibshirani, Walther, and Hastie, 2001) provides a principled, statistically-grounded method for selecting the optimal number of clusters.
Core Idea:
Compare the within-cluster dispersion of your clustering to what you'd expect from data with no cluster structure (uniform distribution). If your data has real clusters, the dispersion will be much smaller than the null expectation—and this gap will be largest at the correct k.
Mathematical Definition:
$$\text{Gap}(k) = E^*[\log W_k] - \log W_k$$
where:
$W_k$ is the pooled within-cluster sum of squares (dispersion) for a k-cluster partition
$E^*[\log W_k]$ is the expected value of $\log W_k$ under a null reference distribution with no cluster structure, estimated by averaging over B reference datasets
Algorithm:
1. Cluster the data for each candidate k and record log(Wₖ).
2. Generate B reference datasets drawn uniformly from the data's bounding box.
3. Cluster each reference dataset the same way; the mean of log(Wₖ*) across references estimates E*[log Wₖ], and the scaled standard deviation gives the standard error sₖ.
4. Compute Gap(k) = E*[log Wₖ] − log Wₖ.
5. Select the smallest k such that Gap(k) ≥ Gap(k+1) − sₖ₊₁.
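The usual selection rule, choosing the smallest k with Gap(k) ≥ Gap(k+1) − sₖ₊₁, can be illustrated on hypothetical gap values (the numbers below are invented purely for illustration):

```python
import numpy as np

# Hypothetical gap values and standard errors for k = 1..6 (illustrative only)
gaps = np.array([0.10, 0.35, 0.62, 0.60, 0.58, 0.55])
sk   = np.array([0.05, 0.05, 0.04, 0.04, 0.04, 0.04])

# Rule: smallest k with Gap(k) >= Gap(k+1) - s_{k+1}
optimal_k = None
for i in range(len(gaps) - 1):
    if gaps[i] >= gaps[i + 1] - sk[i + 1]:
        optimal_k = i + 1   # k values start at 1
        break

print(optimal_k)  # → 3
```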
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
import matplotlib.pyplot as plt

def compute_wk(X, labels):
    """Compute pooled within-cluster sum of squares."""
    wk = 0
    for cluster_id in np.unique(labels):
        cluster_points = X[labels == cluster_id]
        centroid = cluster_points.mean(axis=0)
        wk += np.sum((cluster_points - centroid) ** 2)
    return wk

def gap_statistic(X, k_range, n_refs=20, random_state=42):
    """
    Compute gap statistic for a range of k values.

    Returns:
        gaps: Gap statistic for each k
        sk: Standard error for each k
        log_wks: Log within-cluster dispersion for actual data
        ref_log_wks_mean: Mean log dispersion for reference data
    """
    np.random.seed(random_state)
    n, d = X.shape

    # Bounding box for uniform reference
    mins = X.min(axis=0)
    maxs = X.max(axis=0)

    log_wks = []
    ref_log_wks_all = []

    # Compute linkage once
    Z = linkage(X, method='ward')

    for k in k_range:
        # Cluster actual data
        labels = fcluster(Z, t=k, criterion='maxclust')
        wk = compute_wk(X, labels)
        log_wks.append(np.log(wk))

        # Cluster reference data
        ref_log_wks = []
        for _ in range(n_refs):
            # Generate uniform reference
            ref_X = np.random.uniform(mins, maxs, size=(n, d))
            ref_Z = linkage(ref_X, method='ward')
            ref_labels = fcluster(ref_Z, t=k, criterion='maxclust')
            ref_wk = compute_wk(ref_X, ref_labels)
            ref_log_wks.append(np.log(ref_wk))
        ref_log_wks_all.append(ref_log_wks)

    log_wks = np.array(log_wks)
    ref_log_wks_mean = np.array([np.mean(refs) for refs in ref_log_wks_all])
    ref_log_wks_std = np.array([np.std(refs) for refs in ref_log_wks_all])

    # Gap = E*[log(Wk)] - log(Wk)
    gaps = ref_log_wks_mean - log_wks

    # Standard error with sqrt(1 + 1/B) factor
    sk = ref_log_wks_std * np.sqrt(1 + 1/n_refs)

    return gaps, sk, log_wks, ref_log_wks_mean

# Generate test data with 3 clusters
np.random.seed(42)
X = np.vstack([
    np.random.randn(50, 2) * 0.5 + [0, 0],
    np.random.randn(50, 2) * 0.5 + [4, 4],
    np.random.randn(50, 2) * 0.5 + [8, 0]
])

# Compute gap statistic
k_range = range(1, 11)
gaps, sk, log_wks, ref_log_wks = gap_statistic(X, k_range, n_refs=10)

# Find optimal k using gap criterion
# Select smallest k where Gap(k) >= Gap(k+1) - s_{k+1}
optimal_k = 1
for i in range(len(gaps) - 1):
    if gaps[i] >= gaps[i+1] - sk[i+1]:
        optimal_k = k_range[i]
        break

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Gap statistic
ax = axes[0]
ax.errorbar(list(k_range), gaps, yerr=sk, fmt='bo-', capsize=5, linewidth=2)
ax.axvline(x=optimal_k, color='red', linestyle='--', linewidth=2,
           label=f'Optimal k={optimal_k}')
ax.set_xlabel('Number of Clusters (k)')
ax.set_ylabel('Gap Statistic')
ax.set_title('Gap Statistic')
ax.legend()
ax.grid(True, alpha=0.3)

# Log(Wk) comparison
ax = axes[1]
ax.plot(list(k_range), log_wks, 'b-o', linewidth=2, label='Actual data')
ax.plot(list(k_range), ref_log_wks, 'r--s', linewidth=2, label='Reference (uniform)')
ax.set_xlabel('Number of Clusters (k)')
ax.set_ylabel('log(Wk)')
ax.set_title('Within-Cluster Dispersion')
ax.legend()
ax.grid(True, alpha=0.3)

# Resulting clusters
ax = axes[2]
Z = linkage(X, method='ward')
labels = fcluster(Z, t=optimal_k, criterion='maxclust')
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
ax.set_title(f'Gap-Optimal Clustering (k={optimal_k})')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.savefig('gap_statistic.png', dpi=150)
print(f"Gap statistic analysis saved (optimal k={optimal_k})")
```

Use at least 20-50 reference samples for stable estimates. The uniform distribution may not be appropriate for all data—consider matching the data's marginal distributions instead. The gap statistic can be slow for large datasets due to repeated clustering of reference samples.
Beyond automated methods like the gap statistic, several validation strategies help assess whether a chosen cut produces a meaningful clustering:
1. Internal Validation Metrics:
Measures computed from the data and clustering alone:
Silhouette Score: Mean (b-a)/max(a,b) where a = mean intra-cluster distance, b = mean nearest-cluster distance. Range [-1, 1], higher is better.
Calinski-Harabasz Index: Ratio of between-cluster to within-cluster variance, weighted by cluster counts. Higher is better.
Davies-Bouldin Index: Average similarity of each cluster to its most similar cluster. Lower is better.
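As a sanity check, the silhouette formula above can be computed by hand and compared with sklearn's silhouette_score (arbitrary two-blob data):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(size=(15, 2)), rng.normal(size=(15, 2)) + 5])
labels = fcluster(linkage(X, method='ward'), t=2, criterion='maxclust')
D = squareform(pdist(X))

s = np.zeros(len(X))
for i in range(len(X)):
    same = labels == labels[i]
    same[i] = False
    a = D[i, same].mean()                    # mean intra-cluster distance
    b = min(D[i, labels == c].mean()         # mean distance to nearest other cluster
            for c in np.unique(labels) if c != labels[i])
    s[i] = (b - a) / max(a, b)

print(np.isclose(s.mean(), silhouette_score(X, labels)))  # True
```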
2. Stability Analysis:
Do similar cuts emerge when data is perturbed?
3. External Validation (when ground truth exists):
Compare cluster assignments against known labels, e.g., with the Adjusted Rand Index (ARI).
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import (
    silhouette_score, calinski_harabasz_score,
    davies_bouldin_score, adjusted_rand_score
)
import matplotlib.pyplot as plt

# Generate data with ground truth
np.random.seed(42)
n_per_cluster = 50
X = np.vstack([
    np.random.randn(n_per_cluster, 2) * 0.5 + [0, 0],
    np.random.randn(n_per_cluster, 2) * 0.5 + [4, 0],
    np.random.randn(n_per_cluster, 2) * 0.5 + [2, 4]
])
true_labels = np.repeat([0, 1, 2], n_per_cluster)

Z = linkage(X, method='ward')

# Evaluate metrics for different k
k_range = range(2, 11)
metrics = {
    'Silhouette': [],
    'Calinski-Harabasz': [],
    'Davies-Bouldin': [],
    'ARI vs True': []
}

for k in k_range:
    labels = fcluster(Z, t=k, criterion='maxclust')
    metrics['Silhouette'].append(silhouette_score(X, labels))
    metrics['Calinski-Harabasz'].append(calinski_harabasz_score(X, labels))
    metrics['Davies-Bouldin'].append(davies_bouldin_score(X, labels))
    metrics['ARI vs True'].append(adjusted_rand_score(true_labels, labels))

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot each metric
for ax, (name, values) in zip(axes.flatten(), metrics.items()):
    ax.plot(list(k_range), values, 'o-', linewidth=2, markersize=8)
    ax.set_xlabel('Number of Clusters (k)')
    ax.set_ylabel(name)
    ax.set_title(name)
    ax.grid(True, alpha=0.3)

    # Mark optimal (Davies-Bouldin is minimized; the others are maximized)
    if name == 'Davies-Bouldin':
        opt_k = k_range[np.argmin(values)]
        opt_val = min(values)
    else:
        opt_k = k_range[np.argmax(values)]
        opt_val = max(values)
    ax.axvline(x=opt_k, color='red', linestyle='--', alpha=0.7)
    ax.scatter([opt_k], [opt_val], color='red', s=150, zorder=5)
    ax.annotate(f'k={opt_k}', (opt_k, opt_val), xytext=(10, 10),
                textcoords='offset points')

plt.tight_layout()
plt.savefig('validation_metrics.png', dpi=150)
print("Validation metrics saved to validation_metrics.png")

# Stability analysis
print("=== Stability Analysis ===")
n_bootstrap = 20

bootstrap_labels = []
for i in range(n_bootstrap):
    # Bootstrap sample
    indices = np.random.choice(len(X), size=len(X), replace=True)
    X_boot = X[indices]
    Z_boot = linkage(X_boot, method='ward')
    labels_boot = fcluster(Z_boot, t=3, criterion='maxclust')
    bootstrap_labels.append((indices, labels_boot))

# Compare pairs of bootstrap clusterings
ari_scores = []
for i in range(n_bootstrap):
    for j in range(i+1, n_bootstrap):
        idx_i, labels_i = bootstrap_labels[i]
        idx_j, labels_j = bootstrap_labels[j]
        # Find common indices
        common = np.intersect1d(idx_i, idx_j)
        if len(common) > 10:
            # Map labels to common indices
            labels_i_common = labels_i[[np.where(idx_i == c)[0][0] for c in common]]
            labels_j_common = labels_j[[np.where(idx_j == c)[0][0] for c in common]]
            ari = adjusted_rand_score(labels_i_common, labels_j_common)
            ari_scores.append(ari)

print(f"Bootstrap stability (k=3):")
print(f"  Mean ARI between bootstrap samples: {np.mean(ari_scores):.4f}")
print(f"  Std ARI: {np.std(ari_scores):.4f}")
print(f"  (ARI > 0.8 suggests stable clusters)")
```

| Metric | Range | Optimal | Interpretation |
|---|---|---|---|
| Silhouette | [-1, 1] | Higher | How well points fit their cluster vs nearest other |
| Calinski-Harabasz | [0, ∞) | Higher | Between-cluster vs within-cluster variance ratio |
| Davies-Bouldin | [0, ∞) | Lower | Average cluster similarity to most similar neighbor |
| ARI | [-1, 1] | Higher | Agreement with ground truth (if available) |
| Stability | [0, 1] | Higher | Consistency across data perturbations |
We've covered the complete spectrum of techniques for extracting flat cluster assignments from hierarchical dendrograms.
What's Next:
We've primarily focused on agglomerative (bottom-up) clustering. In the final page of this module, we'll explore divisive clustering—the top-down alternative that starts with all points in one cluster and recursively splits. We'll understand when divisive approaches might outperform agglomerative methods and see the computational challenges that make them rare in practice.
You now have a complete toolkit for extracting clusters from hierarchical structures: distance cuts, count cuts, inconsistency cuts, gap statistic, and validation strategies. You can justify your cut choice with appropriate metrics and assess cluster stability. Next, we'll complete the picture with divisive clustering.