In agglomerative clustering, every iteration asks the same question: Which two clusters are most similar and should be merged? The answer depends entirely on how we define similarity between clusters—and this is where linkage methods enter as the single most consequential design decision in hierarchical clustering.
Given two clusters A and B containing multiple points each, what is the "distance" between them? There's no single correct answer. Should we use the distance between their closest points? Their farthest points? Their centers? Some average? Each choice leads to fundamentally different behavior, producing hierarchies with radically different structures from the same input data.
Understanding linkage methods is understanding hierarchical clustering. The choice of linkage determines whether your algorithm finds elongated chains, compact spheres, or balanced trees. It affects sensitivity to noise, ability to detect irregularly shaped clusters, and computational complexity.
By the end of this page, you will master all major linkage methods: Single (minimum), Complete (maximum), Average (UPGMA/WPGMA), Ward's (minimum variance), Centroid, and Median linkages. For each method, you'll understand the mathematical definition, geometric intuition, characteristic behavior, advantages, disadvantages, and appropriate use cases. You'll be able to select the right linkage for any clustering problem.
Formal Definition:
Given two clusters Cᵢ = {x₁, x₂, ..., xₘ} and Cⱼ = {y₁, y₂, ..., yₙ}, and a point-wise distance function d(x, y), the linkage function L(Cᵢ, Cⱼ) computes a single scalar representing the inter-cluster distance.
The linkage function must satisfy, at a minimum:

- Non-negativity: L(Cᵢ, Cⱼ) ≥ 0
- Symmetry: L(Cᵢ, Cⱼ) = L(Cⱼ, Cᵢ)
- Consistency with the point-wise metric: for singleton clusters, L({x}, {y}) = d(x, y)
Beyond these minimal requirements, different linkage functions encode different notions of cluster similarity, leading to dramatically different hierarchies.
Why Linkage Matters So Much:
Consider a simple example: two clusters where one is compact and one is elongated. The closest points between them might be quite near, but the farthest points might be very far. Single linkage and complete linkage will make opposite decisions about whether to merge these clusters, leading to entirely different hierarchical structures.
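To make this concrete, here is a minimal sketch (with synthetic, made-up coordinates) that computes the closest-pair, average, and farthest-pair distances between a compact cluster and an elongated one. The printed numbers show why single and complete linkage can point in opposite directions:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Synthetic example: one compact cluster, one elongated cluster
rng = np.random.default_rng(0)
compact = rng.normal(scale=0.3, size=(20, 2))                      # tight blob near the origin
elongated = rng.normal(scale=0.3, size=(20, 2)) * [1, 8] + [3, 0]  # stretched along the y-axis

# All pairwise distances between the two clusters
D = cdist(compact, elongated)

print(f"Single   (min distance):  {D.min():.2f}")   # closest pair: relatively near
print(f"Average  (mean distance): {D.mean():.2f}")
print(f"Complete (max distance):  {D.max():.2f}")   # farthest pair: much larger
```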
| Linkage | Also Known As | Formula | Cluster Shape Tendency |
|---|---|---|---|
| Single | Minimum, Nearest-neighbor | min d(x,y) for x∈Cᵢ, y∈Cⱼ | Elongated chains |
| Complete | Maximum, Farthest-neighbor | max d(x,y) for x∈Cᵢ, y∈Cⱼ | Compact spheres |
| Average (UPGMA) | Mean, UPGMA | Mean of all pairwise distances | Moderate compactness |
| Weighted (WPGMA) | WPGMA | Mean of sub-cluster distances | Moderate compactness |
| Ward | Minimum variance | Increase in total within-cluster variance | Compact, equal-sized |
| Centroid | UPGMC | Distance between centroids | Variable |
| Median | WPGMC | Recursive midpoint distance | Variable |
Mathematical Definition:
$$d_{\text{single}}(C_i, C_j) = \min_{x \in C_i, y \in C_j} d(x, y)$$
Single linkage defines inter-cluster distance as the distance between the closest pair of points, one from each cluster. It's the most optimistic measure—clusters are as close as their nearest members.
Geometric Intuition:
Imagine stretching a rubber band from cluster A to cluster B. Single linkage uses the shortest possible rubber band. If there's any path of nearby points connecting two regions, single linkage will merge them early.
The Chaining Effect:
Single linkage is notorious for producing long, chain-like clusters. If points A₁→A₂→A₃→...→Aₙ form a chain where each consecutive pair is close, single linkage will merge them all into one cluster—even if A₁ and Aₙ are far apart.
This "chaining effect" can be a bug or a feature:
Single linkage clustering is mathematically equivalent to the minimum spanning tree (MST) of the data. The dendrogram can be constructed by building the MST (using Kruskal's or Prim's algorithm) and recording edge weights as merge distances. This connection enables the efficient O(n²) SLINK algorithm and provides theoretical guarantees from graph theory.
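Here is a minimal sketch, using only SciPy, that checks this equivalence on random data: the sorted single-linkage merge heights coincide with the sorted edge weights of the MST built over the pairwise-distance graph.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

np.random.seed(0)
X = np.random.randn(40, 2)

# Single-linkage merge heights (third column of the linkage matrix)
merge_heights = np.sort(linkage(X, method='single')[:, 2])

# Edge weights of the minimum spanning tree over the pairwise-distance graph
D = squareform(pdist(X))
mst = minimum_spanning_tree(D).toarray()
mst_weights = np.sort(mst[mst > 0])             # the n - 1 MST edges

# The two sets of n - 1 values coincide (up to floating-point error)
print(np.allclose(merge_heights, mst_weights))  # True
```

The listing below then demonstrates the practical consequence of chaining on data containing a noise bridge.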
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

# Create data demonstrating chaining effect
np.random.seed(42)

# Two well-separated compact clusters
cluster1 = np.random.randn(30, 2) * 0.5 + [0, 0]
cluster2 = np.random.randn(30, 2) * 0.5 + [8, 0]

# Bridge of noise points connecting them
bridge = np.array([[i, 0] for i in np.linspace(1, 7, 10)])
bridge += np.random.randn(10, 2) * 0.1  # Small noise

X = np.vstack([cluster1, bridge, cluster2])

# Compare single vs complete linkage
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Single linkage
Z_single = linkage(X, method='single')
labels_single = fcluster(Z_single, t=2, criterion='maxclust')

axes[0, 0].scatter(X[:, 0], X[:, 1], c=labels_single, cmap='viridis', s=50)
axes[0, 0].set_title('Single Linkage: 2 Clusters')
axes[0, 0].set_xlabel('Feature 1')
axes[0, 0].set_ylabel('Feature 2')

dendrogram(Z_single, ax=axes[0, 1], truncate_mode='lastp', p=20)
axes[0, 1].set_title('Single Linkage Dendrogram')
axes[0, 1].set_ylabel('Distance')

# Complete linkage
Z_complete = linkage(X, method='complete')
labels_complete = fcluster(Z_complete, t=2, criterion='maxclust')

axes[1, 0].scatter(X[:, 0], X[:, 1], c=labels_complete, cmap='viridis', s=50)
axes[1, 0].set_title('Complete Linkage: 2 Clusters')
axes[1, 0].set_xlabel('Feature 1')
axes[1, 0].set_ylabel('Feature 2')

dendrogram(Z_complete, ax=axes[1, 1], truncate_mode='lastp', p=20)
axes[1, 1].set_title('Complete Linkage Dendrogram')
axes[1, 1].set_ylabel('Distance')

plt.tight_layout()
plt.savefig('linkage_comparison.png', dpi=150)
print("Visualization saved to linkage_comparison.png")

# The chaining effect: single linkage merges through the bridge
print(f"\nSingle linkage found {len(np.unique(labels_single))} clusters")
print(f"Complete linkage found {len(np.unique(labels_complete))} clusters")

# Analyze cluster compositions
for label in np.unique(labels_single):
    size = np.sum(labels_single == label)
    print(f"  Single cluster {label}: {size} points")
```

Mathematical Definition:
$$d_{\text{complete}}(C_i, C_j) = \max_{x \in C_i, y \in C_j} d(x, y)$$
Complete linkage defines inter-cluster distance as the distance between the farthest pair of points, one from each cluster. It's the most pessimistic measure—clusters are as far apart as their most distant members.
Geometric Intuition:
Complete linkage asks: "What's the worst-case distance if I merge these clusters?" It ensures that after merging, no two points in the combined cluster are farther apart than the merge distance. This creates compact, spherical clusters with known diameter bounds.
Behavior Characteristics:
Complete linkage tends to produce balanced, compact clusters of similar diameter. It's resistant to chaining because a single bridge point cannot make two clusters appear close—the maximum distance still reflects the true separation.
However, it's sensitive to outliers in a different way: an outlier far from a cluster center increases the cluster's effective diameter, potentially preventing merges that would otherwise be intuitive.
Complete linkage provides a useful guarantee: if two clusters were merged at distance d, then the maximum pairwise distance within the merged cluster is at most d. This 'diameter bound' is valuable in applications where you need to ensure all members of a cluster are within a known distance of each other (e.g., network latency constraints, physical proximity requirements).
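As a quick check of this guarantee, the sketch below (random Euclidean data) cuts a complete-linkage dendrogram at a height midway between two merges and verifies that every resulting cluster's diameter stays below the cut:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
D = squareform(pdist(X))          # full pairwise-distance matrix

Z = linkage(X, method='complete')

# Cut between the 5th- and 4th-from-last merges, yielding 5 flat clusters
cut = 0.5 * (Z[-5, 2] + Z[-4, 2])
labels = fcluster(Z, t=cut, criterion='distance')

# Every cluster's diameter (largest intra-cluster distance) is bounded by the cut height
for lab in np.unique(labels):
    members = np.where(labels == lab)[0]
    diameter = D[np.ix_(members, members)].max()
    assert diameter <= cut
print(f"{labels.max()} clusters, all diameters <= cut height {cut:.3f}")
```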
When to Use Complete Linkage:

- You expect compact, roughly spherical clusters of similar diameter.
- You need the diameter guarantee described above, so that every pair of points within a cluster is within a known distance.
- Noise points between clusters would otherwise cause chaining; complete linkage is resistant to single bridge points.

When to Avoid Complete Linkage:

- Clusters are elongated, chain-like, or otherwise non-convex; the maximum distance heavily penalizes such shapes.
- Cluster densities or sizes vary widely, because large natural clusters tend to be split before smaller ones merge.
- A few extreme points inflate a cluster's effective diameter and block merges that would otherwise be intuitive (see the caveat above).
Mathematical Definition:
$$d_{\text{UPGMA}}(C_i, C_j) = \frac{1}{|C_i| \cdot |C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y)$$
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) defines inter-cluster distance as the arithmetic mean of all pairwise distances between points in the two clusters. Every pair of points contributes equally, so larger clusters exert proportionally more influence on the result.
Geometric Intuition:
Average linkage asks: "On average, how far apart are points in these two clusters?" It balances between the extremes of single linkage (only considering the closest pair) and complete linkage (only considering the farthest pair).
UPGMA vs WPGMA:
There are two variants of average linkage:
UPGMA (Unweighted): Weights each point equally. A cluster with 100 points has more influence than a cluster with 10 points.
WPGMA (Weighted): Weights each sub-cluster equally. Uses the average of the two sub-cluster distances to any third cluster, regardless of sub-cluster sizes.
$$d_{\text{WPGMA}}(C_i \cup C_j, C_k) = \frac{d(C_i, C_k) + d(C_j, C_k)}{2}$$
| Aspect | UPGMA (Unweighted) | WPGMA (Weighted) |
|---|---|---|
| Weighting | By number of points | By number of sub-clusters |
| Large cluster influence | Strong (proportional to size) | Equal to small clusters |
| Merge height | Exact mean of all inter-cluster pairwise distances | Weighted mean, biased by merge history |
| Use case | When all points equally important | When cluster history matters |
| Sensitivity to merging order | Lower | Higher |
Average linkage is often the default choice when you're unsure. It's less prone to chaining than single linkage, less biased toward spherical shapes than complete linkage, and generally produces reasonable hierarchies for a wide range of data distributions. It's the 'safe' choice for exploratory analysis.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Create asymmetric clusters to show UPGMA vs WPGMA difference
np.random.seed(42)

# Small cluster (10 points)
small_cluster = np.random.randn(10, 2) * 0.3 + [0, 0]
# Large cluster (50 points)
large_cluster = np.random.randn(50, 2) * 0.3 + [2, 0]
# Medium cluster (20 points)
medium_cluster = np.random.randn(20, 2) * 0.3 + [1, 2]

X = np.vstack([small_cluster, large_cluster, medium_cluster])

# Compare UPGMA (average) vs WPGMA (weighted)
Z_upgma = linkage(X, method='average')   # UPGMA
Z_wpgma = linkage(X, method='weighted')  # WPGMA

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Scatter plot (one call per cluster so the legend labels are correct)
axes[0].scatter(small_cluster[:, 0], small_cluster[:, 1], c='red', s=50, alpha=0.7, label='Small (10)')
axes[0].scatter(large_cluster[:, 0], large_cluster[:, 1], c='blue', s=50, alpha=0.7, label='Large (50)')
axes[0].scatter(medium_cluster[:, 0], medium_cluster[:, 1], c='green', s=50, alpha=0.7, label='Medium (20)')
axes[0].set_title('Data: Unequal Cluster Sizes')
axes[0].set_xlabel('Feature 1')
axes[0].set_ylabel('Feature 2')
axes[0].legend(loc='upper right')

# UPGMA dendrogram
dendrogram(Z_upgma, ax=axes[1], truncate_mode='lastp', p=15)
axes[1].set_title('UPGMA (Average) Linkage')
axes[1].set_ylabel('Distance')
axes[1].set_xlabel('Cluster')

# WPGMA dendrogram
dendrogram(Z_wpgma, ax=axes[2], truncate_mode='lastp', p=15)
axes[2].set_title('WPGMA (Weighted) Linkage')
axes[2].set_ylabel('Distance')
axes[2].set_xlabel('Cluster')

plt.tight_layout()
plt.savefig('average_linkage_variants.png', dpi=150)
print("Comparison saved to average_linkage_variants.png")

# Examine the final merges
print("\n=== Final Merges ===")
print("\nUPGMA (last 3 merges):")
for i in range(-3, 0):
    print(f"  Distance: {Z_upgma[i, 2]:.4f}, Combined size: {int(Z_upgma[i, 3])}")

print("\nWPGMA (last 3 merges):")
for i in range(-3, 0):
    print(f"  Distance: {Z_wpgma[i, 2]:.4f}, Combined size: {int(Z_wpgma[i, 3])}")
```

Mathematical Definition:
Ward's method minimizes the increase in total within-cluster variance (sum of squared deviations from cluster centroids) when merging two clusters:
$$d_{\text{Ward}}(C_i, C_j) = \sqrt{\frac{2\, n_i\, n_j}{n_i + n_j}} \; \lVert \bar{C}_i - \bar{C}_j \rVert$$
where $n_i$ and $n_j$ are cluster sizes, and $\bar{C}_i$ and $\bar{C}_j$ are centroids.
Equivalently, the squared Ward distance is (up to a factor of 2) the increase in the total within-cluster sum of squared deviations caused by the merge:

$$\Delta V = V(C_i \cup C_j) - V(C_i) - V(C_j), \qquad d_{\text{Ward}}(C_i, C_j) = \sqrt{2\, \Delta V}$$

where $V(C) = \sum_{x \in C} \lVert x - \bar{C} \rVert^2$ is the within-cluster sum of squared deviations from the centroid.
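The two formulations are easy to reconcile numerically. In this minimal sketch (two made-up clusters A and B), the brute-force variance increase ΔV matches the closed form nᵢ·nⱼ/(nᵢ+nⱼ) times the squared centroid distance, which is exactly half the squared Ward distance:

```python
import numpy as np

# Two made-up clusters of unequal size
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))          # cluster C_i with n_i = 8 points
B = rng.normal(size=(5, 3)) + 2.0    # cluster C_j with n_j = 5 points

def sse(C):
    """Within-cluster sum of squared deviations from the centroid, V(C)."""
    return ((C - C.mean(axis=0)) ** 2).sum()

# Brute-force variance increase: V(A u B) - V(A) - V(B)
delta_v = sse(np.vstack([A, B])) - sse(A) - sse(B)

# Closed form: (n_i * n_j / (n_i + n_j)) * ||centroid_i - centroid_j||^2
n_i, n_j = len(A), len(B)
closed_form = n_i * n_j / (n_i + n_j) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)

print(np.isclose(delta_v, closed_form))  # True: d_Ward = sqrt(2 * delta_v)
```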
Geometric Intuition:
Ward's method asks: "How much will our clusters become more spread out if we merge these two?" It greedily minimizes variance increase at each step, similar to K-Means but building a full hierarchy.
Connection to K-Means:
Ward's linkage optimizes the same objective as K-Means (minimize within-cluster sum of squares). Cutting a Ward dendrogram at a chosen level often produces clusters similar to K-Means with that many clusters. This makes Ward's method a natural choice when you expect K-Means-like compact clusters.
Ward's method fundamentally relies on computing centroids and variances, which requires a Euclidean (or more generally, inner-product) space. Unlike other linkages that only need pairwise distances, Ward's method cannot be applied to arbitrary distance matrices—it needs the original feature vectors. This is a significant limitation for non-Euclidean data.
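To see this constraint in a library, here is a small sketch assuming a recent scikit-learn (where AgglomerativeClustering exposes a metric parameter): requesting Ward linkage with a non-Euclidean metric is rejected at fit time.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.RandomState(0).randn(20, 2)

# Ward linkage with the default Euclidean metric works as expected
ward = AgglomerativeClustering(n_clusters=3, linkage='ward')
print(ward.fit_predict(X))

# Ward with a non-Euclidean metric is rejected, because the variance
# criterion is only defined in Euclidean space
try:
    AgglomerativeClustering(n_clusters=3, linkage='ward', metric='manhattan').fit(X)
except ValueError as err:
    print("ValueError:", err)
```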
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import matplotlib.pyplot as plt

# Generate data with known cluster structure
np.random.seed(42)
n_samples = 150

# Three well-separated Gaussian clusters (ideal for Ward's method)
X = np.vstack([
    np.random.randn(50, 2) * 0.8 + [0, 0],
    np.random.randn(50, 2) * 0.8 + [4, 4],
    np.random.randn(50, 2) * 0.8 + [8, 0]
])

# Apply Ward's linkage
Z_ward = linkage(X, method='ward')

# Compare with K-Means for k=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X)

# Cut Ward's dendrogram at 3 clusters
ward_labels = fcluster(Z_ward, t=3, criterion='maxclust')

# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Ward clustering result
axes[0, 0].scatter(X[:, 0], X[:, 1], c=ward_labels, cmap='viridis', s=50)
axes[0, 0].set_title("Ward's Linkage (3 clusters)")
axes[0, 0].set_xlabel('Feature 1')
axes[0, 0].set_ylabel('Feature 2')

# K-Means result
axes[0, 1].scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', s=50)
axes[0, 1].scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
                   c='red', marker='X', s=200, edgecolors='black')
axes[0, 1].set_title('K-Means (k=3)')
axes[0, 1].set_xlabel('Feature 1')
axes[0, 1].set_ylabel('Feature 2')

# Ward dendrogram
dendrogram(Z_ward, ax=axes[1, 0], truncate_mode='lastp', p=20)
axes[1, 0].set_title("Ward's Linkage Dendrogram")
axes[1, 0].set_ylabel('Distance (Variance Increase)')
axes[1, 0].axhline(y=Z_ward[-3, 2], color='r', linestyle='--', label='3-cluster cut')
axes[1, 0].legend()

# Within-cluster variance at each merge
variances = Z_ward[:, 2]  # The 3rd column is the merge distance (variance increase)
cumulative_variance = np.cumsum(variances[::-1])[::-1]  # Total variance reduction

axes[1, 1].plot(range(1, len(cumulative_variance) + 1), cumulative_variance)
axes[1, 1].set_xlabel('Number of Clusters')
axes[1, 1].set_ylabel('Cumulative Within-Cluster Variance')
axes[1, 1].set_title('Variance vs Number of Clusters (Elbow Analysis)')
axes[1, 1].axvline(x=3, color='r', linestyle='--', label='Elbow at k=3')
axes[1, 1].legend()
axes[1, 1].set_xlim(1, 20)

plt.tight_layout()
plt.savefig('ward_vs_kmeans.png', dpi=150)
print("Comparison saved to ward_vs_kmeans.png")

# Quantify agreement between Ward and K-Means
ari = adjusted_rand_score(kmeans_labels, ward_labels)
print(f"\nAdjusted Rand Index (Ward vs K-Means): {ari:.4f}")
print("(1.0 = perfect agreement, 0.0 = random)")
```

Centroid Linkage (UPGMC):
$$d_{\text{centroid}}(C_i, C_j) = \lVert \bar{C}_i - \bar{C}_j \rVert$$
Centroid linkage defines inter-cluster distance as the Euclidean distance between cluster centroids (means). After merging, the new centroid is the weighted average:
$$\bar{C}_{ij} = \frac{n_i \bar{C}_i + n_j \bar{C}_j}{n_i + n_j}$$
Median Linkage (WPGMC):
Median linkage also uses centroids but ignores cluster sizes:
$$\bar{C}_{ij} = \frac{\bar{C}_i + \bar{C}_j}{2}$$
The merged cluster centroid is the midpoint of the two sub-cluster centroids, regardless of how many points are in each.
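A two-line arithmetic example (with illustrative centroids and sizes) shows how differently the two update rules behave when cluster sizes are unequal:

```python
import numpy as np

# Illustrative sub-cluster centroids and sizes
c_i, n_i = np.array([0.0, 0.0]), 50   # large cluster
c_j, n_j = np.array([4.0, 0.0]), 10   # small cluster

# Centroid linkage (UPGMC): size-weighted average, pulled toward the large cluster
centroid_update = (n_i * c_i + n_j * c_j) / (n_i + n_j)   # -> [0.667, 0.0]

# Median linkage (WPGMC): plain midpoint, ignoring sizes
median_update = (c_i + c_j) / 2                           # -> [2.0, 0.0]

print(centroid_update, median_update)
```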
The Inversion Problem:
Both centroid and median linkages can produce inversions in the dendrogram—where a merge occurs at a smaller distance than a previous merge. This violates the monotonicity property expected in valid dendrograms and can make interpretation difficult.
In a valid dendrogram, merge distances should be monotonically increasing: later merges happen at larger distances. Centroid and median linkages can violate this, producing 'inversions' where the dendrogram 'goes backwards.' This happens because centroids move during merging, potentially bringing clusters closer. Inversions make it impossible to cut the dendrogram at a fixed height to get a valid clustering.
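Inversions are easy to reproduce. In the sketch below, three points forming an (almost) equilateral triangle make the second merge happen at a smaller height than the first under centroid and median linkage, while monotone linkages such as average and Ward do not invert:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# An almost-equilateral triangle: after the two nearest points merge at
# distance 1.0, their centroid lies only ~0.87 from the third point,
# so the next merge happens at a *smaller* height -- an inversion.
X = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.5, np.sqrt(3) / 2 * 1.01],   # slight perturbation makes the first merge unique
])

for method in ['centroid', 'median', 'average', 'ward']:
    heights = linkage(X, method=method)[:, 2]
    inversion = bool(np.any(np.diff(heights) < 0))
    print(f"{method:>8}: merge heights = {np.round(heights, 3)}, inversion = {inversion}")
```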
| Aspect | Centroid (UPGMC) | Median (WPGMC) |
|---|---|---|
| Centroid computation | Weighted by size | Unweighted (midpoint) |
| Large cluster influence | Strong | Equal to small clusters |
| Inversion risk | Moderate | Higher |
| Requires Euclidean | Yes (needs centroids) | Yes (needs centroids) |
| Common use case | Rare (use Ward's instead) | Very rare |
Why These Linkages Are Rarely Used:

- They can produce dendrogram inversions, which break the usual "cut at a single height" interpretation.
- Like Ward's method, they require Euclidean data to compute centroids, so they offer no extra flexibility over Ward's method.
- Ward's method captures a similar "distance between cluster centers" intuition while guaranteeing a monotone dendrogram, making it the better choice in practice.
When to Consider Centroid Linkage:
Despite its limitations, centroid linkage can be useful when:

- The centroid is itself a meaningful prototype in your domain, and "distance between cluster centers" is the quantity you want to report.
- You only need a flat clustering at a single level, so occasional dendrogram inversions are tolerable.
In practice, Ward's method is almost always preferred over centroid/median linkages for Euclidean data.
Selecting the appropriate linkage method depends on your data characteristics, domain knowledge, and clustering goals. Here's a systematic decision framework:
| If Your Data Has... | Consider... | Avoid... |
|---|---|---|
| Compact, spherical clusters | Ward, Complete | Single |
| Elongated or chain-like clusters | Single | Complete, Ward |
| Unknown structure (exploratory) | Average (UPGMA) | Centroid, Median |
| Significant outliers | Average, Complete | Single, Ward |
| Varying cluster densities | Single, Average | Complete, Ward |
| Euclidean features | Ward (variance focus) | N/A |
| Non-Euclidean distances | Single, Complete, Average | Ward, Centroid |
| Need diameter guarantee | Complete | All others |
| Very large n (>10,000) | Single (SLINK) | Ward (memory-intensive) |
When uncertain, start with Ward's linkage for Euclidean data (robust, K-Means-like) or Average linkage for arbitrary distances (balanced compromise). If you see unexpected chaining, switch to Complete. If you believe clusters are non-convex, try Single and inspect for spurious bridges. Always visualize dendrograms and compare multiple linkages on your data.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.metrics import silhouette_score, calinski_harabasz_score, adjusted_rand_score
import matplotlib.pyplot as plt

# Generate 2D data with known structure
np.random.seed(42)

# Three clusters with different characteristics
cluster1 = np.random.randn(50, 2) * 0.5 + [0, 0]       # Compact
cluster2 = np.random.randn(50, 2) * [0.3, 2] + [4, 0]  # Elongated
cluster3 = np.random.randn(50, 2) * 0.5 + [2, 4]       # Compact

X = np.vstack([cluster1, cluster2, cluster3])
true_labels = np.array([0]*50 + [1]*50 + [2]*50)

# Compare all standard linkages
linkages = ['single', 'complete', 'average', 'ward']
results = {}

print("Linkage Comparison (3 clusters):")
print("-" * 70)
print(f"{'Linkage':<12} {'Silhouette':>12} {'Calinski-Harabasz':>18} {'ARI vs True':>12}")
print("-" * 70)

fig, axes = plt.subplots(2, 4, figsize=(18, 9))

for idx, link in enumerate(linkages):
    # Compute linkage
    Z = linkage(X, method=link)
    labels = fcluster(Z, t=3, criterion='maxclust')

    # Compute metrics
    sil = silhouette_score(X, labels)
    ch = calinski_harabasz_score(X, labels)
    ari = adjusted_rand_score(true_labels, labels)

    results[link] = {'silhouette': sil, 'ch': ch, 'ari': ari, 'labels': labels}
    print(f"{link:<12} {sil:>12.4f} {ch:>18.2f} {ari:>12.4f}")

    # Scatter plot
    axes[0, idx].scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=30)
    axes[0, idx].set_title(f'{link.capitalize()} Linkage')
    axes[0, idx].set_xlabel('Feature 1')
    if idx == 0:
        axes[0, idx].set_ylabel('Feature 2')

    # Dendrogram
    dendrogram(Z, ax=axes[1, idx], truncate_mode='lastp', p=15, no_labels=True)
    axes[1, idx].set_title(f'{link.capitalize()} Dendrogram')
    if idx == 0:
        axes[1, idx].set_ylabel('Distance')

print("-" * 70)

plt.tight_layout()
plt.savefig('all_linkages_comparison.png', dpi=150)
print("\nVisualization saved to all_linkages_comparison.png")
```

We've comprehensively covered the linkage methods that define how agglomerative clustering measures inter-cluster distance—the central design decision in hierarchical clustering.
What's Next:
Linkage methods produce the dendrogram, but interpreting this hierarchical structure requires careful analysis. In the next page, we'll learn to read and interpret dendrograms: understanding merge heights, identifying cluster boundaries, recognizing hierarchical patterns, and extracting actionable insights from these visualizations.
You now command a complete understanding of hierarchical clustering linkage methods: their mathematical definitions, geometric intuitions, behavioral characteristics, and practical trade-offs. You can intelligently select linkage methods based on data properties and clustering goals. Next, we'll master dendrogram interpretation.