In unsupervised machine learning, clustering algorithms group similar data points together without prior knowledge of labels. However, evaluating the quality of these clusters is a non-trivial challenge since we have no ground truth to compare against. The Cluster Cohesion Coefficient (commonly known as the Silhouette Score) provides an elegant solution to this problem by measuring how well each data point fits within its assigned cluster.
The Cluster Cohesion Coefficient quantifies two essential properties of good clustering: cohesion (how close each point is to the other members of its own cluster) and separation (how far it is from the points of the nearest neighboring cluster).
Mathematical Formulation:
For each data point i in the dataset, we calculate:
a(i) — The mean intra-cluster distance: the average distance from point i to all other points within the same cluster. This measures how tightly the point is bound to its cluster.
$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)$$
b(i) — The mean nearest-cluster distance: for each cluster that point i does NOT belong to, calculate the mean distance from i to all points in that cluster. Then take the minimum across all such clusters. This represents the distance to the "next best" cluster.
$$b(i) = \min_{k \neq C_i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j)$$
s(i) — The cohesion coefficient for point i:
$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$
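Taken together, the three quantities can be computed for a single point in a few lines. A minimal sketch — the data and variable names here are illustrative, not taken from the problem statement:

```python
import math

# Hypothetical toy data: point i = (0, 0); one cluster mate; one other cluster.
point = (0.0, 0.0)
same_cluster = [(1.0, 0.0)]                     # other members of i's cluster
other_clusters = [[(10.0, 0.0), (11.0, 0.0)]]   # every cluster i does not belong to

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])  # Euclidean distance

# a(i): mean distance to the other members of i's own cluster
a = sum(dist(point, q) for q in same_cluster) / len(same_cluster)

# b(i): smallest mean distance to any cluster i does not belong to
b = min(sum(dist(point, q) for q in c) / len(c) for c in other_clusters)

s = (b - a) / max(a, b)
print(round(s, 4))  # 0.9048
```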
The coefficient ranges from -1 to +1: values near +1 mean the point is well matched to its cluster, values near 0 mean it lies on the boundary between two clusters, and negative values suggest it may have been assigned to the wrong cluster.
Overall Score: The overall Cluster Cohesion Coefficient is the mean of all individual coefficients:
$$S = \frac{1}{N} \sum_{i=1}^{N} s(i)$$
Your Task: Implement a function that computes the Cluster Cohesion Coefficient for a given dataset and cluster assignments. Use Euclidean distance as the distance metric. Return the result rounded to 4 decimal places.
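One possible implementation, sketched with NumPy; the function name and the vectorized distance matrix are my choices, not mandated by the problem:

```python
import numpy as np

def cluster_cohesion_coefficient(X, labels):
    """Mean cohesion coefficient over all points; a sketch, not a reference API."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        return 0.0  # degenerate case: no neighboring cluster to compare against
    # Full pairwise Euclidean distance matrix via broadcasting
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = D[i, own].sum() / (n_own - 1)  # d(i, i) = 0, so the self-distance adds nothing
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores.append((b - a) / max(a, b))
    return round(float(np.mean(scores)), 4)
```

On the worked examples below this returns 0.8997, 0.8992, and 0.0 respectively.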
Edge Cases: If all points belong to a single cluster, b(i) is undefined, so the function should return 0.0 (see the third example below). By common convention, s(i) is also taken to be 0 for a point in a singleton cluster, since a(i) is undefined when |C_i| = 1.

Examples:
X = [[0, 0], [1, 0], [10, 0], [11, 0]]
labels = [0, 0, 1, 1]
Expected output: 0.8997
We have two well-separated clusters:
Cluster 0: Points [0, 0] and [1, 0]
Cluster 1: Points [10, 0] and [11, 0]
For point [0, 0]:
• a(i) = distance to [1, 0] = 1.0
• Mean distance to Cluster 1: (distance to [10, 0] + distance to [11, 0]) / 2 = (10 + 11) / 2 = 10.5
• b(i) = 10.5 (only one other cluster exists)
• s(i) = (10.5 - 1.0) / max(1.0, 10.5) = 9.5 / 10.5 ≈ 0.9048
Similar calculations for all points yield a high mean cohesion coefficient of 0.8997, indicating excellent cluster separation and cohesion.
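The per-point arithmetic above can be checked end-to-end in plain Python. Because this example has only one other cluster, b(i) is simply the mean distance to it:

```python
# Recompute example 1 by hand: two clusters of two points on the x-axis.
pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
labels = [0, 0, 1, 1]

def d(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

s_vals = []
for i, p in enumerate(pts):
    own = [q for j, q in enumerate(pts) if labels[j] == labels[i] and j != i]
    other = [q for j, q in enumerate(pts) if labels[j] != labels[i]]
    a = sum(d(p, q) for q in own) / len(own)
    b = sum(d(p, q) for q in other) / len(other)  # only one other cluster here
    s_vals.append((b - a) / max(a, b))

print(round(sum(s_vals) / len(s_vals), 4))  # 0.8997
```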
X = [[0, 0], [1, 1], [10, 10], [11, 11], [20, 0], [21, 1]]
labels = [0, 0, 1, 1, 2, 2]
Expected output: 0.8992
Three distinct clusters are formed:
Cluster 0: [0, 0] and [1, 1] (lower-left region)
Cluster 1: [10, 10] and [11, 11] (center region)
Cluster 2: [20, 0] and [21, 1] (right region)
Each cluster has tight intra-cluster distances (√2 ≈ 1.414) while inter-cluster distances are much larger. The coefficient of 0.8992 confirms that the clustering effectively separates the three groups with excellent cohesion.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
labels = [0, 0, 0, 0]
Expected output: 0.0
All four points are assigned to a single cluster (Cluster 0). With only one cluster present, there are no neighboring clusters against which to compute inter-cluster distances. Since the cohesion coefficient requires at least two clusters, the function returns 0.0 for this degenerate case.
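The single-cluster guard amounts to counting distinct labels before doing any distance work. A minimal sketch, with an illustrative helper name (`degenerate_score` is not part of the problem statement):

```python
def degenerate_score(labels):
    """Return 0.0 when fewer than two clusters exist, else None (meaning: proceed)."""
    return 0.0 if len(set(labels)) < 2 else None

print(degenerate_score([0, 0, 0, 0]))  # 0.0
print(degenerate_score([0, 0, 1, 1]))  # None
```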
Constraints: