In unsupervised machine learning, clustering algorithms group similar data points together without prior knowledge of labels. However, evaluating the quality of these clusters is a non-trivial challenge since we have no ground truth to compare against. The Cluster Cohesion Coefficient (commonly known as the Silhouette Score) provides an elegant solution to this problem by measuring how well each data point fits within its assigned cluster.
The Cluster Cohesion Coefficient quantifies two essential properties of good clustering: cohesion (how close each point is to the other members of its own cluster) and separation (how far it is from the points of the nearest neighboring cluster).
Mathematical Formulation:
For each data point i in the dataset, we calculate:
a(i) — The mean intra-cluster distance: the average distance from point i to all other points within the same cluster. This measures how tightly the point is bound to its cluster.
$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)$$
b(i) — The mean nearest-cluster distance: for each cluster that point i does NOT belong to, calculate the mean distance from i to all points in that cluster. Then take the minimum across all such clusters. This represents the distance to the "next best" cluster.
$$b(i) = \min_{k \neq C_i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j)$$
s(i) — The cohesion coefficient for point i:
$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$
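Taken together, the three quantities can be computed for a single point in a few lines. A minimal sketch — the data and variable names here are illustrative, not taken from the problem statement:

```python
import math

# Hypothetical toy data: point i = (0, 0); one cluster mate; one other cluster.
point = (0.0, 0.0)
same_cluster = [(1.0, 0.0)]                     # other members of i's cluster
other_clusters = [[(10.0, 0.0), (11.0, 0.0)]]   # every cluster i does not belong to

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])  # Euclidean distance

# a(i): mean distance to the other members of i's own cluster
a = sum(dist(point, q) for q in same_cluster) / len(same_cluster)

# b(i): smallest mean distance to any cluster i does not belong to
b = min(sum(dist(point, q) for q in c) / len(c) for c in other_clusters)

s = (b - a) / max(a, b)
print(round(s, 4))  # 0.9048
```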
The coefficient ranges from -1 to +1: values near +1 mean the point is well matched to its cluster, values near 0 mean it lies on the boundary between two clusters, and negative values suggest it may have been assigned to the wrong cluster.
Overall Score: The overall Cluster Cohesion Coefficient is the mean of all individual coefficients:
$$S = \frac{1}{N} \sum_{i=1}^{N} s(i)$$
Your Task: Implement a function that computes the Cluster Cohesion Coefficient for a given dataset and cluster assignments. Use Euclidean distance as the distance metric. Return the result rounded to 4 decimal places.
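One possible implementation, sketched with NumPy; the function name and the vectorized distance matrix are my choices, not mandated by the problem:

```python
import numpy as np

def cluster_cohesion_coefficient(X, labels):
    """Mean cohesion coefficient over all points; a sketch, not a reference API."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        return 0.0  # degenerate case: no neighboring cluster to compare against
    # Full pairwise Euclidean distance matrix via broadcasting
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = D[i, own].sum() / (n_own - 1)  # d(i, i) = 0, so the self-distance adds nothing
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores.append((b - a) / max(a, b))
    return round(float(np.mean(scores)), 4)
```

On the worked examples below this returns 0.8997, 0.8992, and 0.0 respectively.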
Edge Cases: If all points belong to a single cluster, b(i) is undefined, so the function should return 0.0 (see the third example below). By common convention, s(i) is also taken to be 0 for a point in a singleton cluster, since a(i) is undefined when |C_i| = 1.

Examples:
X = [[0, 0], [1, 0], [10, 0], [11, 0]]
labels = [0, 0, 1, 1]
Expected output: 0.8997
We have two well-separated clusters:
Cluster 0: Points [0, 0] and [1, 0]
Cluster 1: Points [10, 0] and [11, 0]
For point [0, 0]:
• a(i) = distance to [1, 0] = 1.0
• Mean distance to Cluster 1: (distance to [10, 0] + distance to [11, 0]) / 2 = (10 + 11) / 2 = 10.5
• b(i) = 10.5 (only one other cluster exists)
• s(i) = (10.5 - 1.0) / max(1.0, 10.5) = 9.5 / 10.5 ≈ 0.9048
Similar calculations for all points yield a high mean cohesion coefficient of 0.8997, indicating excellent cluster separation and cohesion.
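The per-point arithmetic above can be checked end-to-end in plain Python. Because this example has only one other cluster, b(i) is simply the mean distance to it:

```python
# Recompute example 1 by hand: two clusters of two points on the x-axis.
pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
labels = [0, 0, 1, 1]

def d(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

s_vals = []
for i, p in enumerate(pts):
    own = [q for j, q in enumerate(pts) if labels[j] == labels[i] and j != i]
    other = [q for j, q in enumerate(pts) if labels[j] != labels[i]]
    a = sum(d(p, q) for q in own) / len(own)
    b = sum(d(p, q) for q in other) / len(other)  # only one other cluster here
    s_vals.append((b - a) / max(a, b))

print(round(sum(s_vals) / len(s_vals), 4))  # 0.8997
```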
X = [[0, 0], [1, 1], [10, 10], [11, 11], [20, 0], [21, 1]]
labels = [0, 0, 1, 1, 2, 2]
Expected output: 0.8992
Three distinct clusters are formed:
Cluster 0: [0, 0] and [1, 1] (lower-left region)
Cluster 1: [10, 10] and [11, 11] (center region)
Cluster 2: [20, 0] and [21, 1] (right region)
Each cluster has tight intra-cluster distances (√2 ≈ 1.414) while inter-cluster distances are much larger. The coefficient of 0.8992 confirms that the clustering effectively separates the three groups with excellent cohesion.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
labels = [0, 0, 0, 0]
Expected output: 0.0
All four points are assigned to a single cluster (Cluster 0). With only one cluster present, there are no neighboring clusters against which to compute inter-cluster distances. Since the cohesion coefficient requires at least two clusters, the function returns 0.0 for this degenerate case.
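The single-cluster guard amounts to counting distinct labels before doing any distance work. A minimal sketch, with an illustrative helper name (`degenerate_score` is not part of the problem statement):

```python
def degenerate_score(labels):
    """Return 0.0 when fewer than two clusters exist, else None (meaning: proceed)."""
    return 0.0 if len(set(labels)) < 2 else None

print(degenerate_score([0, 0, 0, 0]))  # 0.0
print(degenerate_score([0, 0, 1, 1]))  # None
```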
Constraints: