Loading content...
Centroid-Based Partitioning is a fundamental unsupervised learning technique used to discover natural groupings within unlabeled datasets. The algorithm partitions n data points into k distinct clusters, where each cluster is represented by its centroid—the geometric center of all points belonging to that cluster.
The algorithm operates through an iterative refinement process that alternates between two key steps:
Each data point is assigned to the cluster whose centroid is nearest (typically using Euclidean distance). For a point p and centroids c₁, c₂, ..., cₖ, the point is assigned to cluster j where:
$$j = \argmin_{i} ||p - c_i||_2$$
After all points are assigned, each centroid is recalculated as the arithmetic mean of all points in its cluster:
$$c_i = \frac{1}{|S_i|} \sum_{p \in S_i} p$$
where Sᵢ is the set of points assigned to cluster i and |Sᵢ| is the number of points in that cluster.
The algorithm continues iterating until one of the following conditions is met:
Implement the centroid-based partitioning algorithm that:
points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
k = 2
initial_centroids = [[1, 1], [10, 1]]
max_iterations = 10[[1.0, 2.0], [10.0, 2.0]]The algorithm partitions 6 points into 2 clusters:
Initial State: Centroids at [1, 1] and [10, 1]
Iteration 1 - Assignment: • Points [1,2], [1,4], [1,0] → Cluster 1 (closer to [1,1]) • Points [10,2], [10,4], [10,0] → Cluster 2 (closer to [10,1])
Iteration 1 - Update: • Centroid 1: mean([[1,2], [1,4], [1,0]]) = [1.0, 2.0] • Centroid 2: mean([[10,2], [10,4], [10,0]]) = [10.0, 2.0]
The algorithm converges as these centroids represent the true centers of the two natural clusters.
points = [[0, 0], [1, 1], [0, 1], [5, 5], [6, 5], [5, 6], [10, 0], [11, 1], [10, 1]]
k = 3
initial_centroids = [[0, 0], [5, 5], [10, 0]]
max_iterations = 10[[0.3333, 0.6667], [5.3333, 5.3333], [10.3333, 0.6667]]Nine points are partitioned into 3 clusters:
Cluster 1: Points near origin → [0,0], [1,1], [0,1] • Centroid: [(0+1+0)/3, (0+1+1)/3] = [0.3333, 0.6667]
Cluster 2: Points near center → [5,5], [6,5], [5,6] • Centroid: [(5+6+5)/3, (5+5+6)/3] = [5.3333, 5.3333]
Cluster 3: Points near right side → [10,0], [11,1], [10,1] • Centroid: [(10+11+10)/3, (0+1+1)/3] = [10.3333, 0.6667]
Each cluster naturally forms around its respective corner of the data space.
points = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
k = 1
initial_centroids = [[0, 0]]
max_iterations = 10[[3.0, 3.0]]With k=1, all points belong to a single cluster:
Single Cluster Centroid: • All 5 points are assigned to the only cluster • Centroid = mean of all points:
The algorithm converges after the first update, placing the centroid at the exact center of the diagonal line of points.
Constraints