0/318

00:00:00

Description

Editorial

Centroid-Based Data Partitioning

MEDIUM20 pts

Centroid-Based Partitioning is a fundamental unsupervised learning technique used to discover natural groupings within unlabeled datasets. The algorithm partitions n data points into k distinct clusters, where each cluster is represented by its centroid—the geometric center of all points belonging to that cluster.

Algorithm Overview

The algorithm operates through an iterative refinement process that alternates between two key steps:

Step 1: Assignment Phase

Each data point is assigned to the cluster whose centroid is nearest (typically using Euclidean distance). For a point p and centroids c₁, c₂, ..., cₖ, the point is assigned to cluster j where:

$$j = \argmin_{i} ||p - c_i||_2$$

Step 2: Update Phase

After all points are assigned, each centroid is recalculated as the arithmetic mean of all points in its cluster:

$$c_i = \frac{1}{|S_i|} \sum_{p \in S_i} p$$

where Sᵢ is the set of points assigned to cluster i and |Sᵢ| is the number of points in that cluster.

Convergence

The algorithm continues iterating until one of the following conditions is met:

Convergence: Centroids no longer change between iterations (stable state reached)
Iteration Limit: The maximum number of allowed iterations is reached

Your Task

Implement the centroid-based partitioning algorithm that:

Takes a set of data points, the desired number of clusters k, initial centroid positions, and a maximum iteration count
Returns the final centroid positions after the algorithm converges or reaches the iteration limit
Returns centroids rounded to 4 decimal places

Example

Input

points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
k = 2
initial_centroids = [[1, 1], [10, 1]]
max_iterations = 10

Output

[[1.0, 2.0], [10.0, 2.0]]

Explanation

The algorithm partitions 6 points into 2 clusters:

Initial State: Centroids at [1, 1] and [10, 1]

Iteration 1 - Assignment: • Points [1,2], [1,4], [1,0] → Cluster 1 (closer to [1,1]) • Points [10,2], [10,4], [10,0] → Cluster 2 (closer to [10,1])

Iteration 1 - Update: • Centroid 1: mean([[1,2], [1,4], [1,0]]) = [1.0, 2.0] • Centroid 2: mean([[10,2], [10,4], [10,0]]) = [10.0, 2.0]

The algorithm converges as these centroids represent the true centers of the two natural clusters.

Example

Input

points = [[0, 0], [1, 1], [0, 1], [5, 5], [6, 5], [5, 6], [10, 0], [11, 1], [10, 1]]
k = 3
initial_centroids = [[0, 0], [5, 5], [10, 0]]
max_iterations = 10

Output

[[0.3333, 0.6667], [5.3333, 5.3333], [10.3333, 0.6667]]

Explanation

Nine points are partitioned into 3 clusters:

Cluster 1: Points near origin → [0,0], [1,1], [0,1] • Centroid: [(0+1+0)/3, (0+1+1)/3] = [0.3333, 0.6667]

Cluster 2: Points near center → [5,5], [6,5], [5,6] • Centroid: [(5+6+5)/3, (5+5+6)/3] = [5.3333, 5.3333]

Cluster 3: Points near right side → [10,0], [11,1], [10,1] • Centroid: [(10+11+10)/3, (0+1+1)/3] = [10.3333, 0.6667]

Each cluster naturally forms around its respective corner of the data space.

Example

Input

points = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
k = 1
initial_centroids = [[0, 0]]
max_iterations = 10

Output

[[3.0, 3.0]]

Explanation

With k=1, all points belong to a single cluster:

Single Cluster Centroid: • All 5 points are assigned to the only cluster • Centroid = mean of all points:

x: (1+2+3+4+5)/5 = 3.0
y: (1+2+3+4+5)/5 = 3.0

The algorithm converges after the first update, placing the centroid at the exact center of the diagonal line of points.

Accepted0/0·0% Acceptance

Constraints

1 ≤ number of points ≤ 1000
1 ≤ k ≤ number of points
Each point is a list of d coordinates (d ≥ 1)
All points have the same dimensionality
initial_centroids has exactly k centroids
1 ≤ max_iterations ≤ 1000
-10⁶ ≤ coordinate values ≤ 10⁶
Round final centroids to 4 decimal places

Code

Visualizer

Solutions

14px

Test Cases3

Results

Submissions

k =

points =

[[1,2],[1,4],[1,0],[10,2],[10,4],[10,0]]

max_iterations =

initial_centroids =

[[1,1],[10,1]]

Algorithm Overview

The algorithm operates through an iterative refinement process that alternates between two key steps:

Step 1: Assignment Phase

$$j = \argmin_{i} ||p - c_i||_2$$

Step 2: Update Phase

After all points are assigned, each centroid is recalculated as the arithmetic mean of all points in its cluster:

$$c_i = \frac{1}{|S_i|} \sum_{p \in S_i} p$$

where Sᵢ is the set of points assigned to cluster i and |Sᵢ| is the number of points in that cluster.

Your Task

Implement the centroid-based partitioning algorithm that:

Takes a set of data points, the desired number of clusters k, initial centroid positions, and a maximum iteration count

Returns the final centroid positions after the algorithm converges or reaches the iteration limit

Returns centroids rounded to 4 decimal places

Centroid-Based Data Partitioning

Algorithm Overview

Step 1: Assignment Phase

Step 2: Update Phase

Convergence

Your Task

Hints

Centroid-Based Data Partitioning

Algorithm Overview

Step 1: Assignment Phase

Step 2: Update Phase

Convergence

Your Task

Hints