0/318

00:00:00

Description

Editorial

Density-Based Spatial Clustering

MEDIUM20 pts

Clustering is a fundamental unsupervised learning technique that groups similar data points together without predefined labels. Among various clustering approaches, density-based clustering stands out for its ability to discover clusters of arbitrary shapes and automatically identify outliers in the data.

The core idea behind density-based clustering is elegantly simple: clusters are regions of high point density separated by regions of low density. Unlike centroid-based methods (like K-Means), this approach doesn't require specifying the number of clusters beforehand and can naturally detect irregularly shaped clusters.

Algorithm Concepts

Core Points: A point is considered a core point if it has at least min_neighbors other points within a distance of epsilon (including itself in some definitions). Core points are the foundation of dense regions.

Neighborhood: The epsilon-neighborhood of a point p consists of all points within Euclidean distance epsilon from p: $$N_{\epsilon}(p) = {q \in X : |p - q|_2 \leq \epsilon}$$

Density Reachability: A point q is density-reachable from a core point p if there exists a chain of core points connecting them, where each consecutive pair is within epsilon distance.

Noise Points: Points that are not core points and are not density-reachable from any core point are classified as noise (outliers) and assigned the label -1.

Clustering Process

Identify Core Points: For each point, count neighbors within epsilon distance. Points with at least min_neighbors neighbors become core points.
Form Clusters: Starting from an unvisited core point, recursively expand the cluster by including all density-reachable points. All connected core points and their border points form a single cluster.
Label Noise: Any remaining points not assigned to a cluster are marked as noise with label -1.

Your Task

Write a Python function density_cluster that implements density-based spatial clustering. The function should:

Accept a 2D numpy array X of shape (n_samples, n_features) containing the data points
Accept epsilon, the maximum distance for points to be considered neighbors
Accept min_neighbors, the minimum number of points required to form a dense core
Return a 1D numpy array of cluster labels (0, 1, 2, ...) for each point, where -1 indicates noise points
Use Euclidean distance for computing pairwise distances

Cluster labels should be assigned sequentially starting from 0 in the order clusters are discovered.

Example

Input

X = np.array([[1.0, 1.0], [1.5, 1.5], [2.0, 1.0], [8.0, 8.0], [8.5, 8.5], [9.0, 8.0]])
epsilon = 2.0
min_neighbors = 2

Output

[0, 0, 0, 1, 1, 1]

Explanation

The dataset contains 6 points forming two distinct spatial groups.

Cluster 0 (first group): • Points [1.0, 1.0], [1.5, 1.5], and [2.0, 1.0] are close together • Distance between [1.0, 1.0] and [1.5, 1.5] ≈ 0.71 < 2.0 ✓ • Distance between [1.0, 1.0] and [2.0, 1.0] = 1.0 < 2.0 ✓ • Each point has at least 2 neighbors within epsilon=2.0, so all are core points • They form a connected dense region → Cluster 0

Cluster 1 (second group): • Points [8.0, 8.0], [8.5, 8.5], and [9.0, 8.0] form another tight group • All pairwise distances are well under epsilon=2.0 • They form another connected dense region → Cluster 1

Separation Check: • Distance between [2.0, 1.0] and [8.0, 8.0] ≈ 9.22 >> 2.0 • The two groups are clearly separated with no density connection

No points are noise since all belong to dense regions.

Example

Input

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [10.0, 10.0], [100.0, 100.0]])
epsilon = 2.0
min_neighbors = 2

Output

[0, 0, 0, -1, -1]

Explanation

This example demonstrates noise detection capability.

Cluster 0: • Points [0.0, 0.0], [0.5, 0.5], and [1.0, 0.0] are clustered near the origin • All pairwise distances are < 2.0, and each has at least 2 neighbors • These points form a dense region → Cluster 0

Noise Points (label = -1): • Point [10.0, 10.0] is isolated—no other points within epsilon=2.0 • Point [100.0, 100.0] is extremely isolated in the feature space • Neither point has enough neighbors to be a core point • Neither is reachable from the dense cluster • Both are classified as noise → Label -1

This demonstrates the algorithm's inherent outlier detection capability.

Example

Input

X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0], [5.5, 5.0], [5.0, 5.5], [10.0, 0.0], [10.5, 0.0], [10.0, 0.5]])
epsilon = 1.0
min_neighbors = 2

Output

[0, 0, 0, 1, 1, 1, 2, 2, 2]

Explanation

This example shows discovery of multiple well-separated clusters.

Cluster 0 (near origin): • Points [0.0, 0.0], [0.5, 0.0], [0.0, 0.5] form a tight triangle • All pairwise distances ≤ 0.71 < 1.0 • Dense region with 3 mutually reachable points → Cluster 0

Cluster 1 (center region): • Points [5.0, 5.0], [5.5, 5.0], [5.0, 5.5] form another tight group • Positioned around coordinates (5, 5) • Dense region → Cluster 1

Cluster 2 (right side): • Points [10.0, 0.0], [10.5, 0.0], [10.0, 0.5] form the third group • Located along the x-axis around x=10 • Dense region → Cluster 2

With epsilon=1.0 (tighter threshold), the algorithm successfully identifies 3 distinct clusters. All points are core points as each has exactly 2 neighbors within the epsilon distance.

Accepted0/0·0% Acceptance

Constraints

1 ≤ n_samples ≤ 1000 (number of data points)
1 ≤ n_features ≤ 50 (dimensionality of each point)
0.001 ≤ epsilon ≤ 1000.0
1 ≤ min_neighbors ≤ n_samples
-10⁶ ≤ X[i][j] ≤ 10⁶ (coordinate values)
All input points have the same dimensionality
Cluster labels must be assigned sequentially starting from 0

Code

Visualizer

Solutions

14px

Test Cases3

Results

Submissions

X =

[[1,1],[1.5,1.5],[2,1],[8,8],[8.5,8.5],[9,8]]

eps =

min_samples =

Algorithm Concepts

Neighborhood: The epsilon-neighborhood of a point p consists of all points within Euclidean distance epsilon from p: $$N_{\epsilon}(p) = {q \in X : |p - q|_2 \leq \epsilon}$$

Density Reachability: A point q is density-reachable from a core point p if there exists a chain of core points connecting them, where each consecutive pair is within epsilon distance.

Noise Points: Points that are not core points and are not density-reachable from any core point are classified as noise (outliers) and assigned the label -1.

Clustering Process

Identify Core Points: For each point, count neighbors within epsilon distance. Points with at least min_neighbors neighbors become core points.

Form Clusters: Starting from an unvisited core point, recursively expand the cluster by including all density-reachable points. All connected core points and their border points form a single cluster.

Label Noise: Any remaining points not assigned to a cluster are marked as noise with label -1.

Your Task

Write a Python function density_cluster that implements density-based spatial clustering. The function should:

Accept a 2D numpy array X of shape (n_samples, n_features) containing the data points

Accept epsilon, the maximum distance for points to be considered neighbors

Accept min_neighbors, the minimum number of points required to form a dense core

Return a 1D numpy array of cluster labels (0, 1, 2, ...) for each point, where -1 indicates noise points

Use Euclidean distance for computing pairwise distances

Cluster labels should be assigned sequentially starting from 0 in the order clusters are discovered.

Density-Based Spatial Clustering

Algorithm Concepts

Clustering Process

Your Task

Hints

Density-Based Spatial Clustering

Algorithm Concepts

Clustering Process

Your Task

Hints