Partition Dataset by Feature Threshold

MEDIUM · 20 pts

In machine learning, data partitioning is a foundational operation that enables algorithms to split datasets based on specific criteria. This technique is essential for building decision trees, implementing conditional logic in preprocessing pipelines, and creating data subsets for targeted analysis.

Given a dataset represented as a 2D array X with shape (n_samples, n_features), a feature index indicating which column to examine, and a threshold value, your task is to partition the dataset into two distinct subsets:

Subset A: Contains all samples where the value at the specified feature column is greater than or equal to the threshold
Subset B: Contains all samples where the value at the specified feature column is less than the threshold

Mathematical Formulation:

For a sample $x_i$ with feature values $[x_{i,0}, x_{i,1}, \ldots, x_{i,m}]$ (where $m$ is the last feature index), and given feature index $j$ and threshold $t$:

$$\text{Subset A} = \{x_i \mid x_{i,j} \geq t\}$$

$$\text{Subset B} = \{x_i \mid x_{i,j} < t\}$$

Important Properties:

The partition is exhaustive: every sample from the original dataset must appear in exactly one of the two subsets
The partition is mutually exclusive: no sample can appear in both subsets simultaneously
The relative ordering of samples within each subset should be preserved from the original dataset
Either subset may be empty if all samples satisfy (or fail to satisfy) the threshold condition
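
The properties above can be checked directly on a small array. The following is a minimal sketch, assuming the partition is built with a NumPy boolean mask (one possible approach, not the only one); the names `j`, `t`, `mask`, and `empty_b` are illustrative and do not come from the problem statement.

```python
import numpy as np

# Illustrative data; j and t are placeholder names for the feature index and threshold.
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
j, t = 0, 5

mask = X[:, j] >= t                      # True where the row belongs to Subset A
subset_a, subset_b = X[mask], X[~mask]

# Exhaustive and mutually exclusive: every row is selected by exactly one mask.
assert len(subset_a) + len(subset_b) == len(X)
assert not np.any(mask & ~mask)

# Order preserved: boolean indexing returns rows in increasing row index.
idx_a = np.flatnonzero(mask)
assert np.all(np.diff(idx_a) > 0)

# Empty subset: a threshold below every value leaves Subset B with zero rows
# but the same number of columns.
all_mask = X[:, j] >= -100
empty_b = X[~all_mask]
print(empty_b.shape)  # (0, 2)
```

Note that an empty subset is still a 2D array here, so downstream code can rely on the column count even when no rows are selected.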

Your Task: Write a Python function that partitions a dataset into two subsets based on the threshold condition for the specified feature. Return the two subsets as a list of two NumPy arrays, with the subset meeting the condition (≥ threshold) listed first, followed by the subset not meeting the condition (< threshold).
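
One way to approach the task is sketched below. The function name `divide_on_feature` is an assumption (the statement does not specify one); the parameter names `feature_i` and `threshold` follow the examples. The sketch relies on boolean-mask indexing, which keeps rows in their original order.

```python
import numpy as np

def divide_on_feature(X, feature_i, threshold):
    """Split the rows of X on one feature column (hypothetical function name)."""
    X = np.asarray(X)
    # True for rows whose value in the chosen column is >= threshold.
    mask = X[:, feature_i] >= threshold
    # First array: rows meeting the condition; second array: the remaining rows.
    return [X[mask], X[~mask]]

X = np.array([[1, 5], [2, 3], [4, 7], [6, 2], [8, 9]])
subset_a, subset_b = divide_on_feature(X, feature_i=1, threshold=5)
print(subset_a.tolist())  # [[1, 5], [4, 7], [8, 9]]
print(subset_b.tolist())  # [[2, 3], [6, 2]]
```

Using the complement `~mask` for the second subset, rather than a separate `<` comparison, guarantees that each row lands in exactly one subset.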

Example

Input

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
feature_i = 0
threshold = 5

Output

[array([[5, 6], [7, 8], [9, 10]]), array([[1, 2], [3, 4]])]

Explanation

The dataset is partitioned based on the first column (feature index 0) with threshold 5.

• Samples with column 0 value ≥ 5: [5, 6], [7, 8], [9, 10] → First subset
• Samples with column 0 value < 5: [1, 2], [3, 4] → Second subset

The samples are grouped while maintaining their original relative order within each subset.

Example

Input

X = np.array([[1, 5], [2, 3], [4, 7], [6, 2], [8, 9]])
feature_i = 1
threshold = 5

Output

[array([[1, 5], [4, 7], [8, 9]]), array([[2, 3], [6, 2]])]

Explanation

The dataset is partitioned based on the second column (feature index 1) with threshold 5.

• Samples with column 1 value ≥ 5: [1, 5], [4, 7], [8, 9] → First subset
• Samples with column 1 value < 5: [2, 3], [6, 2] → Second subset

Note that even though [6, 2] has a larger value in column 0, it's placed in the second subset because its column 1 value (2) is less than 5.

Example

Input

X = np.array([[-5, 2], [-3, 4], [0, 6], [3, 8], [5, 10]])
feature_i = 0
threshold = 0

Output

[array([[0, 6], [3, 8], [5, 10]]), array([[-5, 2], [-3, 4]])]

Explanation

The dataset is partitioned based on the first column (feature index 0) with threshold 0.

• Samples with column 0 value ≥ 0: [0, 6], [3, 8], [5, 10] → First subset (includes the boundary value 0)
• Samples with column 0 value < 0: [-5, 2], [-3, 4] → Second subset (negative values only)

This demonstrates handling of negative values and the inclusive nature of the ≥ comparison for the first subset.

Constraints

1 ≤ n_samples ≤ 10,000 (number of rows in X)
1 ≤ n_features ≤ 100 (number of columns in X)
0 ≤ feature_i < n_features
-10⁶ ≤ X[i][j] ≤ 10⁶
-10⁶ ≤ threshold ≤ 10⁶
The dataset X will always be a valid 2D array with consistent row lengths
Both resulting subsets maintain the original relative order of samples
