Gini impurity is a fundamental metric used in supervised machine learning to quantify the disorder or heterogeneity within a dataset's class distribution. This measure is central to decision tree algorithms, where it guides the selection of optimal splitting criteria to create the most informative partitions.
The Gini impurity for a node containing samples from multiple classes is calculated using the following formula:
$$\text{Gini Impurity} = 1 - \sum_{k=1}^{K} p_k^2$$
Where:
• $K$ is the number of classes present in the node
• $p_k$ is the proportion of samples belonging to class $k$
Intuition Behind Gini Impurity:
Gini impurity is the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution. It equals 0 for a perfectly pure node (all samples share one class) and reaches its maximum of $1 - 1/K$ when all $K$ classes are equally represented (0.5 for binary classification).
Your Task:
Implement a function that computes the Gini impurity for a given list of class labels. The function should:
• Accept a list of class labels
• Return the Gini impurity of the class distribution as a float
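One possible implementation is sketched below in Python (the function name `gini_impurity` and the empty-node convention are my own choices, not specified by the problem):

```python
from collections import Counter

def gini_impurity(labels):
    """Compute Gini impurity = 1 - sum of p_k^2 over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty node (assumption, not specified)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())
```

For example, `round(gini_impurity([0, 1, 1, 1, 0]), 2)` gives `0.48`, matching the first worked example.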
Example 1:
Input: labels = [0, 1, 1, 1, 0]
Output: 0.48
Explanation: The dataset contains 5 samples: 2 labeled as class 0 and 3 labeled as class 1.
Step 1: Calculate class probabilities:
• p₀ = 2/5 = 0.4 (probability of class 0)
• p₁ = 3/5 = 0.6 (probability of class 1)

Step 2: Apply the Gini formula:
Gini = 1 - (p₀² + p₁²)
     = 1 - (0.4² + 0.6²)
     = 1 - (0.16 + 0.36)
     = 1 - 0.52
     = 0.48
The impurity of 0.48 indicates a reasonably mixed distribution, close to the maximum possible impurity of 0.5 for binary classification.
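The arithmetic above can be checked directly in Python (a one-off check, separate from the required function):

```python
# Class probabilities for labels = [0, 1, 1, 1, 0]
p0 = 2 / 5  # 0.4, probability of class 0
p1 = 3 / 5  # 0.6, probability of class 1

# Gini = 1 - (p0^2 + p1^2)
gini = 1 - (p0 ** 2 + p1 ** 2)
print(round(gini, 2))  # prints 0.48
```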
Example 2:
Input: labels = [1, 1, 1, 1]
Output: 0.0
Explanation: All 4 samples belong to class 1, representing a perfectly pure node.
Step 1: Calculate class probabilities:
• p₁ = 4/4 = 1.0 (100% of samples are class 1)

Step 2: Apply the Gini formula:
Gini = 1 - p₁²
     = 1 - 1.0²
     = 1 - 1.0
     = 0.0
A Gini impurity of 0.0 indicates perfect classification purity — this node would be a leaf node in a decision tree since no further splitting is beneficial.
Example 3:
Input: labels = [0, 0, 1, 1]
Output: 0.5
Explanation: The dataset is perfectly balanced, with 2 samples each from class 0 and class 1.
Step 1: Calculate class probabilities:
• p₀ = 2/4 = 0.5 (50% are class 0)
• p₁ = 2/4 = 0.5 (50% are class 1)

Step 2: Apply the Gini formula:
Gini = 1 - (p₀² + p₁²)
     = 1 - (0.5² + 0.5²)
     = 1 - (0.25 + 0.25)
     = 1 - 0.5
     = 0.5
This represents the maximum possible Gini impurity for binary classification. A 50-50 split provides no predictive power, as predicting either class would be equally likely to be correct.
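This maximum generalizes beyond the binary case: a uniform distribution over $K$ classes gives Gini impurity $1 - K \cdot (1/K)^2 = 1 - 1/K$. A quick sketch (the helper name `max_gini` is my own):

```python
def max_gini(k):
    """Gini impurity of a uniform distribution over k classes: 1 - k*(1/k)^2 = 1 - 1/k."""
    return 1 - sum((1 / k) ** 2 for _ in range(k))

print(max_gini(2))  # prints 0.5
print(max_gini(4))  # prints 0.75
```

So the larger the number of equally likely classes, the closer the maximum impurity gets to 1.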
Constraints