Entropy is a fundamental concept in information theory that quantifies the amount of uncertainty, disorder, or randomness in a distribution. Originally developed by Claude Shannon in 1948, entropy has become one of the most widely used metrics in machine learning, data science, and statistical analysis.
Given a collection of items where each item belongs to a category (represented as an integer), your task is to compute the Shannon entropy of the categorical distribution. This metric measures how "mixed" or "diverse" the collection is.
Mathematical Definition:
For a discrete probability distribution with categories having probabilities (p_1, p_2, ..., p_k), the Shannon entropy is defined as:
$$H = -\sum_{i=1}^{k} p_i \cdot \log_2(p_i)$$
where (p_i) is the probability of category (i) and (k) is the number of distinct categories. Terms with (p_i = 0) are treated as contributing zero, following the convention (0 \cdot \log_2 0 = 0).
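The definition translates directly into a few lines of Python. This is only an illustrative sketch (the name `entropy_from_probs` is not from the problem statement); the `p > 0` guard implements the convention that zero-probability terms contribute nothing:

```python
import math

def entropy_from_probs(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits."""
    # Skip zero-probability terms: by convention 0 * log2(0) = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)
```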
Properties of Entropy:
Minimum Entropy (H = 0): When all items belong to a single category, there is no uncertainty—the entropy is zero. Example: ([0, 0, 0, 0]) yields (H = 0).
Maximum Entropy: Achieved when all categories are equally likely. For (k) equally probable categories, (H_{max} = \log_2(k)).
Monotonicity in Diversity: More diverse distributions (more evenly spread across categories) have higher entropy than less diverse ones.
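As a quick numerical illustration (not part of the required solution), the three properties can be checked with a short script; the one-line entropy helper is repeated here so the snippet runs on its own:

```python
import math

def entropy_from_probs(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Minimum entropy: a single category with probability 1 -> H = 0
print(entropy_from_probs([1.0]) == 0.0)                    # True
# Maximum entropy: k equally likely categories -> H = log2(k)
print(entropy_from_probs([1 / 8] * 8))                     # 3.0 (= log2(8))
# Monotonicity in diversity: a skewed split has lower entropy than an even one
print(entropy_from_probs([0.9, 0.1]) < entropy_from_probs([0.5, 0.5]))  # True
```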
Required Ordering Properties:
Your implementation must satisfy the following ordering conditions:
Your Task:
Implement a function that takes a list of categorical values (integers) and returns the Shannon entropy of the distribution.
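One possible implementation in Python is sketched below. The function name `shannon_entropy` is illustrative, and the empty-list behavior is an assumption (the problem statement does not specify it):

```python
from collections import Counter
import math

def shannon_entropy(items):
    """Shannon entropy (in bits) of the categorical distribution in `items`.

    Assumption: an empty list yields zero entropy (edge case not specified
    by the problem statement).
    """
    n = len(items)
    if n == 0:
        return 0.0
    # Count occurrences of each category, then apply H = -sum(p * log2(p)).
    counts = Counter(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `shannon_entropy([1, 1, 0, 0])` returns `1.0`, matching the two-category example below.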
Example 1:
Input: colors = [0, 0, 0, 0]
Output: 0.0
Explanation: All four items belong to the same category (0). The probability distribution is:
• P(category 0) = 4/4 = 1.0
Entropy calculation:
H = -(1.0 × log₂(1.0)) = -(1.0 × 0) = 0.0
With only one category present, there is no uncertainty or disorder in the distribution, resulting in zero entropy.
Example 2:
Input: colors = [1, 1, 0, 0]
Output: 1.0
Explanation: The collection has two categories (0 and 1), each appearing exactly twice. The probability distribution is:
• P(category 0) = 2/4 = 0.5
• P(category 1) = 2/4 = 0.5
Entropy calculation:
H = -(0.5 × log₂(0.5) + 0.5 × log₂(0.5))
H = -(0.5 × (-1) + 0.5 × (-1))
H = -(-0.5 - 0.5) = 1.0
This is the maximum entropy for a two-category distribution, as both categories are equally likely.
Example 3:
Input: colors = [0, 1, 2, 3]
Output: 2.0
Explanation: The collection has four distinct categories, each appearing exactly once. The probability distribution is:
• P(category 0) = 1/4 = 0.25
• P(category 1) = 1/4 = 0.25
• P(category 2) = 1/4 = 0.25
• P(category 3) = 1/4 = 0.25
Entropy calculation:
H = -4 × (0.25 × log₂(0.25))
H = -4 × (0.25 × (-2))
H = -4 × (-0.5) = 2.0
This achieves maximum entropy for four categories (log₂(4) = 2), as all categories are uniformly distributed.
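The three worked examples can be verified with a short self-contained script (the entropy function is reimplemented here as a sketch so the snippet runs on its own):

```python
from collections import Counter
import math

def shannon_entropy(items):
    n = len(items)
    counts = Counter(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Check each worked example against its expected output.
examples = [
    ([0, 0, 0, 0], 0.0),  # single category -> no uncertainty
    ([1, 1, 0, 0], 1.0),  # two equally likely categories -> log2(2)
    ([0, 1, 2, 3], 2.0),  # four equally likely categories -> log2(4)
]
for colors, expected in examples:
    assert math.isclose(shannon_entropy(colors), expected, abs_tol=1e-12)
```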
Constraints