Entropy is a fundamental concept in information theory that quantifies the amount of uncertainty, disorder, or randomness in a distribution. Originally developed by Claude Shannon in 1948, entropy has become one of the most widely used metrics in machine learning, data science, and statistical analysis.
Given a collection of items where each item belongs to a category (represented as an integer), your task is to compute the Shannon entropy of the categorical distribution. This metric measures how "mixed" or "diverse" the collection is.
Mathematical Definition:
For a discrete probability distribution with categories having probabilities (p_1, p_2, ..., p_k), the Shannon entropy is defined as:
$$H = -\sum_{i=1}^{k} p_i \cdot \log_2(p_i)$$
where (p_i) is the probability of category (i) and (k) is the number of distinct categories. Terms with (p_i = 0) are treated as contributing zero, following the convention (0 \cdot \log_2 0 = 0).
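The definition translates directly into a few lines of Python. This is only an illustrative sketch (the name `entropy_from_probs` is not from the problem statement); the `p > 0` guard implements the convention that zero-probability terms contribute nothing:

```python
import math

def entropy_from_probs(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits."""
    # Skip zero-probability terms: by convention 0 * log2(0) = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)
```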
Properties of Entropy:
Minimum Entropy (H = 0): When all items belong to a single category, there is no uncertainty—the entropy is zero. Example: ([0, 0, 0, 0]) yields (H = 0).
Maximum Entropy: Achieved when all categories are equally likely. For (k) equally probable categories, (H_{max} = \log_2(k)).
Monotonicity in Diversity: More diverse distributions (more evenly spread across categories) have higher entropy than less diverse ones.
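As a quick numerical illustration (not part of the required solution), the three properties can be checked with a short script; the one-line entropy helper is repeated here so the snippet runs on its own:

```python
import math

def entropy_from_probs(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Minimum entropy: a single category with probability 1 -> H = 0
print(entropy_from_probs([1.0]) == 0.0)                    # True
# Maximum entropy: k equally likely categories -> H = log2(k)
print(entropy_from_probs([1 / 8] * 8))                     # 3.0 (= log2(8))
# Monotonicity in diversity: a skewed split has lower entropy than an even one
print(entropy_from_probs([0.9, 0.1]) < entropy_from_probs([0.5, 0.5]))  # True
```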
Required Ordering Properties:
Your implementation must satisfy the following ordering conditions:
Your Task:
Implement a function that takes a list of categorical values (integers) and returns the Shannon entropy of the distribution.
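One possible implementation in Python is sketched below. The function name `shannon_entropy` is illustrative, and the empty-list behavior is an assumption (the problem statement does not specify it):

```python
from collections import Counter
import math

def shannon_entropy(items):
    """Shannon entropy (in bits) of the categorical distribution in `items`.

    Assumption: an empty list yields zero entropy (edge case not specified
    by the problem statement).
    """
    n = len(items)
    if n == 0:
        return 0.0
    # Count occurrences of each category, then apply H = -sum(p * log2(p)).
    counts = Counter(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `shannon_entropy([1, 1, 0, 0])` returns `1.0`, matching the two-category example below.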
Example 1:
Input: colors = [0, 0, 0, 0]
Output: 0.0
Explanation: All four items belong to the same category (0). The probability distribution is:
• P(category 0) = 4/4 = 1.0
Entropy calculation:
H = -(1.0 × log₂(1.0)) = -(1.0 × 0) = 0.0
With only one category present, there is no uncertainty or disorder in the distribution, resulting in zero entropy.
Example 2:
Input: colors = [1, 1, 0, 0]
Output: 1.0
Explanation: The collection has two categories (0 and 1), each appearing exactly twice. The probability distribution is:
• P(category 0) = 2/4 = 0.5
• P(category 1) = 2/4 = 0.5
Entropy calculation:
H = -(0.5 × log₂(0.5) + 0.5 × log₂(0.5))
H = -(0.5 × (-1) + 0.5 × (-1))
H = -(-0.5 - 0.5) = 1.0
This is the maximum entropy for a two-category distribution, as both categories are equally likely.
Example 3:
Input: colors = [0, 1, 2, 3]
Output: 2.0
Explanation: The collection has four distinct categories, each appearing exactly once. The probability distribution is:
• P(category 0) = 1/4 = 0.25
• P(category 1) = 1/4 = 0.25
• P(category 2) = 1/4 = 0.25
• P(category 3) = 1/4 = 0.25
Entropy calculation:
H = -4 × (0.25 × log₂(0.25))
H = -4 × (0.25 × (-2))
H = -4 × (-0.5) = 2.0
This achieves maximum entropy for four categories (log₂(4) = 2), as all categories are uniformly distributed.
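The three worked examples can be verified with a short self-contained script (the entropy function is reimplemented here as a sketch so the snippet runs on its own):

```python
from collections import Counter
import math

def shannon_entropy(items):
    n = len(items)
    counts = Counter(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Check each worked example against its expected output.
examples = [
    ([0, 0, 0, 0], 0.0),  # single category -> no uncertainty
    ([1, 1, 0, 0], 1.0),  # two equally likely categories -> log2(2)
    ([0, 1, 2, 3], 2.0),  # four equally likely categories -> log2(4)
]
for colors, expected in examples:
    assert math.isclose(shannon_entropy(colors), expected, abs_tol=1e-12)
```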
Constraints