In information theory, understanding the relationship between two random variables is fundamental to many machine learning and data science applications. The shared information (also known as mutual information) between two random variables X and Y quantifies how much information one variable reveals about the other.
Given a joint probability distribution P(X, Y) represented as a 2D matrix, we want to compute how much uncertainty about one variable is reduced when we know the value of the other. This measure has profound implications in areas ranging from feature selection to communication theory.
Understanding Shared Information:
The shared information I(X; Y) can be computed using the following formula:
$$I(X; Y) = \sum_{i} \sum_{j} P(X=i, Y=j) \cdot \log\left(\frac{P(X=i, Y=j)}{P(X=i) \cdot P(Y=j)}\right)$$
Where:
• P(X=i, Y=j) is the joint probability that X takes value i and Y takes value j
• P(X=i) = Σⱼ P(X=i, Y=j) and P(Y=j) = Σᵢ P(X=i, Y=j) are the marginal probabilities
• The base of the logarithm sets the unit: natural log gives nats (as in the worked examples below), base 2 gives bits
Key Properties:
Non-negativity: I(X; Y) ≥ 0 always. It equals zero if and only if X and Y are statistically independent.
Symmetry: I(X; Y) = I(Y; X). Knowing X tells you as much about Y as knowing Y tells you about X.
Upper Bound: I(X; Y) ≤ min(H(X), H(Y)), where H denotes entropy. The shared information cannot exceed the total uncertainty in either variable.
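These three properties can be spot-checked numerically. The sketch below uses a small helper `mi` and an arbitrary example distribution (both are illustrations, not part of the problem statement); transposing the joint matrix swaps the roles of X and Y, which makes the symmetry check non-trivial:

```python
import math

def mi(joint):
    """I(X; Y) in nats; zero cells are skipped by the 0 * log 0 = 0 convention."""
    px = [sum(row) for row in joint]                 # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]           # marginal of Y (column sums)
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def entropy(marginal):
    return -sum(p * math.log(p) for p in marginal if p > 0)

joint = [[0.3, 0.2], [0.1, 0.4]]                    # arbitrary example distribution
transpose = [list(col) for col in zip(*joint)]      # swaps the roles of X and Y

h_x = entropy([sum(row) for row in joint])
h_y = entropy([sum(col) for col in zip(*joint)])

assert mi(joint) >= 0                               # non-negativity
assert abs(mi(joint) - mi(transpose)) < 1e-12       # symmetry
assert mi(joint) <= min(h_x, h_y) + 1e-12           # upper bound
```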
Handling Edge Cases: When computing the formula, if P(X=i, Y=j) = 0, that term contributes 0 to the sum (by convention, since 0 × log(0/anything) = 0 using the limit definition).
Your Task: Write a Python function that computes the shared information between two random variables given their joint probability distribution. The result should be rounded to 6 decimal places.
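A minimal sketch of one possible implementation (the function name `shared_information` is an assumption; natural log is used to match the worked examples below):

```python
import math

def shared_information(joint_prob):
    """Compute I(X; Y) in nats from a joint distribution given as a 2D list,
    rounded to 6 decimal places."""
    n_rows = len(joint_prob)
    n_cols = len(joint_prob[0])

    # Marginals: row sums give P(X=i), column sums give P(Y=j)
    p_x = [sum(joint_prob[i]) for i in range(n_rows)]
    p_y = [sum(joint_prob[i][j] for i in range(n_rows)) for j in range(n_cols)]

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            p_xy = joint_prob[i][j]
            if p_xy > 0:  # zero cells contribute 0 by convention
                total += p_xy * math.log(p_xy / (p_x[i] * p_y[j]))
    return round(total, 6)
```

For example, `shared_information([[0.4, 0.1], [0.1, 0.4]])` returns `0.192745`.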
Input: joint_prob = [[0.4, 0.1], [0.1, 0.4]]
Output: 0.192745

This joint distribution shows two correlated binary random variables.
Step 1: Compute Marginal Probabilities
• P(X=0) = 0.4 + 0.1 = 0.5
• P(X=1) = 0.1 + 0.4 = 0.5
• P(Y=0) = 0.4 + 0.1 = 0.5
• P(Y=1) = 0.1 + 0.4 = 0.5
Step 2: Compute Each Term
For each cell, compute P(X,Y) × ln(P(X,Y) / (P(X) × P(Y))):
• Cell (0,0): 0.4 × ln(0.4 / 0.25) = 0.4 × ln(1.6) ≈ 0.188001
• Cell (0,1): 0.1 × ln(0.1 / 0.25) = 0.1 × ln(0.4) ≈ -0.091629
• Cell (1,0): 0.1 × ln(0.1 / 0.25) = 0.1 × ln(0.4) ≈ -0.091629
• Cell (1,1): 0.4 × ln(0.4 / 0.25) = 0.4 × ln(1.6) ≈ 0.188001
Step 3: Sum All Terms
I(X; Y) ≈ 0.188001 - 0.091629 - 0.091629 + 0.188001 ≈ 0.192745
The positive shared information indicates X and Y are correlated — knowing one reduces uncertainty about the other.
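The arithmetic in the walkthrough can be reproduced directly. Because the matrix is symmetric, the four terms collapse to two distinct values:

```python
import math

# Cells (0,0) and (1,1) share one value; cells (0,1) and (1,0) share the other
term_diag = 0.4 * math.log(0.4 / 0.25)  # ≈ 0.188001
term_off = 0.1 * math.log(0.1 / 0.25)   # ≈ -0.091629
mi = 2 * term_diag + 2 * term_off
print(round(mi, 6))  # 0.192745
```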
Input: joint_prob = [[0.25, 0.25], [0.25, 0.25]]
Output: 0.0

This is a uniform joint distribution representing two independent random variables.
Step 1: Compute Marginal Probabilities
• P(X=0) = 0.25 + 0.25 = 0.5
• P(X=1) = 0.25 + 0.25 = 0.5
• P(Y=0) = 0.25 + 0.25 = 0.5
• P(Y=1) = 0.25 + 0.25 = 0.5
Step 2: Check Independence Condition
For independent variables: P(X,Y) = P(X) × P(Y)
• P(X=0, Y=0) = 0.25 = 0.5 × 0.5 ✓
• All other cells follow the same pattern
Step 3: Compute Terms
Since P(X,Y) = P(X) × P(Y) for all cells:
• Each term becomes: P(X,Y) × ln(1) = P(X,Y) × 0 = 0
Result: I(X; Y) = 0
This confirms a fundamental property: statistically independent variables share no information.
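A one-cell check of the reasoning above (every cell in this example behaves identically):

```python
import math

# Under independence, P(X,Y) = P(X) * P(Y) for every cell,
# so each term is P(X,Y) * ln(1) = 0 and the whole sum vanishes.
p_xy, p_x, p_y = 0.25, 0.5, 0.5
term = p_xy * math.log(p_xy / (p_x * p_y))
print(term)  # 0.0
```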
Input: joint_prob = [[0.333333, 0.0, 0.0], [0.0, 0.333333, 0.0], [0.0, 0.0, 0.333333]]
Output: 1.098612

This is a diagonal joint distribution representing perfectly correlated random variables (X = Y always).
Step 1: Compute Marginal Probabilities
• P(X=0) = P(X=1) = P(X=2) ≈ 0.333333
• P(Y=0) = P(Y=1) = P(Y=2) ≈ 0.333333
Step 2: Analyze the Structure
The diagonal structure means:
• P(X=i, Y=j) > 0 only when i = j
• X and Y are in perfect correspondence
Step 3: Compute Non-Zero Terms
For diagonal cells where P(X,Y) ≈ 1/3:
• Each term: (1/3) × ln((1/3) / ((1/3) × (1/3))) = (1/3) × ln(3) ≈ 0.366204
Step 4: Sum All Terms
I(X; Y) = 3 × 0.366204 ≈ 1.098612
Note: ln(3) ≈ 1.0986, which equals the entropy of a 3-state uniform distribution. This demonstrates that perfectly correlated variables maximize shared information, which equals the entropy of either variable.
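The equality between the shared information and the entropy can be checked with exact thirds (using p = 1/3 rather than the rounded 0.333333 entries of the input):

```python
import math

p = 1 / 3  # each diagonal cell, as an exact fraction
mi = 3 * p * math.log(p / (p * p))        # three identical diagonal terms
h_uniform = -3 * p * math.log(p)          # entropy of a uniform 3-state variable
print(round(mi, 6), round(h_uniform, 6))  # 1.098612 1.098612
```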
Constraints