In statistics and data science, quantifying the relationship between two categorical variables is a fundamental task. When both variables are binary (taking only values 0 or 1), the Binary Association Coefficient provides an elegant measure of their correlation strength and direction.
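This coefficient, also known as the phi coefficient (φ), is mathematically equivalent to the Pearson correlation coefficient applied to two binary variables. It yields a value between -1 and +1, where +1 indicates a perfect positive association, -1 a perfect negative association, and 0 no linear association.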
The Four-Cell Contingency Table:
Given two binary sequences x and y of equal length, we first construct a 2×2 contingency table counting co-occurrences:
| | y = 1 | y = 0 |
|---|---|---|
| x = 1 | a (both 1) | b (x=1, y=0) |
| x = 0 | c (x=0, y=1) | d (both 0) |
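Where a is the number of positions with x = 1 and y = 1, b the number with x = 1 and y = 0, c the number with x = 0 and y = 1, and d the number with x = 0 and y = 0, so that a + b + c + d equals the sequence length.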
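The counting step can be sketched in a few lines of Python. The helper name `contingency_counts` is hypothetical (it does not come from the problem statement), and the sketch assumes both inputs contain only 0s and 1s and have equal length.

```python
def contingency_counts(x, y):
    """Return the four cell counts (a, b, c, d) of the 2x2 table.

    Hypothetical helper; assumes x and y are equal-length sequences of 0s and 1s.
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1  # both 1
        elif xi == 1 and yi == 0:
            b += 1  # x=1, y=0
        elif xi == 0 and yi == 1:
            c += 1  # x=0, y=1
        else:
            d += 1  # both 0
    return a, b, c, d


print(contingency_counts([1, 1, 0, 0], [1, 0, 1, 0]))  # (1, 1, 1, 1)
```

A single pass over the paired elements is enough, so the cost is linear in the sequence length.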
The Association Coefficient Formula:
$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
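This formula takes the difference between the product of the main-diagonal cells (ad, where the variables agree) and the product of the off-diagonal cells (bc, where they disagree). Dividing by the square root of the product of the four marginal totals scales the result to the [-1, +1] range. The coefficient is undefined when any marginal total is zero, because the denominator is then 0.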
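The equivalence with the Pearson correlation coefficient mentioned earlier can be checked numerically. The sketch below is illustrative only: the sample data and the function names `phi_from_counts` and `pearson` are made up for this check, and it assumes neither sequence is constant, so both denominators are non-zero.

```python
import math


def phi_from_counts(a, b, c, d):
    """Phi coefficient from the four cell counts (assumes a non-zero denominator)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def pearson(x, y):
    """Plain Pearson correlation, written out from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Illustrative sample data, not taken from the problem statement.
x = [1, 1, 1, 0, 0, 1, 0, 0]
y = [1, 0, 1, 0, 1, 1, 0, 0]

# Cell counts for this sample: a=3, b=1, c=1, d=3.
pairs = list(zip(x, y))
a = pairs.count((1, 1))
b = pairs.count((1, 0))
c = pairs.count((0, 1))
d = pairs.count((0, 0))

phi = phi_from_counts(a, b, c, d)
r = pearson(x, y)
print(phi, r)  # both are 0.5 for this sample
```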
Your Task: Implement a function that takes two binary sequences (lists of 0s and 1s) and returns their binary association coefficient, rounded to 4 decimal places. Both sequences will always have the same length.
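One possible shape for such a function is sketched below. The name `binary_association` is hypothetical, and returning 0.0 when the denominator is zero is an assumption made here for illustration, since the statement does not say how that case should be handled.

```python
import math


def binary_association(x, y):
    """Binary association (phi) coefficient, rounded to 4 decimal places.

    A sketch, not a reference solution. Assumes x and y are equal-length
    lists of 0s and 1s.
    """
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # both 1
    b = pairs.count((1, 0))  # x=1, y=0
    c = pairs.count((0, 1))  # x=0, y=1
    d = pairs.count((0, 0))  # both 0

    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # Assumption: the coefficient is undefined here (one sequence is
        # constant or the input is empty); this sketch returns 0.0.
        return 0.0
    return round((a * d - b * c) / denom, 4)


print(binary_association([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
print(binary_association([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0
print(binary_association([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```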
Input: x = [1, 1, 0, 0], y = [0, 0, 1, 1]

Output: -1.0

Building the contingency table:

- a (both 1) = 0
- b (x=1, y=0) = 2
- c (x=0, y=1) = 2
- d (both 0) = 0

Applying the formula:

- Numerator: (0 × 0) - (2 × 2) = 0 - 4 = -4
- Denominator: √((0+2)(2+0)(0+2)(2+0)) = √(2 × 2 × 2 × 2) = √16 = 4
- Coefficient: -4 / 4 = -1.0

This is a perfect negative association: whenever x is 1, y is 0, and vice versa.
Input: x = [1, 0, 1, 0], y = [1, 0, 1, 0]

Output: 1.0

Building the contingency table:

- a (both 1) = 2
- b (x=1, y=0) = 0
- c (x=0, y=1) = 0
- d (both 0) = 2

Applying the formula:

- Numerator: (2 × 2) - (0 × 0) = 4 - 0 = 4
- Denominator: √((2+0)(0+2)(2+0)(0+2)) = √(2 × 2 × 2 × 2) = √16 = 4
- Coefficient: 4 / 4 = 1.0

This is a perfect positive association: the sequences are identical.
Input: x = [1, 1, 0, 0], y = [1, 0, 1, 0]

Output: 0.0

Building the contingency table:

- a (both 1) = 1
- b (x=1, y=0) = 1
- c (x=0, y=1) = 1
- d (both 0) = 1

Applying the formula:

- Numerator: (1 × 1) - (1 × 1) = 1 - 1 = 0
- Denominator: √((1+1)(1+1)(1+1)(1+1)) = √(2 × 2 × 2 × 2) = √16 = 4
- Coefficient: 0 / 4 = 0.0

There is no linear association between the two variables: they are statistically independent in this sample.
Constraints