In machine learning and data science, measuring the similarity between two binary sets is a fundamental task for evaluating classification models, clustering algorithms, and recommendation systems. One of the most intuitive and widely used metrics for this purpose is the Intersection-over-Union (IoU) coefficient, also known as the set overlap ratio.
Given two binary arrays representing ground truth labels and predicted labels, this metric quantifies how well the predictions align with the actual values by computing the ratio of matching positive elements to the total number of distinct positive elements across both sets.
Mathematical Definition:
The similarity coefficient is calculated using the following formula:
$$\text{Similarity Coefficient} = \frac{|A \cap B|}{|A \cup B|}$$
Where:
• A is the set of indices at which the ground truth array has value 1
• B is the set of indices at which the predicted array has value 1
Interpretation:
• 1.0 means the predicted positives exactly match the actual positives (perfect overlap)
• 0.0 means the two positive sets share no indices (no overlap)
• Values in between indicate partial agreement; for example, 0.75 means the matching positives account for three quarters of all distinct positive positions
Edge Case Handling: When both binary arrays contain only zeros (neither the ground truth nor predictions have any positive labels), the union is empty. By convention, this scenario should return 0.0 since there are no positive elements to compare.
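The empty-union convention above can be checked explicitly before dividing. A minimal sketch (the variable names are illustrative):

```python
actual = [0, 0, 0, 0]
predicted = [0, 0, 0, 0]

# Indices where at least one array has a 1 (the union of positive sets).
union = {i for i in range(len(actual)) if actual[i] == 1 or predicted[i] == 1}

if not union:
    score = 0.0  # empty union: return 0.0 by convention, avoiding division by zero
else:
    intersection = {i for i in union if actual[i] == 1 and predicted[i] == 1}
    score = len(intersection) / len(union)

print(score)  # 0.0
```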
Your Task: Implement a function that takes two binary arrays of equal length and computes their similarity coefficient. The function should correctly handle all edge cases, including scenarios with no overlap, perfect overlap, and empty positive sets.
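One possible implementation is sketched below in Python. The function name `similarity_coefficient` is a placeholder, not mandated by the problem; any approach that builds the two positive index sets and divides intersection size by union size works.

```python
def similarity_coefficient(actual, predicted):
    """Compute |A ∩ B| / |A ∪ B| for two equal-length binary arrays."""
    if len(actual) != len(predicted):
        raise ValueError("arrays must have equal length")
    # A and B are the index sets of positive (value 1) elements.
    a = {i for i, v in enumerate(actual) if v == 1}
    b = {i for i, v in enumerate(predicted) if v == 1}
    union = a | b
    if not union:       # both arrays contain only zeros
        return 0.0      # convention for the empty-union edge case
    return len(a & b) / len(union)


print(similarity_coefficient([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]))  # 0.75
```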
Example 1:
actual_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 1, 0, 0, 1]

Expected output: 0.75

Explanation: Let's identify the positions where each array has value 1:
• Actual positives (A): positions 0, 2, 3, 5 → {0, 2, 3, 5}
• Predicted positives (B): positions 0, 2, 5 → {0, 2, 5}
Now compute intersection and union:
• Intersection (A ∩ B): positions where both are 1 → {0, 2, 5} → count = 3
• Union (A ∪ B): positions where at least one is 1 → {0, 2, 3, 5} → count = 4
Similarity Coefficient = 3 / 4 = 0.75
This indicates 75% overlap between the actual and predicted positive labels.
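The manual steps above map directly onto Python set operations. A quick self-contained check of this example:

```python
actual_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 1, 0, 0, 1]

# Build the positive index sets by scanning each array.
A = {i for i, v in enumerate(actual_labels) if v == 1}     # {0, 2, 3, 5}
B = {i for i, v in enumerate(predicted_labels) if v == 1}  # {0, 2, 5}

print(A & B)                    # intersection: {0, 2, 5}
print(A | B)                    # union: {0, 2, 3, 5}
print(len(A & B) / len(A | B))  # 3 / 4 = 0.75
```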
Example 2:
actual_labels = [1, 1, 0, 1, 0]
predicted_labels = [1, 1, 0, 1, 0]

Expected output: 1.0

Explanation: The predicted labels exactly match the actual labels for all positive positions:
• Actual positives (A): positions 0, 1, 3 → {0, 1, 3}
• Predicted positives (B): positions 0, 1, 3 → {0, 1, 3}
Both sets are identical:
• Intersection = {0, 1, 3} → count = 3
• Union = {0, 1, 3} → count = 3
Similarity Coefficient = 3 / 3 = 1.0
A perfect score of 1.0 indicates the model's predictions perfectly align with ground truth.
Example 3:
actual_labels = [1, 1, 0, 0]
predicted_labels = [0, 0, 1, 1]

Expected output: 0.0

Explanation: The predicted positives and actual positives have no positions in common:
• Actual positives (A): positions 0, 1 → {0, 1}
• Predicted positives (B): positions 2, 3 → {2, 3}
The intersection is empty:
• Intersection = {} → count = 0
• Union = {0, 1, 2, 3} → count = 4
Similarity Coefficient = 0 / 4 = 0.0
A score of 0.0 indicates complete disagreement—the model predicted positives exactly where the actual values were negative, and vice versa.
Constraints: