In machine learning and data science, measuring the similarity between two binary sets is a fundamental task for evaluating classification models, clustering algorithms, and recommendation systems. One of the most intuitive and widely used metrics for this purpose is the Intersection-over-Union (IoU) coefficient, also known as the set overlap ratio.
Given two binary arrays representing ground truth labels and predicted labels, this metric quantifies how well the predictions align with the actual values by computing the ratio of matching positive elements to the total number of distinct positive elements across both sets.
Mathematical Definition:
The similarity coefficient is calculated using the following formula:
$$\text{Similarity Coefficient} = \frac{|A \cap B|}{|A \cup B|}$$
Where:
• A is the set of indices at which the ground truth array has value 1
• B is the set of indices at which the predicted array has value 1
Interpretation:
• 1.0 means the predicted positives exactly match the actual positives (perfect overlap)
• 0.0 means the two positive sets share no indices (no overlap)
• Values in between indicate partial agreement; for example, 0.75 means the matching positives account for three quarters of all distinct positive positions
Edge Case Handling: When both binary arrays contain only zeros (neither the ground truth nor predictions have any positive labels), the union is empty. By convention, this scenario should return 0.0 since there are no positive elements to compare.
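The empty-union convention above can be checked explicitly before dividing. A minimal sketch (the variable names are illustrative):

```python
actual = [0, 0, 0, 0]
predicted = [0, 0, 0, 0]

# Indices where at least one array has a 1 (the union of positive sets).
union = {i for i in range(len(actual)) if actual[i] == 1 or predicted[i] == 1}

if not union:
    score = 0.0  # empty union: return 0.0 by convention, avoiding division by zero
else:
    intersection = {i for i in union if actual[i] == 1 and predicted[i] == 1}
    score = len(intersection) / len(union)

print(score)  # 0.0
```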
Your Task: Implement a function that takes two binary arrays of equal length and computes their similarity coefficient. The function should correctly handle all edge cases, including scenarios with no overlap, perfect overlap, and empty positive sets.
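One possible implementation is sketched below in Python. The function name `similarity_coefficient` is a placeholder, not mandated by the problem; any approach that builds the two positive index sets and divides intersection size by union size works.

```python
def similarity_coefficient(actual, predicted):
    """Compute |A ∩ B| / |A ∪ B| for two equal-length binary arrays."""
    if len(actual) != len(predicted):
        raise ValueError("arrays must have equal length")
    # A and B are the index sets of positive (value 1) elements.
    a = {i for i, v in enumerate(actual) if v == 1}
    b = {i for i, v in enumerate(predicted) if v == 1}
    union = a | b
    if not union:       # both arrays contain only zeros
        return 0.0      # convention for the empty-union edge case
    return len(a & b) / len(union)


print(similarity_coefficient([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]))  # 0.75
```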
Example 1:
actual_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 1, 0, 0, 1]

Expected output: 0.75

Explanation: Let's identify the positions where each array has value 1:
• Actual positives (A): positions 0, 2, 3, 5 → {0, 2, 3, 5}
• Predicted positives (B): positions 0, 2, 5 → {0, 2, 5}
Now compute intersection and union:
• Intersection (A ∩ B): positions where both are 1 → {0, 2, 5} → count = 3
• Union (A ∪ B): positions where at least one is 1 → {0, 2, 3, 5} → count = 4
Similarity Coefficient = 3 / 4 = 0.75
This indicates 75% overlap between the actual and predicted positive labels.
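The manual steps above map directly onto Python set operations. A quick self-contained check of this example:

```python
actual_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 0, 1, 0, 0, 1]

# Build the positive index sets by scanning each array.
A = {i for i, v in enumerate(actual_labels) if v == 1}     # {0, 2, 3, 5}
B = {i for i, v in enumerate(predicted_labels) if v == 1}  # {0, 2, 5}

print(A & B)                    # intersection: {0, 2, 5}
print(A | B)                    # union: {0, 2, 3, 5}
print(len(A & B) / len(A | B))  # 3 / 4 = 0.75
```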
Example 2:
actual_labels = [1, 1, 0, 1, 0]
predicted_labels = [1, 1, 0, 1, 0]

Expected output: 1.0

Explanation: The predicted labels exactly match the actual labels for all positive positions:
• Actual positives (A): positions 0, 1, 3 → {0, 1, 3}
• Predicted positives (B): positions 0, 1, 3 → {0, 1, 3}
Both sets are identical:
• Intersection = {0, 1, 3} → count = 3
• Union = {0, 1, 3} → count = 3
Similarity Coefficient = 3 / 3 = 1.0
A perfect score of 1.0 indicates the model's predictions perfectly align with ground truth.
Example 3:
actual_labels = [1, 1, 0, 0]
predicted_labels = [0, 0, 1, 1]

Expected output: 0.0

Explanation: The predicted positives and actual positives have no positions in common:
• Actual positives (A): positions 0, 1 → {0, 1}
• Predicted positives (B): positions 2, 3 → {2, 3}
The intersection is empty:
• Intersection = {} → count = 0
• Union = {0, 1, 2, 3} → count = 4
Similarity Coefficient = 0 / 4 = 0.0
A score of 0.0 indicates complete disagreement—the model predicted positives exactly where the actual values were negative, and vice versa.
Constraints: