In the world of binary classification, evaluating model performance goes far beyond simple accuracy. When dealing with imbalanced datasets—where one class significantly outnumbers the other—traditional metrics like accuracy can be highly misleading. A model that always predicts the majority class might achieve 95% accuracy but fail entirely at its actual purpose.
The Matthews Correlation Coefficient (MCC) addresses this challenge by providing a balanced evaluation metric that considers all four components of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Unlike accuracy, precision, or recall, the MCC produces a high score only when the classifier performs well across all four categories.
Mathematical Foundation: The MCC is computed using the following formula:
$$MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Interpretation of MCC Values: The MCC ranges from -1 to +1:
• +1.0 indicates perfect prediction
• 0.0 indicates performance no better than random guessing
• -1.0 indicates total disagreement between predictions and ground truth
Edge Case Handling: When any of the four factors in the denominator equals zero (meaning an entire row or column of the confusion matrix is zero), the denominator becomes zero and the coefficient is undefined. In such cases, the convention is to return 0.0, indicating that the classifier provides no meaningful discrimination between classes.
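This convention can be checked with a quick sketch. The labels below are hypothetical: the classifier predicts class 0 for every sample, so TP and FP are both zero and one denominator factor vanishes:

```python
# Hypothetical degenerate case: every prediction is class 0.
actual = [1, 1, 0, 0, 0]
predicted = [0, 0, 0, 0, 0]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # 0
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # 0
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # 3
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # 2

# (TP + FP) is 0, so the product under the square root is 0:
denom_product = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
# By convention, return 0.0 here instead of dividing by zero.
```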
Your Task: Implement a function that computes the Matthews Correlation Coefficient for binary classification. The function should accept two lists of equal length containing ground truth labels and predicted labels (both containing only 0s and 1s), and return the MCC value rounded to 4 decimal places.
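One way the task above could be sketched in Python (the function name `matthews_corrcoef` and internal structure are illustrative choices, not prescribed by the problem):

```python
from math import sqrt

def matthews_corrcoef(actual, predicted):
    """Compute the MCC for binary label lists, rounded to 4 decimal places."""
    # Tally all four confusion-matrix cells in one pass over the pairs.
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention for a degenerate confusion matrix
    return round((tp * tn - fp * fn) / denominator, 4)
```

The zero check happens before the division, so degenerate inputs (e.g. all predictions in one class) return 0.0 rather than raising an error.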
actual_labels = [1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 1, 0, 0, 0]
Expected output: 0.2582
Let's break down the confusion matrix step by step:
Step 1: Identify confusion matrix elements by examining each position:
• Position 0: actual=1, pred=1 → True Positive (TP)
• Position 1: actual=1, pred=0 → False Negative (FN)
• Position 2: actual=0, pred=0 → True Negative (TN)
• Position 3: actual=0, pred=1 → False Positive (FP)
• Position 4: actual=1, pred=1 → True Positive (TP)
• Position 5: actual=0, pred=0 → True Negative (TN)
• Position 6: actual=1, pred=0 → False Negative (FN)
• Position 7: actual=0, pred=0 → True Negative (TN)
Step 2: Count each category:
• TP = 2, TN = 3, FP = 1, FN = 2
Step 3: Apply the MCC formula:
• Numerator = (TP × TN) - (FP × FN) = (2 × 3) - (1 × 2) = 6 - 2 = 4
• Denominator = √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) = √((2+1)(2+2)(3+1)(3+2)) = √(3 × 4 × 4 × 5) = √240 ≈ 15.4919
• MCC = 4 / 15.4919 ≈ 0.2582
The positive MCC of 0.2582 indicates a weak positive correlation—the classifier is performing better than random chance, but there's significant room for improvement.
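The arithmetic above can be reproduced directly from the Step 2 counts:

```python
from math import sqrt

# Counts from Step 2 of the worked example.
TP, TN, FP, FN = 2, 3, 1, 2

numerator = TP * TN - FP * FN                                      # 6 - 2 = 4
denominator = sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # sqrt(240)
mcc = round(numerator / denominator, 4)                            # 0.2582
```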
actual_labels = [1, 1, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 1, 0]
Expected output: 1.0
This is a perfect classification scenario where every prediction matches the actual label:
Confusion Matrix Analysis:
• TP = 3 (positions 0, 1, 4 where actual=1 and pred=1)
• TN = 3 (positions 2, 3, 5 where actual=0 and pred=0)
• FP = 0 (no false positives)
• FN = 0 (no false negatives)
MCC Calculation:
• Numerator = (3 × 3) - (0 × 0) = 9
• Denominator = √((3+0)(3+0)(3+0)(3+0)) = √(3 × 3 × 3 × 3) = √81 = 9
• MCC = 9 / 9 = 1.0
An MCC of 1.0 represents the theoretical maximum—a flawless classifier that makes no errors whatsoever.
actual_labels = [0, 0, 1, 1, 0, 1]
predicted_labels = [0, 1, 1, 0, 0, 1]
Expected output: 0.3333
Let's analyze this mixed-performance scenario:
Confusion Matrix Elements:
• Position 0: actual=0, pred=0 → TN
• Position 1: actual=0, pred=1 → FP
• Position 2: actual=1, pred=1 → TP
• Position 3: actual=1, pred=0 → FN
• Position 4: actual=0, pred=0 → TN
• Position 5: actual=1, pred=1 → TP
Counts: TP = 2, TN = 2, FP = 1, FN = 1
MCC Calculation:
• Numerator = (2 × 2) - (1 × 1) = 4 - 1 = 3
• Denominator = √((2+1)(2+1)(2+1)(2+1)) = √(3 × 3 × 3 × 3) = √81 = 9
• MCC = 3 / 9 = 0.3333
This moderate MCC of 0.3333 shows the classifier is performing better than random but with notable errors in both directions (one false positive and one false negative).
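This example's tally-then-compute flow can be sketched end to end (the `counts` dictionary is just an illustrative structure):

```python
# Tally the confusion matrix directly from the label lists.
actual_labels = [0, 0, 1, 1, 0, 1]
predicted_labels = [0, 1, 1, 0, 0, 1]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for a, p in zip(actual_labels, predicted_labels):
    if a == 1 and p == 1:
        counts["TP"] += 1
    elif a == 0 and p == 0:
        counts["TN"] += 1
    elif a == 0 and p == 1:
        counts["FP"] += 1
    else:  # a == 1 and p == 0
        counts["FN"] += 1

numerator = counts["TP"] * counts["TN"] - counts["FP"] * counts["FN"]
denominator = ((counts["TP"] + counts["FP"]) * (counts["TP"] + counts["FN"]) *
               (counts["TN"] + counts["FP"]) * (counts["TN"] + counts["FN"])) ** 0.5
mcc = round(numerator / denominator, 4)  # 3 / 9 = 0.3333
```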
Constraints