In the world of binary classification, evaluating model performance goes far beyond simple accuracy. When dealing with imbalanced datasets—where one class significantly outnumbers the other—traditional metrics like accuracy can be highly misleading. A model that always predicts the majority class might achieve 95% accuracy but fail entirely at its actual purpose.
The Matthews Correlation Coefficient (MCC) addresses this challenge by providing a balanced evaluation metric that considers all four components of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Unlike accuracy, precision, or recall, the MCC produces a high score only when the classifier performs well across all four categories.
Mathematical Foundation: The MCC is computed using the following formula:
$$MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
Interpretation of MCC Values: The MCC ranges from -1 to +1:
• +1.0 indicates perfect prediction
• 0.0 indicates performance no better than random guessing
• -1.0 indicates total disagreement between predictions and ground truth
Edge Case Handling: When any of the four factors in the denominator equals zero (meaning an entire row or column of the confusion matrix is zero), the denominator becomes zero and the coefficient is undefined. In such cases, the convention is to return 0.0, indicating that the classifier provides no meaningful discrimination between classes.
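This convention can be checked with a quick sketch. The labels below are hypothetical: the classifier predicts class 0 for every sample, so TP and FP are both zero and one denominator factor vanishes:

```python
# Hypothetical degenerate case: every prediction is class 0.
actual = [1, 1, 0, 0, 0]
predicted = [0, 0, 0, 0, 0]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # 0
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # 0
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # 3
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # 2

# (TP + FP) is 0, so the product under the square root is 0:
denom_product = (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
# By convention, return 0.0 here instead of dividing by zero.
```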
Your Task: Implement a function that computes the Matthews Correlation Coefficient for binary classification. The function should accept two lists of equal length containing ground truth labels and predicted labels (both containing only 0s and 1s), and return the MCC value rounded to 4 decimal places.
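One way the task above could be sketched in Python (the function name `matthews_corrcoef` and internal structure are illustrative choices, not prescribed by the problem):

```python
from math import sqrt

def matthews_corrcoef(actual, predicted):
    """Compute the MCC for binary label lists, rounded to 4 decimal places."""
    # Tally all four confusion-matrix cells in one pass over the pairs.
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention for a degenerate confusion matrix
    return round((tp * tn - fp * fn) / denominator, 4)
```

The zero check happens before the division, so degenerate inputs (e.g. all predictions in one class) return 0.0 rather than raising an error.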
actual_labels = [1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 1, 0, 0, 0]
Expected output: 0.2582
Let's break down the confusion matrix step by step:
Step 1: Identify confusion matrix elements by examining each position:
• Position 0: actual=1, pred=1 → True Positive (TP)
• Position 1: actual=1, pred=0 → False Negative (FN)
• Position 2: actual=0, pred=0 → True Negative (TN)
• Position 3: actual=0, pred=1 → False Positive (FP)
• Position 4: actual=1, pred=1 → True Positive (TP)
• Position 5: actual=0, pred=0 → True Negative (TN)
• Position 6: actual=1, pred=0 → False Negative (FN)
• Position 7: actual=0, pred=0 → True Negative (TN)
Step 2: Count each category:
• TP = 2, TN = 3, FP = 1, FN = 2
Step 3: Apply the MCC formula:
• Numerator = (TP × TN) - (FP × FN) = (2 × 3) - (1 × 2) = 6 - 2 = 4
• Denominator = √((TP+FP)(TP+FN)(TN+FP)(TN+FN)) = √((2+1)(2+2)(3+1)(3+2)) = √(3 × 4 × 4 × 5) = √240 ≈ 15.4919
• MCC = 4 / 15.4919 ≈ 0.2582
The positive MCC of 0.2582 indicates a weak positive correlation—the classifier is performing better than random chance, but there's significant room for improvement.
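The arithmetic above can be reproduced directly from the Step 2 counts:

```python
from math import sqrt

# Counts from Step 2 of the worked example.
TP, TN, FP, FN = 2, 3, 1, 2

numerator = TP * TN - FP * FN                                      # 6 - 2 = 4
denominator = sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # sqrt(240)
mcc = round(numerator / denominator, 4)                            # 0.2582
```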
actual_labels = [1, 1, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 1, 0]
Expected output: 1.0
This is a perfect classification scenario where every prediction matches the actual label:
Confusion Matrix Analysis:
• TP = 3 (positions 0, 1, 4 where actual=1 and pred=1)
• TN = 3 (positions 2, 3, 5 where actual=0 and pred=0)
• FP = 0 (no false positives)
• FN = 0 (no false negatives)
MCC Calculation:
• Numerator = (3 × 3) - (0 × 0) = 9
• Denominator = √((3+0)(3+0)(3+0)(3+0)) = √(3 × 3 × 3 × 3) = √81 = 9
• MCC = 9 / 9 = 1.0
An MCC of 1.0 represents the theoretical maximum—a flawless classifier that makes no errors whatsoever.
actual_labels = [0, 0, 1, 1, 0, 1]
predicted_labels = [0, 1, 1, 0, 0, 1]
Expected output: 0.3333
Let's analyze this mixed-performance scenario:
Confusion Matrix Elements:
• Position 0: actual=0, pred=0 → TN
• Position 1: actual=0, pred=1 → FP
• Position 2: actual=1, pred=1 → TP
• Position 3: actual=1, pred=0 → FN
• Position 4: actual=0, pred=0 → TN
• Position 5: actual=1, pred=1 → TP
Counts: TP = 2, TN = 2, FP = 1, FN = 1
MCC Calculation:
• Numerator = (2 × 2) - (1 × 1) = 4 - 1 = 3
• Denominator = √((2+1)(2+1)(2+1)(2+1)) = √(3 × 3 × 3 × 3) = √81 = 9
• MCC = 3 / 9 = 0.3333
This moderate MCC of 0.3333 shows the classifier is performing better than random but with notable errors in both directions (one false positive and one false negative).
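This example's tally-then-compute flow can be sketched end to end (the `counts` dictionary is just an illustrative structure):

```python
# Tally the confusion matrix directly from the label lists.
actual_labels = [0, 0, 1, 1, 0, 1]
predicted_labels = [0, 1, 1, 0, 0, 1]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for a, p in zip(actual_labels, predicted_labels):
    if a == 1 and p == 1:
        counts["TP"] += 1
    elif a == 0 and p == 0:
        counts["TN"] += 1
    elif a == 0 and p == 1:
        counts["FP"] += 1
    else:  # a == 1 and p == 0
        counts["FN"] += 1

numerator = counts["TP"] * counts["TN"] - counts["FP"] * counts["FN"]
denominator = ((counts["TP"] + counts["FP"]) * (counts["TP"] + counts["FN"]) *
               (counts["TN"] + counts["FP"]) * (counts["TN"] + counts["FN"])) ** 0.5
mcc = round(numerator / denominator, 4)  # 3 / 9 = 0.3333
```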
Constraints