In supervised machine learning for classification tasks, the categorical cross-entropy loss (also known as log loss for multi-class problems) serves as a critical metric for measuring how well a model's predicted probability distribution aligns with the actual ground truth labels.
This loss function quantifies the dissimilarity between two probability distributions: the predicted probabilities from your model and the true label distribution (represented as one-hot encoded vectors). The fundamental intuition is that the loss penalizes predictions that assign low probability to the correct class, with the penalty growing without bound as the probability assigned to the correct class approaches zero.
Mathematical Foundation:
For a single sample with C classes, where y is the one-hot encoded true label vector and p is the predicted probability vector, the cross-entropy loss is defined as:
$$L = -\sum_{c=1}^{C} y_c \cdot \log(p_c)$$
Since y is one-hot encoded (only one element equals 1, the rest are 0), this simplifies to:
$$L = -\log(p_{\text{correct class}})$$
For a batch of N samples, the average cross-entropy loss is:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_i^{(c)} \cdot \log(p_i^{(c)})$$
Numerical Stability:
In practice, predicted probabilities very close to 0 can cause numerical issues when taking the logarithm (approaching negative infinity). To prevent this, probabilities are typically clipped using a small epsilon value (e.g., 1e-15) to ensure they remain within a safe numerical range.
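A minimal sketch of the full computation, assuming NumPy; the function name `categorical_cross_entropy` and the epsilon default are illustrative choices, not a fixed API:

```python
import numpy as np

def categorical_cross_entropy(predicted_probs, true_labels, epsilon=1e-15):
    """Mean categorical cross-entropy over a batch.

    Both arguments have shape (N, C); rows of `true_labels` are one-hot.
    """
    # Clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(np.asarray(predicted_probs, dtype=float), epsilon, 1 - epsilon)
    y = np.asarray(true_labels, dtype=float)
    # Per-sample loss -sum_c y_c * log(p_c), averaged over the N samples.
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))
```

Clipping to `[epsilon, 1 - epsilon]` means that even a predicted probability of exactly 0 for the correct class yields a large but finite loss rather than infinity.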
Your Task:
Implement a function that computes the mean categorical cross-entropy loss for a batch of multi-class classification predictions. Your function should accept the predicted probability vectors and the corresponding one-hot encoded true labels, clip the probabilities for numerical stability, and return the mean loss over the batch.
Example 1:
predicted_probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]
true_labels = [[1, 0, 0], [0, 1, 0]]
Expected output: 0.4338
For this batch of 2 samples with 3 classes each:
Sample 1:
• True class: index 0 (first class)
• Predicted probability for correct class: 0.7
• Individual loss: -log(0.7) ≈ 0.3567
Sample 2:
• True class: index 1 (second class)
• Predicted probability for correct class: 0.6
• Individual loss: -log(0.6) ≈ 0.5108
Mean Loss: (0.3567 + 0.5108) / 2 ≈ 0.4338
The model is relatively confident on both predictions (>50% probability for correct class), resulting in a moderately low loss.
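The per-sample numbers above can be checked directly from the simplified form -log(p_correct), using only the standard library:

```python
import math

# Example 1: each sample's loss is -log of the probability
# assigned to its correct class.
losses = [-math.log(0.7), -math.log(0.6)]
mean_loss = sum(losses) / len(losses)
print(round(mean_loss, 4))  # 0.4338
```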
Example 2:
predicted_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
true_labels = [[1, 0], [0, 1], [1, 0]]
Expected output: 0.3406
For this batch of 3 binary classification samples:
Sample 1:
• True class: index 0
• Predicted probability: 0.9
• Loss: -log(0.9) ≈ 0.1054
Sample 2:
• True class: index 1
• Predicted probability: 0.8
• Loss: -log(0.8) ≈ 0.2231
Sample 3:
• True class: index 0
• Predicted probability: 0.5
• Loss: -log(0.5) ≈ 0.6931
Mean Loss: (0.1054 + 0.2231 + 0.6931) / 3 ≈ 0.3406
Note how Sample 3 contributes the highest loss due to the model's uncertainty (50-50 prediction).
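The same arithmetic for this example, again via -log(p_correct):

```python
import math

# Example 2: three binary samples; the 50-50 prediction
# (sample 3) dominates the mean.
losses = [-math.log(0.9), -math.log(0.8), -math.log(0.5)]
print(round(sum(losses) / 3, 4))  # 0.3406
```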
Example 3:
predicted_probs = [[0.1, 0.2, 0.3, 0.4]]
true_labels = [[0, 0, 1, 0]]
Expected output: 1.204
For this single sample with 4 classes:
• True class: index 2 (third class)
• Predicted probability for correct class: 0.3
• Loss: -log(0.3) ≈ 1.2040
This relatively high loss indicates poor model confidence. The model assigned only 30% probability to the correct class while giving the highest probability (40%) to the wrong class at index 3. Cross-entropy heavily penalizes predictions like this, where more probability mass lands on an incorrect class than on the correct one, signaling to the optimization algorithm that significant weight adjustments are needed.
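Example 3's loss, plus a small illustration of how the penalty grows as the probability on the correct class shrinks (the probe values 0.9 through 0.01 are arbitrary choices):

```python
import math

# Example 3: single sample, correct class got only 0.3.
loss = -math.log(0.3)
print(round(loss, 4))  # 1.204

# The penalty grows without bound as p_correct approaches 0.
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, "->", round(-math.log(p), 4))
```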
Constraints