In supervised machine learning for classification tasks, the categorical cross-entropy loss (also known as log loss for multi-class problems) serves as a critical metric for measuring how well a model's predicted probability distribution aligns with the actual ground truth labels.
This loss function quantifies the dissimilarity between two probability distributions: the predicted probabilities from your model and the true label distribution (represented as one-hot encoded vectors). The fundamental intuition is that the loss penalizes predictions that assign low probability to the correct class, with the penalty growing without bound as the probability assigned to the correct class approaches zero.
Mathematical Foundation:
For a single sample with C classes, where y is the one-hot encoded true label vector and p is the predicted probability vector, the cross-entropy loss is defined as:
$$L = -\sum_{c=1}^{C} y_c \cdot \log(p_c)$$
Since y is one-hot encoded (only one element equals 1, the rest are 0), this simplifies to:
$$L = -\log(p_{\text{correct class}})$$
For a batch of N samples, the average cross-entropy loss is:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_i^{(c)} \cdot \log(p_i^{(c)})$$
Numerical Stability:
In practice, predicted probabilities very close to 0 can cause numerical issues when taking the logarithm (approaching negative infinity). To prevent this, probabilities are typically clipped using a small epsilon value (e.g., 1e-15) to ensure they remain within a safe numerical range.
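A minimal sketch of the full computation, assuming NumPy; the function name `categorical_cross_entropy` and the epsilon default are illustrative choices, not a fixed API:

```python
import numpy as np

def categorical_cross_entropy(predicted_probs, true_labels, epsilon=1e-15):
    """Mean categorical cross-entropy over a batch.

    Both arguments have shape (N, C); rows of `true_labels` are one-hot.
    """
    # Clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(np.asarray(predicted_probs, dtype=float), epsilon, 1 - epsilon)
    y = np.asarray(true_labels, dtype=float)
    # Per-sample loss -sum_c y_c * log(p_c), averaged over the N samples.
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))
```

Clipping to `[epsilon, 1 - epsilon]` means that even a predicted probability of exactly 0 for the correct class yields a large but finite loss rather than infinity.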
Your Task:
Implement a function that computes the mean categorical cross-entropy loss for a batch of multi-class classification predictions. Your function should accept the predicted probability vectors and the corresponding one-hot encoded true labels, clip the probabilities for numerical stability, and return the mean loss over the batch.
Example 1:
predicted_probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]
true_labels = [[1, 0, 0], [0, 1, 0]]
Expected output: 0.4338
For this batch of 2 samples with 3 classes each:
Sample 1:
• True class: index 0 (first class)
• Predicted probability for correct class: 0.7
• Individual loss: -log(0.7) ≈ 0.3567
Sample 2:
• True class: index 1 (second class)
• Predicted probability for correct class: 0.6
• Individual loss: -log(0.6) ≈ 0.5108
Mean Loss: (0.3567 + 0.5108) / 2 ≈ 0.4338
The model is relatively confident on both predictions (>50% probability for correct class), resulting in a moderately low loss.
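The per-sample numbers above can be checked directly from the simplified form -log(p_correct), using only the standard library:

```python
import math

# Example 1: each sample's loss is -log of the probability
# assigned to its correct class.
losses = [-math.log(0.7), -math.log(0.6)]
mean_loss = sum(losses) / len(losses)
print(round(mean_loss, 4))  # 0.4338
```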
Example 2:
predicted_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
true_labels = [[1, 0], [0, 1], [1, 0]]
Expected output: 0.3406
For this batch of 3 binary classification samples:
Sample 1:
• True class: index 0
• Predicted probability: 0.9
• Loss: -log(0.9) ≈ 0.1054
Sample 2:
• True class: index 1
• Predicted probability: 0.8
• Loss: -log(0.8) ≈ 0.2231
Sample 3:
• True class: index 0
• Predicted probability: 0.5
• Loss: -log(0.5) ≈ 0.6931
Mean Loss: (0.1054 + 0.2231 + 0.6931) / 3 ≈ 0.3406
Note how Sample 3 contributes the highest loss due to the model's uncertainty (50-50 prediction).
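The same arithmetic for this example, again via -log(p_correct):

```python
import math

# Example 2: three binary samples; the 50-50 prediction
# (sample 3) dominates the mean.
losses = [-math.log(0.9), -math.log(0.8), -math.log(0.5)]
print(round(sum(losses) / 3, 4))  # 0.3406
```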
Example 3:
predicted_probs = [[0.1, 0.2, 0.3, 0.4]]
true_labels = [[0, 0, 1, 0]]
Expected output: 1.204
For this single sample with 4 classes:
• True class: index 2 (third class)
• Predicted probability for correct class: 0.3
• Loss: -log(0.3) ≈ 1.2040
This relatively high loss indicates poor model confidence. The model assigned only 30% probability to the correct class while giving the highest probability (40%) to the wrong class at index 3. Cross-entropy heavily penalizes predictions like this, where more probability mass lands on an incorrect class than on the correct one, signaling to the optimization algorithm that significant weight adjustments are needed.
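Example 3's loss, plus a small illustration of how the penalty grows as the probability on the correct class shrinks (the probe values 0.9 through 0.01 are arbitrary choices):

```python
import math

# Example 3: single sample, correct class got only 0.3.
loss = -math.log(0.3)
print(round(loss, 4))  # 1.204

# The penalty grows without bound as p_correct approaches 0.
for p in (0.9, 0.5, 0.1, 0.01):
    print(p, "->", round(-math.log(p), 4))
```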
Constraints