In classification tasks, neural networks are typically trained using hard labels—one-hot encoded vectors where the true class has probability 1.0 and all other classes have probability 0.0. However, this approach can lead to overconfident predictions and poor generalization, especially when training data is noisy or limited.
Soft target distributions address this limitation by redistributing a fraction ε (epsilon) of the probability mass uniformly across all K classes, so the true class keeps most, but not all, of the mass. This technique, a form of label smoothing (also called label relaxation or confidence regularization), encourages the model to be less certain about its predictions while still learning the correct class.
Given a classification problem with K classes, a smoothing parameter ε, and a true class label y, the soft target distribution q is defined as:
$$q_i = \begin{cases} 1 - \varepsilon + \frac{\varepsilon}{K} & \text{if } i = y \text{ (true class)} \\ \frac{\varepsilon}{K} & \text{otherwise} \end{cases}$$
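Given only integer class labels, the soft targets can be built in a vectorized way. A minimal NumPy sketch (the function name `soft_targets` is illustrative, not part of any prescribed API):

```python
import numpy as np

def soft_targets(y_true, num_classes, epsilon):
    """Label-smoothed target distributions for a batch of integer class labels."""
    y_true = np.asarray(y_true)
    # Every class starts with eps/K of the probability mass...
    q = np.full((y_true.shape[0], num_classes), epsilon / num_classes)
    # ...and the true class receives the remaining 1 - eps on top.
    q[np.arange(y_true.shape[0]), y_true] += 1.0 - epsilon
    return q
```

Each row sums to (1 − ε) + K·(ε/K) = 1, so every target is still a valid probability distribution.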
The cross-entropy loss between the model's predicted distribution p (obtained via softmax) and the soft target q is:
$$\mathcal{L} = -\sum_{i=1}^{K} q_i \log(p_i)$$
Direct computation of softmax followed by logarithm can cause numerical overflow or underflow (e.g., computing $e^{1000}$ or $\log(0)$). A numerically stable approach computes the log-softmax directly:
$$\log(\text{softmax}(z))_i = z_i - \log\left(\sum_{j=1}^{K} e^{z_j}\right)$$
Using the log-sum-exp trick with a max-shift for stability:
$$\log\left(\sum_{j=1}^{K} e^{z_j}\right) = m + \log\left(\sum_{j=1}^{K} e^{z_j - m}\right), \text{ where } m = \max_j(z_j)$$
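The max-shift guarantees that every exponent is at most zero, so the exponential can no longer overflow. A sketch of a stable log-softmax using this trick (plain NumPy, illustrative function name):

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax along the last axis via the max-shift trick."""
    z = np.asarray(z, dtype=float)
    m = z.max(axis=-1, keepdims=True)  # m = max_j z_j
    # log(sum_j e^{z_j}) = m + log(sum_j e^{z_j - m}); all shifted exponents are <= 0.
    lse = m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))
    return z - lse
```

Even extreme logits such as `[[1000.0, 0.0]]` stay finite here, whereas a naive `np.log(np.exp(z) / np.exp(z).sum())` would overflow to `inf` and then produce `nan`.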
Implement a function that builds the soft target distribution for each sample, computes the log-softmax of the logits in a numerically stable way, and returns the mean cross-entropy loss over the batch, rounded to round_decimals decimal places.
The function should handle arbitrary batch sizes, varying numbers of classes, and different smoothing intensities from 0.0 (hard labels) to values approaching 1.0 (maximum smoothing).
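Putting the pieces together, here is a minimal end-to-end sketch in NumPy (the name `smoothed_cross_entropy` and its signature are assumptions for illustration, not a prescribed API):

```python
import numpy as np

def smoothed_cross_entropy(logits, y_true, num_classes, epsilon):
    """Mean cross-entropy between log-softmax(logits) and label-smoothed targets."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(y_true)
    # Soft targets: eps/K everywhere, plus the remaining 1 - eps on the true class.
    q = np.full((y.shape[0], num_classes), epsilon / num_classes)
    q[np.arange(y.shape[0]), y] += 1.0 - epsilon
    # Stable log-softmax via the max-shifted log-sum-exp.
    m = z.max(axis=1, keepdims=True)
    log_p = z - (m + np.log(np.exp(z - m).sum(axis=1, keepdims=True)))
    # Per-sample cross-entropy, averaged over the batch.
    return float(np.mean(-(q * log_p).sum(axis=1)))

print(round(smoothed_cross_entropy([[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
                                   [0, 2], 3, 0.1), 6))  # → 0.927312 (Example 1)
```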
logits = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
y_true = [0, 2]
num_classes = 3
epsilon = 0.1
round_decimals = 6

Output: 0.927312

Sample 1: True class is 0. With ε=0.1 and K=3: • Soft target: [1 - 0.1 + 0.1/3, 0.1/3, 0.1/3] = [0.9333, 0.0333, 0.0333] • Log-softmax of [2.0, 0.0, -1.0] ≈ [-0.170, -2.170, -3.170] • Cross-entropy: -[(0.9333 × -0.170) + (0.0333 × -2.170) + (0.0333 × -3.170)] ≈ 0.3365

Sample 2: True class is 2. • Soft target: [0.0333, 0.0333, 0.9333] • Log-softmax of [0.0, 1.0, 0.0] ≈ [-1.551, -0.551, -1.551] • Cross-entropy ≈ 1.5181

Mean Loss: (0.3365 + 1.5181) / 2 ≈ 0.927312
logits = [[1.0, -1.0], [0.5, 0.5]]
y_true = [0, 1]
num_classes = 2
epsilon = 0.05
round_decimals = 6

Output: 0.435038

Sample 1: True class is 0. With ε=0.05 and K=2: • Soft target: [0.975, 0.025] • Log-softmax of [1.0, -1.0] ≈ [-0.127, -2.127] • Cross-entropy ≈ 0.177
Sample 2: True class is 1. • Soft target: [0.025, 0.975] • Log-softmax of [0.5, 0.5] = [-0.693, -0.693] • Cross-entropy ≈ 0.693
Mean Loss: (0.177 + 0.693) / 2 ≈ 0.435038
logits = [[3.0, 1.0, 0.0, -1.0]]
y_true = [0]
num_classes = 4
epsilon = 0.0
round_decimals = 6

Output: 0.185182

When ε=0.0, we revert to standard hard labels (no smoothing): • Soft target: [1.0, 0.0, 0.0, 0.0] (pure one-hot) • Log-softmax of [3.0, 1.0, 0.0, -1.0] ≈ [-0.185182, -2.185182, -3.185182, -4.185182] • Cross-entropy: -(1.0 × -0.185182) = 0.185182
This demonstrates that with zero smoothing, the loss reduces to standard categorical cross-entropy focusing only on the true class probability.
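That reduction is easy to confirm numerically: with ε = 0 the soft target is one-hot, so the loss collapses to the negative log-probability of the true class. A quick check on this example (plain NumPy, variable names are illustrative):

```python
import numpy as np

z = np.array([[3.0, 1.0, 0.0, -1.0]])
y = 0  # true class index
# Stable log-softmax of the single sample.
m = z.max()
log_p = z - (m + np.log(np.exp(z - m).sum()))
# eps = 0 means a pure one-hot target, so the loss is just -log p[true class].
hard_ce = -log_p[0, y]
print(round(float(hard_ce), 6))  # → 0.185182
```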
Constraints