One of the most critical computations in training neural network classifiers is calculating the gradient of the cross-entropy loss with respect to the pre-activation outputs (logits). This gradient formula is remarkably elegant and computationally efficient, making it the foundation of backpropagation in classification networks.
In multi-class classification, neural networks typically output raw scores called logits—unnormalized values for each class. These logits are transformed into a valid probability distribution using the softmax function:
$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
where $z$ is the vector of logits and $K$ is the number of classes.
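As a sketch, the softmax above can be computed in plain Python. Subtracting the maximum logit before exponentiating is the standard stability trick (an implementation detail not visible in the formula):

```python
import math

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    m = max(z)                               # avoids overflow in exp for large logits
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # → [0.09, 0.2447, 0.6652]
```

The shift by `max(z)` leaves the result unchanged, since it cancels in the ratio.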
The cross-entropy loss then measures the discrepancy between the predicted probability distribution p and the true target distribution (typically a one-hot encoded vector y):
$$L = -\sum_{i=1}^{K} y_i \log(p_i)$$
For a single correct class c (one-hot encoding), this simplifies to:
$$L = -\log(p_c)$$
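A small illustration of this simplification, using hypothetical softmax probabilities for three classes: with a one-hot target, only the correct-class probability contributes to the loss.

```python
import math

probs = [0.09, 0.2447, 0.6652]  # hypothetical softmax output for 3 classes
target = 2                      # index of the true class c

# The full sum -sum(y_i * log(p_i)) reduces to -log(p_c), since y_i = 0 for i != c
loss = -math.log(probs[target])
print(round(loss, 4))
```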
The elegant result from calculus is that the gradient of the combined softmax cross-entropy loss with respect to the logits has an extraordinarily simple closed-form solution:
$$\frac{\partial L}{\partial z_i} = p_i - y_i$$
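The result follows in two lines from the chain rule. Writing the loss for correct class $c$ directly in terms of the logits:

$$L = -\log(p_c) = \log\left(\sum_{j=1}^{K} e^{z_j}\right) - z_c$$

Differentiating with respect to $z_i$:

$$\frac{\partial L}{\partial z_i} = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} - \delta_{ic} = p_i - y_i$$

where $\delta_{ic}$ is 1 if $i = c$ and 0 otherwise, which is exactly the one-hot entry $y_i$.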
This means the gradient for each logit is simply the predicted probability minus the target: for the correct class it is $p_c - 1$ (negative, pushing that logit up), and for every other class it is just $p_i$ (positive, pushing that logit down). This simplicity is why softmax cross-entropy is the preferred loss function for classification—the gradient computation is both numerically stable and computationally efficient.
Write a Python function that computes the gradient vector of the softmax cross-entropy loss with respect to the input logits. Given a vector of logits and a target class index, return the gradient for each logit position, rounded to 4 decimal places.
Example 1
Input: logits = [1.0, 2.0, 3.0], target = 0
Output: [-0.91, 0.2447, 0.6652]
Step 1: Compute Softmax Probabilities
Apply softmax to convert logits to probabilities:
• Compute exponentials: e^1.0 ≈ 2.718, e^2.0 ≈ 7.389, e^3.0 ≈ 20.086
• Sum of exponentials: 2.718 + 7.389 + 20.086 ≈ 30.193
• Probabilities: p = [2.718/30.193, 7.389/30.193, 20.086/30.193] ≈ [0.09, 0.2447, 0.6652]
Step 2: Construct One-Hot Target Vector
Since target = 0, the one-hot vector is: y = [1, 0, 0]
Step 3: Compute Gradient (p − y)
• Gradient[0] = 0.09 − 1 = −0.91 (negative gradient pushes logit up)
• Gradient[1] = 0.2447 − 0 = 0.2447 (positive gradient pushes logit down)
• Gradient[2] = 0.6652 − 0 = 0.6652 (positive gradient pushes logit down)
The negative gradient for the target class indicates we should increase that logit to reduce loss, while positive gradients for non-target classes indicate those logits should decrease.
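The arithmetic above can be checked in a couple of lines, using the rounded probabilities from Step 1:

```python
probs = [0.09, 0.2447, 0.6652]  # softmax output from Step 1 (rounded)
y = [1, 0, 0]                   # one-hot vector for target = 0

grad = [round(p - t, 4) for p, t in zip(probs, y)]
print(grad)  # → [-0.91, 0.2447, 0.6652]
```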
Example 2
Input: logits = [0.0, 0.0], target = 1
Output: [0.5, -0.5]
Step 1: Compute Softmax Probabilities
With equal logits, softmax produces uniform probabilities:
• Exponentials: e^0.0 = 1, e^0.0 = 1
• Sum: 1 + 1 = 2
• Probabilities: p = [0.5, 0.5]
Step 2: Construct One-Hot Target Vector
Since target = 1, the one-hot vector is: y = [0, 1]
Step 3: Compute Gradient (p − y)
• Gradient[0] = 0.5 − 0 = 0.5 (reduce this logit)
• Gradient[1] = 0.5 − 1 = −0.5 (increase this logit)
The symmetric result shows that with equal starting logits and 50/50 predictions, the network needs to equally push one class up and the other down.
Example 3
Input: logits = [1.0, 2.0, 3.0, 4.0], target = 2
Output: [0.0321, 0.0871, -0.7631, 0.6439]
Step 1: Compute Softmax Probabilities
Apply softmax to the 4-class logit vector:
• Exponentials: e^1 ≈ 2.718, e^2 ≈ 7.389, e^3 ≈ 20.086, e^4 ≈ 54.598
• Sum ≈ 84.791
• Probabilities: p ≈ [0.0321, 0.0871, 0.2369, 0.6439]
Step 2: Construct One-Hot Target Vector
Since target = 2 (third class, 0-indexed), the one-hot vector is: y = [0, 0, 1, 0]
Step 3: Compute Gradient (p − y)
• Gradient[0] = 0.0321 − 0 = 0.0321
• Gradient[1] = 0.0871 − 0 = 0.0871
• Gradient[2] = 0.2369 − 1 = −0.7631 (large negative gradient)
• Gradient[3] = 0.6439 − 0 = 0.6439
Note: the class at index 2 is the target, but the model currently assigns it only ~24% probability, while the class at index 3 gets ~64%. The large negative gradient at index 2 will strongly push that logit upward, while the substantial positive gradient at index 3 will push that logit downward.
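Putting the three steps together, one minimal sketch of the requested function (the name `softmax_ce_grad` is a placeholder, not prescribed by the problem) could look like:

```python
import math

def softmax_ce_grad(logits, target):
    """Gradient of softmax cross-entropy w.r.t. the logits: p - y."""
    # Step 1: numerically stable softmax (shift by the max logit)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Steps 2-3: subtract the one-hot target and round to 4 decimals
    return [round(p - (1 if i == target else 0), 4)
            for i, p in enumerate(probs)]

print(softmax_ce_grad([0.0, 0.0], 1))            # → [0.5, -0.5]
print(softmax_ce_grad([1.0, 2.0, 3.0, 4.0], 2))  # → [0.0321, 0.0871, -0.7631, 0.6439]
```

The one-hot vector is never materialized: subtracting 1 at the target index has the same effect.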
Constraints