The softmax function is one of the most ubiquitous operations in machine learning, transforming a vector of arbitrary real numbers (often called logits) into a valid probability distribution. It is the final activation function in virtually every classification neural network, converting raw network outputs into interpretable class probabilities.
For an input vector x of length n, the softmax function computes:
$$\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}$$
To train neural networks using gradient descent, we need to compute how the softmax outputs change with respect to the inputs—this is where the Jacobian matrix comes in. The Jacobian is a matrix of all first-order partial derivatives, where each element J[i][j] represents:
$$J_{ij} = \frac{\partial\, \text{softmax}(x)_i}{\partial\, x_j}$$
Mathematical Derivation:
Let s = softmax(x) denote the softmax output vector. The Jacobian elements follow an elegant closed-form expression:
Diagonal elements (when i = j): $$J_{ii} = s_i \cdot (1 - s_i)$$
Off-diagonal elements (when i ≠ j): $$J_{ij} = -s_i \cdot s_j$$
This can be written compactly as: $$J_{ij} = s_i \cdot (\delta_{ij} - s_j)$$
where δᵢⱼ is the Kronecker delta (1 when i = j, 0 otherwise).
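In matrix form, the compact expression becomes J = diag(s) − s sᵀ, which translates directly into a few lines of NumPy. This is only a sketch: the function name `softmax_jacobian` and the max-subtraction stabilization are illustrative choices, not part of the problem statement.

```python
import numpy as np

def softmax_jacobian(x):
    x = np.asarray(x, dtype=float)
    # Softmax is invariant to shifting x, so subtract the max for stability.
    e = np.exp(x - x.max())
    s = e / e.sum()
    # J[i][j] = s_i * (delta_ij - s_j)  ==  diag(s) - outer(s, s)
    return np.diag(s) - np.outer(s, s)
```

Rounding to 4 decimal places, as the task requires, is then a matter of `np.round(softmax_jacobian(x), 4)`.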
Intuitive Understanding:
The Jacobian captures the sensitivity of each probability output to changes in each input logit: diagonal entries are positive (raising a logit raises its own probability), while off-diagonal entries are negative (probability mass is pulled away from the other classes).
Your Task: Write a Python function that computes the Jacobian matrix of the softmax function for a given input vector. Return all values rounded to 4 decimal places for numerical consistency.
Example 1: x = [1.0, 2.0, 3.0]
Expected output: [[0.0819, -0.022, -0.0599], [-0.022, 0.1848, -0.1628], [-0.0599, -0.1628, 0.2227]]

Step 1: Compute the softmax probabilities
First, calculate the exponentials and normalize:
• e¹·⁰ ≈ 2.718, e²·⁰ ≈ 7.389, e³·⁰ ≈ 20.086
• Sum = 2.718 + 7.389 + 20.086 ≈ 30.193
Softmax outputs: s = [0.0900, 0.2447, 0.6652]
Step 2: Compute diagonal elements J[i][i] = sᵢ × (1 - sᵢ)
• J[0][0] = 0.0900 × (1 - 0.0900) = 0.0900 × 0.9100 = 0.0819
• J[1][1] = 0.2447 × (1 - 0.2447) = 0.2447 × 0.7553 = 0.1848
• J[2][2] = 0.6652 × (1 - 0.6652) = 0.6652 × 0.3348 = 0.2227
Step 3: Compute off-diagonal elements J[i][j] = -sᵢ × sⱼ
• J[0][1] = J[1][0] = -0.0900 × 0.2447 = -0.0220
• J[0][2] = J[2][0] = -0.0900 × 0.6652 = -0.0599
• J[1][2] = J[2][1] = -0.2447 × 0.6652 = -0.1628
The resulting Jacobian is symmetric (since J[i][j] = J[j][i] = -sᵢsⱼ) and each row sums to zero.
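Both properties follow from the formula (each row sums to sᵢ(1 − Σⱼ sⱼ) = 0), and can be verified with a standalone NumPy check, shown here for the example above rather than as part of the required solution:

```python
import numpy as np

# Standalone check of the structural claims on the example x = [1, 2, 3].
x = np.array([1.0, 2.0, 3.0])
e = np.exp(x)
s = e / e.sum()                     # softmax probabilities
J = np.diag(s) - np.outer(s, s)    # J[i][j] = s_i * (delta_ij - s_j)

assert np.allclose(J, J.T)              # symmetric
assert np.allclose(J.sum(axis=1), 0.0)  # every row sums to zero
```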
Example 2: x = [0.0, 1.0]
Expected output: [[0.1966, -0.1966], [-0.1966, 0.1966]]

Step 1: Compute softmax probabilities
• e⁰·⁰ = 1.0, e¹·⁰ ≈ 2.718
• Sum = 1.0 + 2.718 ≈ 3.718
Softmax outputs: s = [0.2689, 0.7311]
Step 2: Compute the 2×2 Jacobian matrix
Diagonal elements:
• J[0][0] = 0.2689 × (1 - 0.2689) = 0.2689 × 0.7311 = 0.1966
• J[1][1] = 0.7311 × (1 - 0.7311) = 0.7311 × 0.2689 = 0.1966
Off-diagonal elements:
• J[0][1] = J[1][0] = -0.2689 × 0.7311 = -0.1966
Notice the elegant structure: in the 2-class case, the Jacobian has identical diagonal elements and identical (negative) off-diagonal elements. The absolute values are all equal, reflecting the binary trade-off between two probabilities.
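This binary structure is easy to confirm numerically with a standalone snippet (plain NumPy, independent of any helper function):

```python
import numpy as np

# In the 2-class case, all four Jacobian entries share the magnitude s0 * s1.
x = np.array([0.0, 1.0])
e = np.exp(x)
s = e / e.sum()                     # approximately [0.2689, 0.7311]
J = np.diag(s) - np.outer(s, s)

assert np.allclose(np.abs(J), s[0] * s[1])  # every |entry| equals s0 * s1
```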
Example 3: x = [1.0, -1.0, 2.0, -2.0]
Expected output: [[0.1906, -0.0089, -0.1784, -0.0033], [-0.0089, 0.0335, -0.0241, -0.0004], [-0.1784, -0.0241, 0.2114, -0.0089], [-0.0033, -0.0004, -0.0089, 0.0126]]

Step 1: Compute softmax probabilities
• e¹·⁰ ≈ 2.718, e⁻¹·⁰ ≈ 0.368, e²·⁰ ≈ 7.389, e⁻²·⁰ ≈ 0.135
• Sum ≈ 10.610
Softmax outputs: s ≈ [0.2563, 0.0347, 0.6965, 0.0127]
Step 2: Jacobian structure analysis
The largest probability (s₂ = 0.6965) has:
• The largest diagonal element: J[2][2] = 0.2114 (highest sensitivity)
• The most negative off-diagonal impact on others
The smallest probability (s₃ = 0.0127) has:
• Very small diagonal element: J[3][3] = 0.0126
• Negligible influence on other classes
This 4×4 Jacobian demonstrates how probability mass flows: changes to the dominant class (index 2) propagate strongly throughout, while changes to low-probability classes have minimal effect.
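A standalone check of the dominance pattern for this example; note that sᵢ(1 − sᵢ) peaks at sᵢ = 0.5 and is not monotone in sᵢ, so this ordering is a property of these particular values rather than a universal law:

```python
import numpy as np

# Check that the dominant class (index 2) has the largest diagonal entry
# for x = [1, -1, 2, -2].
x = np.array([1.0, -1.0, 2.0, -2.0])
e = np.exp(x)
s = e / e.sum()
J = np.diag(s) - np.outer(s, s)

assert int(np.argmax(np.diag(J))) == int(np.argmax(s)) == 2
```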
Constraints