In the realm of neural network optimization, gradient computation forms the backbone of the learning process. During backpropagation, gradients (derivatives) of activation functions are essential for calculating how network weights should be adjusted to minimize the loss function.
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. However, to train these networks effectively, we must understand how small changes in the input to an activation function affect its output—this is precisely what the derivative tells us.
The Three Core Activation Functions:
1. Sigmoid Activation (σ): $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The sigmoid function squashes input values to the range (0, 1), making it historically popular for output layers in binary classification. Its derivative has an elegant closed form: $$\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$$
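The closed form can be sanity-checked against a central finite difference; a minimal sketch (helper names here are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a central finite difference at a few points
h = 1e-5
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(sigmoid_prime(x) - numeric) < 1e-8
```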
2. Hyperbolic Tangent (Tanh): $$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
Tanh maps inputs to the range (-1, 1), offering zero-centered outputs which can aid gradient flow. Its derivative is: $$\tanh'(x) = 1 - \tanh^2(x)$$
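The same finite-difference check works for tanh; a short sketch using the standard library's `math.tanh`:

```python
import math

def tanh_prime(x):
    # Closed form: tanh'(x) = 1 - tanh^2(x)
    t = math.tanh(x)
    return 1.0 - t * t

# Verify against a central finite difference
h = 1e-5
for x in (-1.0, 0.0, 2.0):
    numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
    assert abs(tanh_prime(x) - numeric) < 1e-8
```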
3. Rectified Linear Unit (ReLU): $$\text{ReLU}(x) = \max(0, x)$$
ReLU has become the default activation for hidden layers in modern deep learning due to its computational efficiency and ability to mitigate the vanishing gradient problem. Its derivative is simply: $$\text{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}$$
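The piecewise derivative is straightforward to code; note that ReLU is not differentiable at exactly x = 0, and the convention of using 0 there (as in this exercise) is a deliberate choice. A minimal sketch:

```python
def relu(x):
    return max(0.0, x)

def relu_prime(x):
    # Piecewise derivative: 1 for x > 0, 0 for x <= 0.
    # The value at x = 0 is a convention (subgradient), not a true derivative.
    return 1.0 if x > 0 else 0.0

# Away from x = 0 the finite difference agrees with the closed form
h = 1e-5
for x in (-1.0, 1.0):
    numeric = (relu(x + h) - relu(x - h)) / (2 * h)
    assert abs(relu_prime(x) - numeric) < 1e-8
```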
Your Task:
Implement a Python function that computes the derivatives of all three activation functions at a given input value x. Return a dictionary with keys 'sigmoid', 'tanh', and 'relu' containing the respective derivative values. Round the sigmoid derivative to 4 decimal places and the tanh derivative to 2 decimal places.
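A minimal sketch of the requested function (the name `activation_derivatives` is an assumption; the exercise does not fix one):

```python
import math

def activation_derivatives(x):
    """Return the derivatives of sigmoid, tanh, and ReLU at x."""
    s = 1.0 / (1.0 + math.exp(-x))            # sigma(x)
    sigmoid_grad = round(s * (1.0 - s), 4)    # sigma'(x) = s(1 - s), 4 decimal places
    t = math.tanh(x)
    tanh_grad = round(1.0 - t * t, 2)         # tanh'(x) = 1 - tanh^2(x), 2 decimal places
    relu_grad = 1.0 if x > 0 else 0.0         # ReLU'(x); 0 at x <= 0 by convention
    return {'sigmoid': sigmoid_grad, 'tanh': tanh_grad, 'relu': relu_grad}

print(activation_derivatives(0.0))  # {'sigmoid': 0.25, 'tanh': 1.0, 'relu': 0.0}
```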
Example 1: x = 0.0
Expected output: {'sigmoid': 0.25, 'tanh': 1.0, 'relu': 0.0}
At x = 0:
Sigmoid derivative: • σ(0) = 1 / (1 + e⁰) = 1 / 2 = 0.5 • σ'(0) = 0.5 × (1 - 0.5) = 0.5 × 0.5 = 0.25
Tanh derivative: • tanh(0) = 0 • tanh'(0) = 1 - 0² = 1.0
ReLU derivative: • Since x = 0 is not greater than 0, ReLU'(0) = 0.0
Note: At x = 0, the tanh derivative reaches its maximum value of 1.0, while the sigmoid derivative peaks at only 0.25. This illustrates why tanh historically offered better gradient flow than sigmoid.
Example 2: x = 1.0
Expected output: {'sigmoid': 0.1966, 'tanh': 0.42, 'relu': 1.0}
At x = 1.0:
Sigmoid derivative: • σ(1) = 1 / (1 + e⁻¹) ≈ 0.7311 • σ'(1) = 0.7311 × (1 - 0.7311) ≈ 0.7311 × 0.2689 ≈ 0.1966
Tanh derivative: • tanh(1) ≈ 0.7616 • tanh'(1) = 1 - 0.7616² ≈ 1 - 0.58 ≈ 0.42
ReLU derivative: • Since x = 1.0 > 0, ReLU'(1) = 1.0
Observe how both sigmoid and tanh derivatives are already decreasing from their peaks, while ReLU maintains a constant gradient of 1.0 in the positive region.
Example 3: x = -1.0
Expected output: {'sigmoid': 0.1966, 'tanh': 0.42, 'relu': 0.0}
At x = -1.0:
Sigmoid derivative: • σ(-1) = 1 / (1 + e¹) ≈ 0.2689 • σ'(-1) = 0.2689 × (1 - 0.2689) ≈ 0.2689 × 0.7311 ≈ 0.1966
Note: Due to symmetry, σ'(-x) = σ'(x), so the derivative at x = -1 equals the derivative at x = 1.
Tanh derivative: • tanh(-1) ≈ -0.7616 • tanh'(-1) = 1 - (-0.7616)² = 1 - 0.58 ≈ 0.42
Similarly, tanh' is symmetric: tanh'(-x) = tanh'(x).
ReLU derivative: • Since x = -1.0 ≤ 0, ReLU'(-1) = 0.0
This relates to the "dying ReLU" phenomenon: a neuron whose pre-activation is negative for all inputs receives zero gradient and stops learning.
Constraints