In deep learning, activation functions play a critical role in introducing non-linearity and enabling neural networks to learn complex patterns. While traditional activation functions like ReLU have proven effective, they often suffer from issues such as the vanishing gradient problem and require explicit normalization layers (like Batch Normalization) to maintain stable training dynamics.
The Scaled Exponential Linear Unit (SELU) was introduced as a solution that gives deep neural networks an inherent self-normalizing property. This means that activations automatically converge towards zero mean and unit variance as they propagate through the network, eliminating the need for external normalization techniques.
The SELU activation function is defined as:
$$\text{SELU}(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha (e^x - 1) & \text{if } x \leq 0 \end{cases}$$
Where the mathematically derived constants are λ ≈ 1.0507 and α ≈ 1.6733.
These specific values were derived using the Banach fixed-point theorem so that zero mean and unit variance form a stable fixed point of the layer map, provided the weights are initialized with zero mean and variance 1/n (LeCun-normal initialization).
Automatic Normalization: Unlike ReLU-based networks that often require Batch Normalization, SELU networks naturally maintain stable activation statistics.
Negative Outputs: For negative inputs, SELU produces negative outputs (unlike ReLU, which outputs zero), allowing the network to shift the mean of activations toward zero.
Scaling Factor: The λ > 1 scaling lets the variance grow again when it has dropped below one, counteracting the variance reduction caused by the saturating negative branch.
Continuous Transition: The exponential branch meets the linear branch continuously at x = 0; the function is differentiable everywhere except at the origin, where the derivative steps from λα down to λ.
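The self-normalizing behavior can be illustrated empirically. The sketch below (not part of the original text; layer width, depth, and seed are arbitrary choices for the demo) pushes a zero-mean, unit-variance input through a deep stack of random LeCun-normal-initialized linear layers followed by SELU and reports the resulting activation statistics:

```python
import numpy as np

# Full-precision SELU constants (λ and α).
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772

def selu(x: np.ndarray) -> np.ndarray:
    # Linear branch for x > 0, scaled exponential branch for x <= 0.
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(x))

rng = np.random.default_rng(0)
width, depth = 512, 24
h = rng.standard_normal(width)  # zero-mean, unit-variance input
for _ in range(depth):
    # LeCun-normal init: zero mean, variance 1/width.
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    h = selu(W @ h)

print(f"after {depth} layers: mean={h.mean():.3f}, var={h.var():.3f}")
```

With the same setup but a ReLU activation (and no normalization layers), the mean and variance drift away from (0, 1) as depth grows, which is precisely what the fixed-point construction of λ and α prevents.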
Your Task: Implement the SELU activation function, computing the SELU value for a given input. Your implementation should use the standard SELU parameters (λ ≈ 1.0507, α ≈ 1.6733) and return the result rounded to 4 decimal places.
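A minimal sketch of one possible solution, assuming the commonly used full-precision constants (the 4-decimal values above are their roundings):

```python
import math

# Full-precision SELU constants; λ ≈ 1.0507 and α ≈ 1.6733 are roundings.
SCALE = 1.0507009873554805   # λ
ALPHA = 1.6732632423543772   # α

def selu(x: float) -> float:
    """SELU(x) = λ·x for x > 0, λ·α·(e^x - 1) for x <= 0, rounded to 4 dp."""
    value = SCALE * x if x > 0 else SCALE * ALPHA * (math.exp(x) - 1)
    return round(value, 4)

print(selu(-1.0))  # -1.1113
print(selu(2.0))   # 2.1014
print(selu(0.0))   # 0.0
```

Rounding only once, at the end, matches the expected outputs; rounding the constants themselves to 4 decimals first can shift the last digit.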
Example 1: Input x = -1.0, Output -1.1113
For negative input x = -1.0:
• Since x ≤ 0, we use the exponential branch: SELU(x) = λ × α × (e^x - 1)
• Calculate e^(-1.0) ≈ 0.3679
• Compute α × (e^(-1.0) - 1) = 1.6733 × (0.3679 - 1) = 1.6733 × (-0.6321) ≈ -1.0577
• Apply the scale factor: λ × (-1.0577) = 1.0507 × (-1.0577) ≈ -1.1113
The negative output helps shift the mean of activations, contributing to the self-normalizing property.
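The intermediate arithmetic above can be checked directly. This sketch uses the full-precision SELU constants; the 4-decimal figures in the walkthrough are roundings for display:

```python
import math

# Full-precision SELU constants.
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772

e_neg1 = math.exp(-1.0)            # exponential of the input
inner = ALPHA * (e_neg1 - 1)       # exponential-branch inner term
final = LAMBDA * inner             # apply the scale factor

print(round(e_neg1, 4))            # 0.3679
print(round(inner, 4))             # -1.0577
print(round(final, 4))             # -1.1113
```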
Example 2: Input x = 2.0, Output 2.1014
For positive input x = 2.0:
• Since x > 0, we use the linear branch: SELU(x) = λ × x
• Apply the scale factor: λ × 2.0 = 1.0507 × 2.0 = 2.1014
The scaling factor λ > 1 ensures that variance can be maintained or increased for positive activations, balancing the variance reduction from negative inputs.
Example 3: Input x = 0.0, Output 0.0
For input x = 0.0:
• At the boundary x = 0, we apply the exponential branch: SELU(x) = λ × α × (e^0 - 1)
• Calculate e^0 = 1
• Compute α × (1 - 1) = 1.6733 × 0 = 0
• Apply the scale factor: λ × 0 = 0.0
SELU passes through the origin, providing a natural reference point for the activation function.
Constraints