In deep learning, activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns and representations. Among the family of modern activations, the self-gated activation function has emerged as a particularly effective alternative to traditional functions like ReLU.
The self-gated activation function works by multiplying the input by its own sigmoid-transformed value, creating a self-regulating gate that controls the information flow:
$$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$
where σ(x) is the sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Key Characteristics:
Smooth and Non-Monotonic: Unlike ReLU, which has a sharp corner at zero, this function is infinitely differentiable, providing smoother gradients during backpropagation. It is also non-monotonic: it dips slightly below zero for moderately negative inputs before rising.
Self-Gating Mechanism: The output is bounded below (with a global minimum of about −0.278, and approaching 0 for large negative inputs) but unbounded above, with the sigmoid component acting as a soft gate that modulates the input.
Non-Zero for Negative Inputs: Unlike ReLU, which outputs zero for all negative inputs (potentially causing "dying neurons"), this function can output small negative values, maintaining gradient flow.
Asymptotic Behavior: As x → +∞, σ(x) → 1, so f(x) → x and the function approaches the identity; as x → −∞, σ(x) → 0 faster than x grows, so f(x) → 0.
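These limits are easy to check numerically; a quick sketch (the helper name `f` is just for illustration):

```python
import math

def f(x):
    # f(x) = x * sigmoid(x) = x / (1 + e^(-x))
    return x / (1.0 + math.exp(-x))

print(f(10.0))    # close to 10: the gate is nearly fully open
print(f(-10.0))   # close to 0: the gate is nearly closed
```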
Your Task: Write a Python function that computes the self-gated activation value for a given input. The function should return the result rounded to 4 decimal places for numerical precision.
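A minimal implementation sketch, using only the standard `math` module and the 4-decimal rounding described above (the function name `self_gated` is illustrative, not prescribed by the problem):

```python
import math

def self_gated(x: float) -> float:
    """Compute f(x) = x * sigmoid(x), rounded to 4 decimal places."""
    sigmoid = 1.0 / (1.0 + math.exp(-x))
    return round(x * sigmoid, 4)
```

Note that `math.exp(-x)` can overflow for very large negative x; a production version might clamp the input or branch on its sign.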
Input: x = 1.0, Output: 0.7311
For x = 1.0:
First, compute the sigmoid: σ(1.0) = 1 / (1 + e⁻¹) = 1 / (1 + 0.3679) ≈ 0.7311
Then, multiply by the input: f(1.0) = 1.0 × 0.7311 = 0.7311
Notice that for positive inputs, the self-gated activation returns a value close to but slightly less than the input itself, due to the gating mechanism (sigmoid is < 1 for finite values).
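The two steps above can be traced directly in Python (the intermediate values shown in the comments are rounded):

```python
import math

x = 1.0
sig = 1.0 / (1.0 + math.exp(-x))   # step 1: sigmoid, about 0.7311
out = x * sig                       # step 2: gate the input, about 0.7311
print(round(sig, 4), round(out, 4))
```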
Input: x = 2.0, Output: 1.7616
For x = 2.0:
Compute the sigmoid: σ(2.0) = 1 / (1 + e⁻²) = 1 / (1 + 0.1353) ≈ 0.8808
Multiply by input: f(2.0) = 2.0 × 0.8808 = 1.7616
As the input increases, the sigmoid approaches 1, so the output gets closer to the identity function. Here, f(2.0) ≈ 0.88 × 2.0, showing the gate is nearly fully open.
Input: x = -1.0, Output: -0.2689
For x = -1.0:
Compute the sigmoid: σ(-1.0) = 1 / (1 + e¹) = 1 / (1 + 2.7183) ≈ 0.2689
Multiply by input: f(-1.0) = -1.0 × 0.2689 = -0.2689
This demonstrates a key advantage over ReLU: negative inputs produce small negative outputs rather than being completely zeroed out. This preserves gradient flow and prevents the "dying neuron" problem.
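The contrast with ReLU can be seen side by side in a small sketch (here `relu` is simply `max(0, x)`; the helper names are illustrative):

```python
import math

def self_gated(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# ReLU zeroes every negative input; the self-gated function
# keeps a small negative value, so gradients are not killed.
for x in (-2.0, -1.0, -0.5):
    print(x, round(self_gated(x), 4), relu(x))
```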
Constraints