In modern deep learning architectures, activation functions play a critical role in introducing nonlinearity, enabling neural networks to learn complex patterns and representations. Among the many activation functions available, those based on Gaussian probability theory have gained significant attention for their unique properties.
The Gaussian Error-based Nonlinear Activator is a smooth, continuously differentiable activation function that weights the input value by the probability that a standard normal random variable falls below it. Unlike simpler activators such as ReLU, which impose a hard cutoff at zero, this function creates a soft gating mechanism where each input is scaled by the standard normal CDF evaluated at that input.
Mathematical Formulation: The activation is computed as:
$$\text{Activator}(x) = x \cdot \Phi(x)$$
Where Φ(x) is the cumulative distribution function (CDF) of the standard normal distribution:
$$\Phi(x) = \frac{1}{2} \left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$
This can also be approximated using the tanh function for computational efficiency:
$$\text{Activator}(x) \approx 0.5x \left[1 + \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^3\right)\right)\right]$$
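Both forms can be sketched side by side; the function names below are illustrative, not from any particular library:

```python
from math import erf, sqrt, pi, tanh

def activator_exact(x: float) -> float:
    # Exact form: x * Phi(x), with Phi built from the error function
    phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return x * phi

def activator_tanh(x: float) -> float:
    # Tanh approximation, often used for computational efficiency
    return 0.5 * x * (1.0 + tanh(sqrt(2.0 / pi) * (x + 0.044715 * x**3)))
```

The two agree to roughly three decimal places over typical input ranges, which is why the tanh form is a common drop-in substitute.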
Key Properties:
• Smooth and continuously differentiable everywhere, unlike ReLU.
• Non-monotonic: it dips slightly below zero for moderately negative inputs before flattening toward zero.
• Approaches the identity function for large positive inputs and zero for large negative inputs.
• Passes small, nonzero gradients for negative inputs rather than cutting them off entirely.
Your Task: Implement a function that applies this Gaussian-based nonlinear activation to a NumPy array of input values. The function should accept an array of any shape and return an array of the same shape with the activation applied elementwise.
Example 1:
Input: logits = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Output: [-0.0455, -0.1587, 0.0, 0.8413, 1.9545]
Each value is processed through the Gaussian-based activation:
• For x = -2.0: The CDF Φ(-2.0) ≈ 0.0228, so output = -2.0 × 0.0228 ≈ -0.0455
• For x = -1.0: The CDF Φ(-1.0) ≈ 0.1587, so output = -1.0 × 0.1587 ≈ -0.1587
• For x = 0.0: The CDF Φ(0.0) = 0.5, so output = 0.0 × 0.5 = 0.0
• For x = 1.0: The CDF Φ(1.0) ≈ 0.8413, so output = 1.0 × 0.8413 ≈ 0.8413
• For x = 2.0: The CDF Φ(2.0) ≈ 0.9772, so output = 2.0 × 0.9772 ≈ 1.9545
Note how positive values are largely preserved (gated open) while negative values are suppressed (gated toward zero).
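A minimal NumPy sketch of the task (the function name `gaussian_activator` is my own) reproduces the values above:

```python
import numpy as np
from math import erf, sqrt

def gaussian_activator(x: np.ndarray) -> np.ndarray:
    # Elementwise x * Phi(x); math.erf is scalar-only, so vectorize it
    cdf = np.vectorize(lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0))))
    return x * cdf(x)

logits = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(gaussian_activator(logits))  # ≈ [-0.0455 -0.1587  0.      0.8413  1.9545]
```

In a production setting one would typically use a vectorized CDF (e.g. from a scientific computing library) instead of `np.vectorize`, which is a convenience wrapper rather than a true speedup.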
Example 2:
Input: logits = np.array([0.5, 1.5, 2.5, 3.5])
Output: [0.3457, 1.3998, 2.4845, 3.4992]
For positive inputs, the activation behaves nearly linearly for large values:
• For x = 0.5: Φ(0.5) ≈ 0.6915, output = 0.5 × 0.6915 ≈ 0.3457
• For x = 1.5: Φ(1.5) ≈ 0.9332, output = 1.5 × 0.9332 ≈ 1.3998
• For x = 2.5: Φ(2.5) ≈ 0.9938, output = 2.5 × 0.9938 ≈ 2.4845
• For x = 3.5: Φ(3.5) ≈ 0.9998, output = 3.5 × 0.9998 ≈ 3.4992
As x increases, Φ(x) approaches 1, so the output approaches x itself.
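This convergence toward the identity can be checked numerically by measuring the gap x − Activator(x), which shrinks as x grows (a sketch; names are illustrative):

```python
from math import erf, sqrt

def activator(x: float) -> float:
    # x * Phi(x), with Phi from the error function
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

# The gap between the input and its activation vanishes for large x
for x in [0.5, 1.5, 2.5, 3.5]:
    print(f"x = {x}: x - Activator(x) = {x - activator(x):.4f}")
```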
Example 3:
Input: logits = np.array([-0.5, -1.5, -2.5, -3.5])
Output: [-0.1543, -0.1002, -0.0155, -0.0008]
For negative inputs, the activation suppresses the magnitude significantly:
• For x = -0.5: Φ(-0.5) ≈ 0.3085, output = -0.5 × 0.3085 ≈ -0.1543
• For x = -1.5: Φ(-1.5) ≈ 0.0668, output = -1.5 × 0.0668 ≈ -0.1002
• For x = -2.5: Φ(-2.5) ≈ 0.0062, output = -2.5 × 0.0062 ≈ -0.0155
• For x = -3.5: Φ(-3.5) ≈ 0.00023, output = -3.5 × 0.00023 ≈ -0.0008
The more negative the input, the more it gets suppressed toward zero.
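Note that the suppression is not monotonic: the output dips slightly below zero for moderately negative inputs before flattening toward zero. A coarse numerical scan can locate the dip (a sketch, not part of the task; names are illustrative):

```python
import numpy as np
from math import erf, sqrt

def activator(x: float) -> float:
    # x * Phi(x), with Phi from the error function
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Scan the negative half-line for the minimum of the activation
xs = np.linspace(-5.0, 0.0, 100001)
ys = np.array([activator(v) for v in xs])
i = ys.argmin()
print(f"minimum ≈ {ys[i]:.4f} at x ≈ {xs[i]:.3f}")  # dip near x ≈ -0.75
```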
Constraints