In modern deep learning architectures, normalization layers like Layer Normalization and Batch Normalization play a crucial role in stabilizing training and accelerating convergence. However, these layers introduce computational overhead by requiring explicit statistics computation across features or batches.
The Adaptive Hyperbolic Tangent Transformation offers an elegant, normalization-free alternative that preserves the essential squashing behavior required for stable training while eliminating the need for computing running statistics. This transformation combines the bounded, smooth properties of the hyperbolic tangent function with learnable affine parameters.
Mathematical Formulation:
Given an input tensor x with feature dimension D, and learnable parameters:
• alpha — a scalar that scales the input before the tanh is applied
• gamma — a per-feature scale vector of shape (D,)
• beta — a per-feature shift vector of shape (D,)
The transformation is computed as:
$$\text{output} = \gamma \cdot \tanh(\alpha \cdot x) + \beta$$
Key Properties:
• Bounded: tanh squashes its input into (−1, 1), so each feature's output lies between β − |γ| and β + |γ|
• Smooth: the transformation is differentiable everywhere
• Stateless: no running statistics are computed, unlike Batch or Layer Normalization
Broadcasting Semantics:
The transformation applies element-wise to the input tensor. The gamma and beta parameters are broadcast across all dimensions except the last (feature) dimension. This means gamma and beta must each have shape (D,), where D is the size of the input's last axis, and every position along the leading (batch and spatial) dimensions shares the same per-feature scale and shift.
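As a quick illustration of these broadcasting semantics (the shapes here are arbitrary, chosen only for the demo):

```python
import numpy as np

x = np.zeros((3, 5, 4))   # arbitrary leading dims; feature dim D = 4
gamma = np.ones(4)        # shape (4,) aligns with x's last axis
beta = np.zeros(4)

out = gamma * np.tanh(0.5 * x) + beta
print(out.shape)          # (3, 5, 4): leading dimensions are preserved
```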
Your Task:
Implement a function that applies this adaptive hyperbolic tangent transformation to an input tensor. The function should correctly handle tensors of arbitrary shape, applying the transformation element-wise while properly broadcasting the gamma and beta parameters along the feature dimension.
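A minimal NumPy sketch of such a function (the name `adaptive_tanh` and the choice of NumPy are illustrative, not a prescribed API; it assumes `gamma` and `beta` are length-D arrays matching the input's last dimension):

```python
import numpy as np

def adaptive_tanh(x, alpha, gamma, beta):
    """Apply output = gamma * tanh(alpha * x) + beta element-wise.

    x     : array of arbitrary shape (..., D)
    alpha : scalar input scale
    gamma : per-feature scale, shape (D,)
    beta  : per-feature shift, shape (D,)
    """
    x = np.asarray(x, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # NumPy aligns trailing axes, so (D,) broadcasts against (..., D)
    return gamma * np.tanh(alpha * x) + beta
```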
Example 1:
x = [[[0.1412, 0.0037, 0.2413, 0.2218]]]
alpha = 0.5
gamma = [1.0, 1.0, 1.0, 1.0]
beta = [0.0, 0.0, 0.0, 0.0]

Output: [[[0.0705, 0.0019, 0.1201, 0.1105]]]

With gamma=1 and beta=0, the output is simply tanh(alpha * x):
• tanh(0.5 × 0.1412) = tanh(0.0706) ≈ 0.0705
• tanh(0.5 × 0.0037) = tanh(0.00185) ≈ 0.0019
• tanh(0.5 × 0.2413) = tanh(0.12065) ≈ 0.1201
• tanh(0.5 × 0.2218) = tanh(0.1109) ≈ 0.1105
The alpha scaling controls how steeply the tanh saturates for large inputs.
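These numbers can be reproduced directly in NumPy (a verification sketch, not part of the required solution):

```python
import numpy as np

x = np.array([[[0.1412, 0.0037, 0.2413, 0.2218]]])
out = 1.0 * np.tanh(0.5 * x) + 0.0   # gamma = 1, beta = 0
# matches the expected output to roughly 4 decimal places
```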
Example 2:
x = [[[0.5, -0.5, 0.0, 1.0]]]
alpha = 1.0
gamma = [1.0, 2.0, 1.5, 0.5]
beta = [0.0, 0.1, -0.1, 0.2]

Output: [[[0.4621, -0.8242, -0.1, 0.5808]]]

Each element undergoes the full transformation gamma * tanh(alpha * x) + beta:
• γ₁ × tanh(1.0 × 0.5) + β₁ = 1.0 × 0.4621 + 0.0 = 0.4621
• γ₂ × tanh(1.0 × (-0.5)) + β₂ = 2.0 × (-0.4621) + 0.1 = -0.8242
• γ₃ × tanh(1.0 × 0.0) + β₃ = 1.5 × 0.0 + (-0.1) = -0.1
• γ₄ × tanh(1.0 × 1.0) + β₄ = 0.5 × 0.7616 + 0.2 = 0.5808
The gamma scales and beta shifts allow fine-grained control over the output distribution.
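The same arithmetic, checked in NumPy (a verification sketch):

```python
import numpy as np

x = np.array([[[0.5, -0.5, 0.0, 1.0]]])
gamma = np.array([1.0, 2.0, 1.5, 0.5])
beta = np.array([0.0, 0.1, -0.1, 0.2])

out = gamma * np.tanh(1.0 * x) + beta
# out ≈ [[[0.4621, -0.8242, -0.1, 0.5808]]]
```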
Example 3:
x = [[[[0.1, 0.2], [0.3, 0.4]], [[-0.1, -0.2], [-0.3, -0.4]]]]
alpha = 0.5
gamma = [1.0, 1.0]
beta = [0.0, 0.0]

Output: [[[[0.05, 0.0997], [0.1489, 0.1974]], [[-0.05, -0.0997], [-0.1489, -0.1974]]]]

For 4D tensors with shape (1, 2, 2, 2), the gamma and beta of shape (2,) broadcast across all spatial dimensions. Each position applies tanh(0.5 * x) independently:
• Positive values: tanh(0.5 × 0.1) ≈ 0.05, tanh(0.5 × 0.2) ≈ 0.0997, etc.
• Negative values: the transformation is odd-symmetric, so tanh(−z) = −tanh(z), mirroring the positive results
This demonstrates how the transformation naturally handles multi-dimensional feature maps.
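A verification sketch for the 4D case, also checking the odd symmetry noted above:

```python
import numpy as np

x = np.array([[[[0.1, 0.2], [0.3, 0.4]],
               [[-0.1, -0.2], [-0.3, -0.4]]]])  # shape (1, 2, 2, 2)
gamma = np.array([1.0, 1.0])                    # shape (2,) broadcasts over
beta = np.array([0.0, 0.0])                     # the three leading dims

out = gamma * np.tanh(0.5 * x) + beta
# the second block mirrors the first: tanh(-z) == -tanh(z)
```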
Constraints: