The Parametric Rectified Linear Unit (PReLU) is an advanced activation function used extensively in deep learning architectures. It builds upon the standard Rectified Linear Unit (ReLU) by introducing a learnable slope parameter α (alpha) that controls the behavior for negative input values, addressing the "dying ReLU" problem where neurons can become permanently inactive.
The PReLU activation function is defined piecewise as:
$$\text{PReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha \cdot x & \text{if } x \leq 0 \end{cases}$$
where x is the input value and α (alpha) is a learnable parameter that controls the slope for negative inputs.
This can also be expressed compactly as:
$$\text{PReLU}(x) = \max(0, x) + \alpha \cdot \min(0, x)$$
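The two formulations are equivalent, which a quick sketch can confirm (function names here are illustrative, not part of the problem statement):

```python
def prelu_piecewise(x, alpha):
    """Piecewise definition: x if x > 0, else alpha * x."""
    return x if x > 0 else alpha * x

def prelu_compact(x, alpha):
    """Compact form: max(0, x) + alpha * min(0, x)."""
    return max(0.0, x) + alpha * min(0.0, x)

# Both forms agree on positive, negative, and zero inputs
for x in (-2.0, -0.5, 0.0, 1.5):
    assert prelu_piecewise(x, 0.25) == prelu_compact(x, 0.25)
```

The compact form is convenient in vectorized frameworks, since max and min apply elementwise without branching.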
For positive inputs (x > 0): The function behaves identically to ReLU, returning the input unchanged. This preserves the gradient flow for positive activations.
For negative inputs (x ≤ 0): Instead of outputting zero (as ReLU does), PReLU multiplies the input by the parameter α. This creates a scaled linear response in the negative domain, allowing gradients to propagate through the network even for negative activations.
The alpha parameter is typically initialized to a small positive value (0.25 in the original PReLU paper by He et al.) and learned jointly with the other network weights during backpropagation. It can be shared across all channels or learned per-channel.
When α = 0, PReLU reduces to standard ReLU. When α = 1, it becomes the identity function. When α < 0, it creates a non-monotonic function (rarely used).
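These special cases can be checked directly (a minimal sketch using the compact form):

```python
def prelu(x, alpha):
    """PReLU via the compact form: max(0, x) + alpha * min(0, x)."""
    return max(0.0, x) + alpha * min(0.0, x)

# alpha = 0 reduces PReLU to standard ReLU: negatives map to 0
assert prelu(-3.0, 0.0) == 0.0

# alpha = 1 makes PReLU the identity function
assert prelu(-3.0, 1.0) == -3.0

# Positive inputs pass through unchanged regardless of alpha
assert prelu(2.0, 0.0) == prelu(2.0, 1.0) == 2.0
```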
Implement a function that computes the PReLU activation value for a given input value x and alpha parameter. The function should correctly handle the piecewise nature of the activation, applying the identity function for positive inputs and the scaled linear function for non-positive inputs.
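A minimal implementation sketch following the piecewise definition (the function name and default alpha are illustrative choices, not requirements of the problem):

```python
def prelu(x: float, alpha: float = 0.25) -> float:
    """Compute the PReLU activation for a single input.

    Positive inputs are returned unchanged (identity branch);
    non-positive inputs are scaled by alpha (leaky branch).
    """
    if x > 0:
        return x
    return alpha * x
```

The single comparison `x > 0` handles both branches; x = 0 falls through to the scaled branch, where alpha * 0 = 0 matches the identity branch anyway.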
Input: x = -2.0, alpha = 0.25. Output: -0.5.
Since x = -2.0 is negative (x ≤ 0), we apply the negative branch of the PReLU function:
PReLU(x) = α × x = 0.25 × (-2.0) = -0.5
The negative input is scaled by alpha, allowing a small gradient to flow through during backpropagation, which helps prevent dead neurons.
Input: x = 3.0, alpha = 0.25. Output: 3.0.
Since x = 3.0 is positive (x > 0), we apply the positive branch of the PReLU function:
PReLU(x) = x = 3.0
For positive inputs, PReLU behaves identically to ReLU, returning the input unchanged. The alpha parameter has no effect when x > 0.
Input: x = 0.0, alpha = 0.25. Output: 0.0.
At the boundary when x = 0, we apply the non-positive branch:
PReLU(x) = α × x = 0.25 × 0.0 = 0.0
Note that both branches technically give the same result at x = 0, making the function continuous. This ensures smooth gradient flow at the origin.
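The three worked examples above can be verified with a few lines (a sketch, not a reference solution):

```python
def prelu(x, alpha):
    """PReLU: identity for x > 0, alpha-scaled line for x <= 0."""
    return x if x > 0 else alpha * x

# Example 1: negative input is scaled by alpha
assert prelu(-2.0, 0.25) == -0.5

# Example 2: positive input passes through unchanged
assert prelu(3.0, 0.25) == 3.0

# Example 3: both branches agree at the origin
assert prelu(0.0, 0.25) == 0.0
```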
Constraints