Proper weight initialization is a cornerstone of training deep neural networks successfully. When weights are initialized poorly—either too small or too large—networks suffer from vanishing or exploding gradients, making learning extremely slow or impossible. The Kaiming initialization strategy (also known as He initialization, named after Kaiming He) was specifically engineered to address these challenges for networks using ReLU (Rectified Linear Unit) and its variants as activation functions.
Traditional random initialization methods often fail when applied to deep networks with ReLU activations. The ReLU function zeroes out negative values, effectively halving the variance of activations as signals propagate through layers. Without compensation, this variance shrinkage compounds exponentially with depth, causing activations and gradients in later layers to vanish and learning to stall.
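The halving effect is easy to check empirically. A minimal sketch (assuming standard normal pre-activations; sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)   # pre-activations with E[z^2] = 1
a = np.maximum(z, 0.0)               # ReLU zeroes out the negative half

# The mean square of the activations is roughly half that of the
# inputs -- the per-layer shrinkage Kaiming initialization compensates for.
print(np.mean(z**2))  # close to 1.0
print(np.mean(a**2))  # close to 0.5
```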
Kaiming initialization counteracts this variance decay by scaling the initial weights based on the fan dimension of each layer. The key insight is to keep activation variance consistent across layers by scaling weights proportionally to $\sqrt{\frac{2}{n}}$, where $n$ is the fan dimension (fan-in or fan-out).
Normal Distribution: Weights are sampled from a Gaussian distribution with zero mean and standard deviation calculated as:
$$\sigma = \sqrt{\frac{2}{\text{fan}}}$$
Uniform Distribution: Weights are sampled uniformly from the interval $[-\text{bound}, +\text{bound}]$, where:
$$\text{bound} = \sqrt{\frac{6}{\text{fan}}}$$
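Both scale factors follow directly from the chosen fan value. A small helper sketch (the name kaiming_scales is illustrative, not part of the task):

```python
import numpy as np

def kaiming_scales(fan):
    """Return (sigma, bound) for the normal and uniform Kaiming variants."""
    sigma = np.sqrt(2.0 / fan)   # std dev for the normal variant
    bound = np.sqrt(6.0 / fan)   # half-width for the uniform variant
    return sigma, bound

sigma, bound = kaiming_scales(3)
print(round(sigma, 4))  # 0.8165
print(round(bound, 4))  # 1.4142
```

Note that bound = σ·√3, so a uniform sample over [-bound, +bound] has the same variance 2/fan as the normal variant.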
Implement a function kaiming_weight_setup(n_in, n_out, mode, distribution, seed) that generates a weight matrix initialized according to the Kaiming strategy:

• mode: 'fan_in' or 'fan_out' to determine the scaling dimension
• distribution: 'normal' for Gaussian sampling or 'uniform' for uniform sampling

The function should return a NumPy array of shape (n_in, n_out) containing the initialized weights, with all values rounded to 4 decimal places.
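One possible sketch of the required function, assuming the legacy np.random.seed API and NumPy's default row-major sampling order (which the worked examples appear to use):

```python
import numpy as np

def kaiming_weight_setup(n_in, n_out, mode, distribution, seed):
    """Kaiming-initialized (n_in, n_out) weight matrix, rounded to 4 decimals."""
    fan = n_in if mode == 'fan_in' else n_out
    np.random.seed(seed)  # legacy global seeding, matching the examples
    if distribution == 'normal':
        sigma = np.sqrt(2.0 / fan)                      # std = sqrt(2/fan)
        w = np.random.randn(n_in, n_out) * sigma
    else:  # 'uniform'
        bound = np.sqrt(6.0 / fan)                      # half-width = sqrt(6/fan)
        w = np.random.rand(n_in, n_out) * 2 * bound - bound
    return np.round(w, 4)
```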
Example 1:
Input: n_in = 3, n_out = 2, mode = 'fan_in', distribution = 'normal', seed = 42
Output: [[0.4056, -0.1129], [0.5288, 1.2435], [-0.1912, -0.1912]]

With 'fan_in' mode and n_in = 3, the fan value is 3. For the normal distribution, the standard deviation is σ = √(2/3) ≈ 0.8165. Using seed 42, NumPy's random number generator produces standard normal samples, which are then scaled by σ.
• The first sample from randn (≈0.4967) becomes 0.4967 × 0.8165 ≈ 0.4056
• The second sample (≈-0.1383) becomes -0.1383 × 0.8165 ≈ -0.1129
This scaling ensures that the variance of activations remains stable as signals propagate through ReLU layers.
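That stability can be spot-checked by pushing a random batch through several ReLU layers initialized with Kaiming-normal fan-in scaling (an illustrative experiment; the batch size, width, and depth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((1024, 256))  # batch of inputs, variance ~ 1

for _ in range(10):
    n_in = h.shape[1]
    w = rng.standard_normal((n_in, n_in)) * np.sqrt(2.0 / n_in)  # Kaiming normal, fan_in
    h = np.maximum(h @ w, 0.0)                                   # linear layer + ReLU

# After 10 layers the mean square of the activations stays on the order
# of 1 instead of decaying by ~2x per layer (which would give ~0.001).
print(np.mean(h**2))
```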
Example 2:
Input: n_in = 3, n_out = 2, mode = 'fan_in', distribution = 'uniform', seed = 42
Output: [[-0.3549, 1.2748], [0.6562, 0.279], [-0.9729, -0.973]]

With 'fan_in' mode and n_in = 3, the bound is √(6/3) = √2 ≈ 1.4142. Uniform random samples from [0, 1) are transformed to the range [-1.4142, 1.4142]. Using seed 42, the first uniform sample (≈0.3745) is mapped to 2 × 0.3745 × 1.4142 - 1.4142 ≈ -0.3549. The uniform distribution can provide better conditioning for certain network architectures and optimization scenarios.
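The mapping from a raw [0, 1) sample u to a Kaiming-uniform weight is just 2·u·bound − bound. Reproducing the first entry (legacy seeding assumed):

```python
import numpy as np

bound = np.sqrt(6.0 / 3)      # fan = 3, bound = sqrt(2) ≈ 1.4142
np.random.seed(42)
u = np.random.rand(3, 2)      # raw uniforms in [0, 1); first is ≈ 0.3745
w = 2 * u * bound - bound     # shift and scale into [-bound, +bound]
print(round(float(w[0, 0]), 4))  # -0.3549, matching the example output
```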
Example 3:
Input: n_in = 3, n_out = 4, mode = 'fan_out', distribution = 'normal', seed = 42
Output: [[0.3512, -0.0978, 0.458, 1.0769], [-0.1656, -0.1656, 1.1167, 0.5427], [-0.332, 0.3836, -0.3277, -0.3293]]

Switching to 'fan_out' mode with n_out = 4, the fan value becomes 4, and the standard deviation is σ = √(2/4) = √0.5 ≈ 0.7071. This mode is preferred when gradient flow during backpropagation is the primary concern. Each weight is computed by scaling a standard normal sample by this σ. Notice the smaller scaling factor compared to fan_in mode, reflecting the larger fan dimension.
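With seed 42 the underlying standard normal draws are the same as in the first example; only the scale changes. Reproducing the first two entries (legacy seeding assumed):

```python
import numpy as np

sigma = np.sqrt(2.0 / 4)      # fan_out = 4, sigma = sqrt(0.5) ≈ 0.7071
np.random.seed(42)
z = np.random.randn(3, 4)     # first standard normal sample ≈ 0.4967
w = np.round(z * sigma, 4)
print(w[0, 0], w[0, 1])       # 0.3512 -0.0978, matching the example output
```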
Constraints