In deep learning, the initial values assigned to neural network weights play a crucial role in determining whether a model trains successfully or fails to converge. The Glorot initialization scheme (also known as Xavier initialization) is a widely adopted weight initialization strategy designed to maintain consistent variance of activations and gradients across network layers.
When weights are initialized with values that are too large, activations can explode as they propagate through the network, leading to numerical overflow. Conversely, when weights are too small, gradients become vanishingly tiny during backpropagation, effectively preventing any learning from occurring. Both scenarios result in training failure.
The Glorot initialization method addresses this by scaling the initial weights based on the fan-in (number of input connections) and fan-out (number of output connections) of each layer. This scaling ensures that the variance of activations remains approximately constant as signals flow forward through the network, and that gradient magnitudes remain stable during backpropagation.
Your implementation should support two distribution modes:
1. Uniform: initialize weights by sampling from a uniform distribution U(-limit, limit), where:
$$\text{limit} = \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$$
2. Normal: initialize weights by sampling from a normal (Gaussian) distribution N(0, σ²), where the standard deviation is:
$$\sigma = \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$$
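Both scale factors depend only on the layer dimensions, so they can be computed directly. A minimal check for a hypothetical 128×64 layer (the dimensions are illustrative, not from the examples):

```python
import math

fan_in, fan_out = 128, 64  # hypothetical layer dimensions
limit = math.sqrt(6 / (fan_in + fan_out))  # uniform-mode bound
sigma = math.sqrt(2 / (fan_in + fan_out))  # normal-mode standard deviation
print(round(limit, 4), round(sigma, 4))    # → 0.1768 0.1021
```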
Write a Python function that implements the Glorot weight initialization scheme. The function should:
• Accept a shape tuple (fan_in, fan_out) specifying the weight matrix dimensions
• Accept a mode argument selecting the 'uniform' or 'normal' distribution
• Accept a seed so that results are reproducible

This initialization technique is fundamental to modern deep learning and is the default weight initialization method in many popular frameworks.
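One possible implementation is sketched below (the name `glorot_init` is a placeholder). Seeding NumPy's legacy global generator with `np.random.seed` reproduces the sample outputs in the examples:

```python
import numpy as np

def glorot_init(shape, mode='uniform', seed=None):
    """Glorot/Xavier initialization for a (fan_in, fan_out) weight matrix."""
    fan_in, fan_out = shape
    if seed is not None:
        np.random.seed(seed)  # legacy global seeding, for reproducibility
    if mode == 'uniform':
        # Sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, size=shape)
    if mode == 'normal':
        # Sample from N(0, sigma^2) with sigma = sqrt(2 / (fan_in + fan_out))
        sigma = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0.0, sigma, size=shape)
    raise ValueError("mode must be 'uniform' or 'normal'")
```

With this sketch, `glorot_init((3, 3), 'uniform', seed=42)` yields the first example's matrix after rounding to 4 decimal places.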
Example 1:
shape = (3, 3)
mode = 'uniform'
seed = 42

Output: [[-0.2509, 0.9014, 0.464], [0.1973, -0.688, -0.688], [-0.8838, 0.7324, 0.2022]]

For a 3×3 weight matrix with uniform initialization:
• fan_in = 3, fan_out = 3
• limit = √(6 / (3 + 3)) = √(6 / 6) = √1 = 1.0
• Weights are sampled from U(-1.0, 1.0)
Using seed=42 for reproducibility, NumPy's random generator produces values within this range. After rounding to 4 decimal places, the resulting weight matrix preserves the variance properties needed for stable training.
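The variance property can be spot-checked empirically: a uniform distribution U(-a, a) has variance a²/3, so Glorot-uniform weights have variance (6 / (fan_in + fan_out)) / 3 = 2 / (fan_in + fan_out). A quick sketch (sample size and seed here are arbitrary):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for this check
limit = np.sqrt(6 / (3 + 3))  # 1.0, as in the example
sample = np.random.uniform(-limit, limit, size=100_000)
# Var[U(-a, a)] = a**2 / 3, so the empirical variance should land near 1/3
print(sample.var())
```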
Example 2:
shape = (2, 2)
mode = 'normal'
seed = 42

Output: [[0.3512, -0.0978], [0.458, 1.0769]]

For a 2×2 weight matrix with normal initialization:
• fan_in = 2, fan_out = 2
• σ = √(2 / (2 + 2)) = √(2 / 4) = √0.5 ≈ 0.7071
• Weights are sampled from N(0, 0.7071²)
Using seed=42, the random generator produces normally distributed values with the calculated standard deviation. The resulting matrix maintains appropriate variance for gradient flow during training.
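The same kind of empirical check works for the normal mode: the sample standard deviation should land near the computed σ (sample size and seed here are arbitrary):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for this check
sigma = np.sqrt(2 / (2 + 2))  # ≈ 0.7071, as in the example
sample = np.random.normal(0.0, sigma, size=100_000)
print(sample.std())  # should be close to 0.7071
```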
Example 3:
shape = (3, 4)
mode = 'uniform'
seed = 42

Output: [[-0.2323, 0.8346, 0.4296, 0.1827], [-0.6369, -0.637, -0.8183, 0.678], [0.1872, 0.3853, -0.8877, 0.8701]]

For a 3×4 weight matrix (asymmetric dimensions) with uniform initialization:
• fan_in = 3, fan_out = 4
• limit = √(6 / (3 + 4)) = √(6 / 7) ≈ 0.9258
• Weights are sampled from U(-0.9258, 0.9258)
This example demonstrates how asymmetric layer sizes affect the initialization range. The limit adapts to account for the different numbers of input and output connections, ensuring balanced variance regardless of layer shape.
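A minimal check of the adapted limit and of the sampling bound, using the example's seed:

```python
import numpy as np

fan_in, fan_out = 3, 4
limit = np.sqrt(6 / (fan_in + fan_out))
print(round(limit, 4))  # → 0.9258
np.random.seed(42)
W = np.random.uniform(-limit, limit, size=(fan_in, fan_out))
print(np.all(np.abs(W) <= limit))  # every sampled weight stays within the bound
```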
Constraints