Batch Normalization is a transformative technique in deep learning that accelerates training and improves model stability by normalizing activations within a neural network. When applied to convolutional neural networks (CNNs), batch normalization operates on 4D feature map tensors in the BCHW format: Batch × Channels × Height × Width.
The core principle of batch normalization is to standardize inputs to each layer by centering activations around zero mean and unit variance. For 4D feature maps, this normalization is performed per-channel across all spatial locations and all samples in the batch.
Mathematical Formulation:
For a 4D tensor X with shape (B, C, H, W), the batch normalization for each channel c is computed as follows:
Compute the mean across the batch and spatial dimensions: $$\mu_c = \frac{1}{B \cdot H \cdot W} \sum_{b=1}^{B} \sum_{h=1}^{H} \sum_{w=1}^{W} X_{b,c,h,w}$$
Compute the variance across the same dimensions: $$\sigma_c^2 = \frac{1}{B \cdot H \cdot W} \sum_{b=1}^{B} \sum_{h=1}^{H} \sum_{w=1}^{W} (X_{b,c,h,w} - \mu_c)^2$$
Normalize the input using the computed statistics: $$\hat{X}_{b,c,h,w} = \frac{X_{b,c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$$
Apply the affine transformation using learnable parameters: $$Y_{b,c,h,w} = \gamma_c \cdot \hat{X}_{b,c,h,w} + \beta_c$$
Where:
- $\mu_c$ and $\sigma_c^2$ are the per-channel mean and variance computed above
- $\epsilon$ is a small constant added for numerical stability
- $\gamma_c$ and $\beta_c$ are learnable per-channel scale and shift parameters
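The four steps above map directly onto NumPy axis reductions: the batch and spatial dimensions are axes 0, 2, and 3 of a BCHW tensor. A minimal sketch (the function name `batch_norm` and its signature are illustrative, not prescribed by the task):

```python
import numpy as np

def batch_norm(X, gamma, beta, epsilon=1e-5):
    # Per-channel mean and (biased) variance over batch and spatial axes
    mu = X.mean(axis=(0, 2, 3), keepdims=True)
    var = X.var(axis=(0, 2, 3), keepdims=True)
    # Normalize, then apply the per-channel affine transform;
    # gamma/beta are 1D per-channel arrays reshaped to broadcast over (B, C, H, W)
    X_hat = (X - mu) / np.sqrt(var + epsilon)
    return gamma.reshape(1, -1, 1, 1) * X_hat + beta.reshape(1, -1, 1, 1)
```

Using `keepdims=True` keeps the statistics shaped `(1, C, 1, 1)`, so the subtraction and division broadcast across batch and spatial positions without explicit loops.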
Your Task: Implement a function that performs batch normalization on a 4D NumPy array in BCHW format. The function should normalize the input tensor across the batch and spatial dimensions for each channel independently, then apply the provided scale (gamma) and shift (beta) parameters.
X = [[[[0.49671415, -0.1382643], [0.64768854, 1.52302986]], [[-0.23415337, -0.23413696], [1.57921282, 0.76743473]]], [[[-0.46947439, 0.54256004], [-0.46341769, -0.46572975]], [[0.24196227, -1.91328024], [-1.72491783, -0.56228753]]]]
gamma = [1.0, 1.0]
beta = [0.0, 0.0]
epsilon = 1e-5
Output:
[[[[0.42860, -0.51776], [0.65361, 1.95821]], [[0.02354, 0.02355], [1.67355, 0.9349]]], [[[-1.0114, 0.49693], [-1.00237, -1.00581]], [[0.45676, -1.50433], [-1.33294, -0.27504]]]]
The input X is a 4D tensor with shape (2, 2, 2, 2): 2 samples in the batch, 2 channels, and 2×2 spatial dimensions.
For Channel 0: the mean over all eight Channel-0 values is $\mu_0 \approx 0.2091$ and the variance is $\sigma_0^2 \approx 0.4502$, so each value is centered and divided by $\sqrt{0.4502 + 10^{-5}} \approx 0.6710$.
For Channel 1: the mean is $\mu_1 \approx -0.2600$ and the variance is $\sigma_1^2 \approx 1.2078$, giving a divisor of approximately $1.0990$.
With gamma = [1, 1] and beta = [0, 0], the output is the pure normalized values without additional scaling or shifting.
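The per-channel statistics quoted above are easy to reproduce with a quick NumPy check (this is a verification of the example, not the required deliverable):

```python
import numpy as np

# Input tensor from the example above
X = np.array([[[[0.49671415, -0.1382643], [0.64768854, 1.52302986]],
               [[-0.23415337, -0.23413696], [1.57921282, 0.76743473]]],
              [[[-0.46947439, 0.54256004], [-0.46341769, -0.46572975]],
               [[0.24196227, -1.91328024], [-1.72491783, -0.56228753]]]])

mu = X.mean(axis=(0, 2, 3))   # per-channel mean:     ≈ [ 0.2091, -0.2600]
var = X.var(axis=(0, 2, 3))   # per-channel variance: ≈ [ 0.4502,  1.2078]
X_hat = (X - mu.reshape(1, 2, 1, 1)) / np.sqrt(var.reshape(1, 2, 1, 1) + 1e-5)
# X_hat[0, 0, 0, 0] ≈ 0.42860, matching the first value of the output
```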
X = [[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]], [[[9.0, 10.0], [11.0, 12.0]], [[13.0, 14.0], [15.0, 16.0]]]]
gamma = [1.0, 1.0]
beta = [0.0, 0.0]
epsilon = 1e-5
Output:
[[[[-1.32424, -1.08347], [-0.8427, -0.60193]], [[-1.32424, -1.08347], [-0.8427, -0.60193]]], [[[0.60193, 0.8427], [1.08347, 1.32424]], [[0.60193, 0.8427], [1.08347, 1.32424]]]]
With a sequential pattern in the input tensor, notice how the normalized output is symmetric:
Channel 0: Values 1-4 (first batch) and 9-12 (second batch) get normalized together. The first batch elements become negative (below mean), while second batch elements become positive (above mean).
Channel 1: Similarly, values 5-8 and 13-16 are normalized, producing the same pattern as Channel 0 since both channels have the same relative distribution.
This demonstrates how batch normalization centers each channel's activations around zero.
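The symmetry can be confirmed directly: Channel 1 is just Channel 0 shifted by a constant 4, so after centering and scaling the two channels produce identical outputs (again a sanity check, not part of the solution):

```python
import numpy as np

# The sequential tensor from the example above: X[0, 0] = [[1, 2], [3, 4]], etc.
X = np.arange(1.0, 17.0).reshape(2, 2, 2, 2)

# Channel 0 pools {1..4} from the first batch with {9..12} from the second:
# mean = 6.5, variance = 17.25
assert X[:, 0].mean() == 6.5 and X[:, 0].var() == 17.25

X_hat = (X - X.mean(axis=(0, 2, 3), keepdims=True)) / \
        np.sqrt(X.var(axis=(0, 2, 3), keepdims=True) + 1e-5)

# Channel 1 = Channel 0 + 4, so normalization erases the offset
print(np.allclose(X_hat[:, 0], X_hat[:, 1]))  # True
```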
X = [[[[0.0, 1.0], [1.0, 0.0]], [[2.0, 3.0], [3.0, 2.0]]], [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 2.0], [2.0, 3.0]]]]
gamma = [2.0, 0.5]
beta = [1.0, -1.0]
epsilon = 1e-5
Output:
[[[[-1.0, 3.0], [3.0, -1.0]], [[-1.5, -0.5], [-0.5, -1.5]]], [[[3.0, -1.0], [-1.0, 3.0]], [[-0.5, -1.5], [-1.5, -0.5]]]]
This example demonstrates the effect of non-trivial gamma and beta values:
Channel 0 (gamma=2.0, beta=1.0): the binary inputs 0 and 1 are equidistant from the mean 0.5 with standard deviation 0.5, so they normalize to $\pm 1$; the outputs are $2 \cdot (-1) + 1 = -1$ and $2 \cdot 1 + 1 = 3$.
Channel 1 (gamma=0.5, beta=-1.0): the values 2 and 3 likewise normalize to $\pm 1$ (mean 2.5, standard deviation 0.5), giving $0.5 \cdot (-1) - 1 = -1.5$ and $0.5 \cdot 1 - 1 = -0.5$.
This shows how learnable parameters can restore representational power after normalization.
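Because each channel here takes only two equidistant values, the normalized tensor contains only $\pm 1$ (up to epsilon), and the output collapses to exactly two values per channel, $\gamma_c \cdot (\pm 1) + \beta_c$. A quick check of that claim:

```python
import numpy as np

X = np.array([[[[0., 1.], [1., 0.]], [[2., 3.], [3., 2.]]],
              [[[1., 0.], [0., 1.]], [[3., 2.], [2., 3.]]]])
gamma = np.array([2.0, 0.5])
beta = np.array([1.0, -1.0])

mu = X.mean(axis=(0, 2, 3), keepdims=True)                  # 0.5 and 2.5
std = np.sqrt(X.var(axis=(0, 2, 3), keepdims=True) + 1e-5)  # ~0.5 each
Y = gamma.reshape(1, 2, 1, 1) * (X - mu) / std + beta.reshape(1, 2, 1, 1)

# Channel 0 collapses to {-1, 3}, channel 1 to {-1.5, -0.5}
print(sorted({float(v) for v in np.round(Y, 2)[:, 0].ravel()}))  # [-1.0, 3.0]
print(sorted({float(v) for v in np.round(Y, 2)[:, 1].ravel()}))  # [-1.5, -0.5]
```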
Constraints