Batch Normalization is a transformative technique in deep learning that accelerates training and improves model stability by normalizing activations within a neural network. When applied to convolutional neural networks (CNNs), batch normalization operates on 4D feature map tensors in the BCHW format: Batch × Channels × Height × Width.
The core principle of batch normalization is to standardize inputs to each layer by centering activations around zero mean and unit variance. For 4D feature maps, this normalization is performed per-channel across all spatial locations and all samples in the batch.
Mathematical Formulation:
For a 4D tensor X with shape (B, C, H, W), the batch normalization for each channel c is computed as follows:
Compute the mean across the batch and spatial dimensions: $$\mu_c = \frac{1}{B \cdot H \cdot W} \sum_{b=1}^{B} \sum_{h=1}^{H} \sum_{w=1}^{W} X_{b,c,h,w}$$
Compute the variance across the same dimensions: $$\sigma_c^2 = \frac{1}{B \cdot H \cdot W} \sum_{b=1}^{B} \sum_{h=1}^{H} \sum_{w=1}^{W} (X_{b,c,h,w} - \mu_c)^2$$
Normalize the input using the computed statistics: $$\hat{X}_{b,c,h,w} = \frac{X_{b,c,h,w} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$$
Apply the affine transformation using learnable parameters: $$Y_{b,c,h,w} = \gamma_c \cdot \hat{X}_{b,c,h,w} + \beta_c$$
Where:
- $\mu_c$ and $\sigma_c^2$ are the per-channel mean and variance computed above
- $\epsilon$ is a small constant added for numerical stability
- $\gamma_c$ and $\beta_c$ are learnable per-channel scale and shift parameters
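The four steps above map directly onto NumPy axis reductions: the batch and spatial dimensions are axes 0, 2, and 3 of a BCHW tensor. A minimal sketch (the function name `batch_norm` and its signature are illustrative, not prescribed by the task):

```python
import numpy as np

def batch_norm(X, gamma, beta, epsilon=1e-5):
    # Per-channel mean and (biased) variance over batch and spatial axes
    mu = X.mean(axis=(0, 2, 3), keepdims=True)
    var = X.var(axis=(0, 2, 3), keepdims=True)
    # Normalize, then apply the per-channel affine transform;
    # gamma/beta are 1D per-channel arrays reshaped to broadcast over (B, C, H, W)
    X_hat = (X - mu) / np.sqrt(var + epsilon)
    return gamma.reshape(1, -1, 1, 1) * X_hat + beta.reshape(1, -1, 1, 1)
```

Using `keepdims=True` keeps the statistics shaped `(1, C, 1, 1)`, so the subtraction and division broadcast across batch and spatial positions without explicit loops.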
Your Task: Implement a function that performs batch normalization on a 4D NumPy array in BCHW format. The function should normalize the input tensor across the batch and spatial dimensions for each channel independently, then apply the provided scale (gamma) and shift (beta) parameters.
X = [[[[0.49671415, -0.1382643], [0.64768854, 1.52302986]], [[-0.23415337, -0.23413696], [1.57921282, 0.76743473]]], [[[-0.46947439, 0.54256004], [-0.46341769, -0.46572975]], [[0.24196227, -1.91328024], [-1.72491783, -0.56228753]]]]
gamma = [1.0, 1.0]
beta = [0.0, 0.0]
epsilon = 1e-5
Output:
[[[[0.42860, -0.51776], [0.65361, 1.95821]], [[0.02354, 0.02355], [1.67355, 0.9349]]], [[[-1.0114, 0.49693], [-1.00237, -1.00581]], [[0.45676, -1.50433], [-1.33294, -0.27504]]]]
The input X is a 4D tensor with shape (2, 2, 2, 2): 2 samples in the batch, 2 channels, and 2×2 spatial dimensions.
For Channel 0: the mean over all eight Channel-0 values is $\mu_0 \approx 0.2091$ and the variance is $\sigma_0^2 \approx 0.4502$, so each value is centered and divided by $\sqrt{0.4502 + 10^{-5}} \approx 0.6710$.
For Channel 1: the mean is $\mu_1 \approx -0.2600$ and the variance is $\sigma_1^2 \approx 1.2078$, giving a divisor of approximately $1.0990$.
With gamma = [1, 1] and beta = [0, 0], the output is the pure normalized values without additional scaling or shifting.
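The per-channel statistics quoted above are easy to reproduce with a quick NumPy check (this is a verification of the example, not the required deliverable):

```python
import numpy as np

# Input tensor from the example above
X = np.array([[[[0.49671415, -0.1382643], [0.64768854, 1.52302986]],
               [[-0.23415337, -0.23413696], [1.57921282, 0.76743473]]],
              [[[-0.46947439, 0.54256004], [-0.46341769, -0.46572975]],
               [[0.24196227, -1.91328024], [-1.72491783, -0.56228753]]]])

mu = X.mean(axis=(0, 2, 3))   # per-channel mean:     ≈ [ 0.2091, -0.2600]
var = X.var(axis=(0, 2, 3))   # per-channel variance: ≈ [ 0.4502,  1.2078]
X_hat = (X - mu.reshape(1, 2, 1, 1)) / np.sqrt(var.reshape(1, 2, 1, 1) + 1e-5)
# X_hat[0, 0, 0, 0] ≈ 0.42860, matching the first value of the output
```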
X = [[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]], [[[9.0, 10.0], [11.0, 12.0]], [[13.0, 14.0], [15.0, 16.0]]]]
gamma = [1.0, 1.0]
beta = [0.0, 0.0]
epsilon = 1e-5
Output:
[[[[-1.32424, -1.08347], [-0.8427, -0.60193]], [[-1.32424, -1.08347], [-0.8427, -0.60193]]], [[[0.60193, 0.8427], [1.08347, 1.32424]], [[0.60193, 0.8427], [1.08347, 1.32424]]]]
With a sequential pattern in the input tensor, notice how the normalized output is symmetric:
Channel 0: Values 1-4 (first batch) and 9-12 (second batch) get normalized together. The first batch elements become negative (below mean), while second batch elements become positive (above mean).
Channel 1: Similarly, values 5-8 and 13-16 are normalized, producing the same pattern as Channel 0 since both channels have the same relative distribution.
This demonstrates how batch normalization centers each channel's activations around zero.
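The symmetry can be confirmed directly: Channel 1 is just Channel 0 shifted by a constant 4, so after centering and scaling the two channels produce identical outputs (again a sanity check, not part of the solution):

```python
import numpy as np

# The sequential tensor from the example above: X[0, 0] = [[1, 2], [3, 4]], etc.
X = np.arange(1.0, 17.0).reshape(2, 2, 2, 2)

# Channel 0 pools {1..4} from the first batch with {9..12} from the second:
# mean = 6.5, variance = 17.25
assert X[:, 0].mean() == 6.5 and X[:, 0].var() == 17.25

X_hat = (X - X.mean(axis=(0, 2, 3), keepdims=True)) / \
        np.sqrt(X.var(axis=(0, 2, 3), keepdims=True) + 1e-5)

# Channel 1 = Channel 0 + 4, so normalization erases the offset
print(np.allclose(X_hat[:, 0], X_hat[:, 1]))  # True
```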
X = [[[[0.0, 1.0], [1.0, 0.0]], [[2.0, 3.0], [3.0, 2.0]]], [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 2.0], [2.0, 3.0]]]]
gamma = [2.0, 0.5]
beta = [1.0, -1.0]
epsilon = 1e-5
Output:
[[[[-1.0, 3.0], [3.0, -1.0]], [[-1.5, -0.5], [-0.5, -1.5]]], [[[3.0, -1.0], [-1.0, 3.0]], [[-0.5, -1.5], [-1.5, -0.5]]]]
This example demonstrates the effect of non-trivial gamma and beta values:
Channel 0 (gamma=2.0, beta=1.0): the binary inputs 0 and 1 are equidistant from the mean 0.5 with standard deviation 0.5, so they normalize to $\pm 1$; the outputs are $2 \cdot (-1) + 1 = -1$ and $2 \cdot 1 + 1 = 3$.
Channel 1 (gamma=0.5, beta=-1.0): the values 2 and 3 likewise normalize to $\pm 1$ (mean 2.5, standard deviation 0.5), giving $0.5 \cdot (-1) - 1 = -1.5$ and $0.5 \cdot 1 - 1 = -0.5$.
This shows how learnable parameters can restore representational power after normalization.
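Because each channel here takes only two equidistant values, the normalized tensor contains only $\pm 1$ (up to epsilon), and the output collapses to exactly two values per channel, $\gamma_c \cdot (\pm 1) + \beta_c$. A quick check of that claim:

```python
import numpy as np

X = np.array([[[[0., 1.], [1., 0.]], [[2., 3.], [3., 2.]]],
              [[[1., 0.], [0., 1.]], [[3., 2.], [2., 3.]]]])
gamma = np.array([2.0, 0.5])
beta = np.array([1.0, -1.0])

mu = X.mean(axis=(0, 2, 3), keepdims=True)                  # 0.5 and 2.5
std = np.sqrt(X.var(axis=(0, 2, 3), keepdims=True) + 1e-5)  # ~0.5 each
Y = gamma.reshape(1, 2, 1, 1) * (X - mu) / std + beta.reshape(1, 2, 1, 1)

# Channel 0 collapses to {-1, 3}, channel 1 to {-1.5, -0.5}
print(sorted({float(v) for v in np.round(Y, 2)[:, 0].ravel()}))  # [-1.0, 3.0]
print(sorted({float(v) for v in np.round(Y, 2)[:, 1].ravel()}))  # [-1.5, -0.5]
```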
Constraints