In deep learning for computer vision, normalization techniques play a critical role in stabilizing training and improving model convergence. While batch normalization computes statistics across the entire batch, there are scenarios—particularly in image style transfer, image generation, and domain adaptation—where normalizing each sample independently across its spatial dimensions proves more effective.
Channel-Wise Spatial Normalization operates on 4D tensors of shape (B, C, H, W), where:
- B is the batch size (number of samples)
- C is the number of channels
- H is the spatial height
- W is the spatial width
For each individual sample in the batch and each channel independently, this normalization technique:
- computes the mean and standard deviation over the H × W spatial positions,
- standardizes the values of that slice using these statistics, and
- applies a learnable per-channel scale (gamma) and shift (beta).
Mathematical Formulation:
For a given sample b and channel c, let x_{b,c} be the 2D slice of shape (H, W). The normalized output is computed as:
$$\mu_{b,c} = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{b,c,h,w}$$
$$\sigma_{b,c} = \sqrt{\frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{b,c,h,w} - \mu_{b,c})^2 + \epsilon}$$
$$\hat{x}_{b,c,h,w} = \frac{x_{b,c,h,w} - \mu_{b,c}}{\sigma_{b,c}}$$
$$y_{b,c,h,w} = \gamma_c \cdot \hat{x}_{b,c,h,w} + \beta_c$$
Where ε (epsilon) is a small constant (typically 1e-5) added for numerical stability to prevent division by zero.
Key Distinction: Unlike batch normalization (which normalizes across the batch dimension), channel-wise spatial normalization treats each sample independently. This makes it particularly useful when:
- batch sizes are small (or 1), making batch statistics unreliable, or
- per-sample feature statistics carry meaningful information, as in style transfer, image generation, and domain adaptation.
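The difference amounts to a choice of reduction axes. A minimal NumPy sketch (the variable names here are illustrative, not from the problem):

```python
import numpy as np

X = np.random.randn(8, 3, 4, 4)  # a dummy (B, C, H, W) batch

# Batch normalization: one mean per channel, shared across the whole batch
bn_mu = X.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, 3, 1, 1)

# Channel-wise spatial normalization: one mean per (sample, channel) pair
cw_mu = X.mean(axis=(2, 3), keepdims=True)     # shape (8, 3, 1, 1)

print(bn_mu.shape, cw_mu.shape)
```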
Your Task:
Implement the channel_wise_spatial_normalization function that takes an input tensor X of shape (B, C, H, W), along with learnable parameters gamma and beta of shape (C,), and returns the normalized tensor of the same shape.
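One possible NumPy implementation, translating the formulas above directly (the epsilon default of 1e-5 is the value suggested earlier; the expected signature may differ slightly):

```python
import numpy as np

def channel_wise_spatial_normalization(X, gamma, beta, epsilon=1e-5):
    """Normalize each (sample, channel) slice over its H x W positions.

    X: array of shape (B, C, H, W)
    gamma, beta: learnable per-channel parameters of shape (C,)
    """
    # Per-sample, per-channel statistics over the spatial axes
    mu = X.mean(axis=(2, 3), keepdims=True)   # shape (B, C, 1, 1)
    var = X.var(axis=(2, 3), keepdims=True)   # shape (B, C, 1, 1)
    X_hat = (X - mu) / np.sqrt(var + epsilon)
    # Reshape gamma and beta to (1, C, 1, 1) so they broadcast per channel
    return gamma.reshape(1, -1, 1, 1) * X_hat + beta.reshape(1, -1, 1, 1)
```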
Example 1:

Input:
X = [
  [
    [[0.497, -0.138], [0.648, 1.523]],
    [[-0.234, -0.234], [1.579, 0.767]]
  ],
  [
    [[-0.469, 0.543], [-0.463, -0.466]],
    [[0.242, -1.913], [-1.725, -0.562]]
  ]
]
gamma = [1.0, 1.0]
beta = [0.0, 0.0]

Output:
[
  [
    [[-0.229, -1.300], [0.026, 1.502]],
    [[-0.926, -0.926], [1.460, 0.392]]
  ],
  [
    [[-0.585, 1.732], [-0.571, -0.576]],
    [[1.401, -1.050], [-0.836, 0.486]]
  ]
]

The input tensor has shape (2, 2, 2, 2), meaning 2 samples, 2 channels, and 2×2 spatial dimensions.
For sample 0, channel 0: the spatial values are [0.497, -0.138, 0.648, 1.523], giving μ ≈ 0.633 and σ ≈ 0.593.
For sample 0, channel 1: the spatial values are [-0.234, -0.234, 1.579, 0.767], giving μ ≈ 0.470 and σ ≈ 0.760.
Since γ = [1.0, 1.0] and β = [0.0, 0.0], the output is simply the standardized values without additional scaling or shifting.
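These statistics can be checked with a few lines of NumPy (a verification snippet, not part of the required solution):

```python
import numpy as np

slice_00 = np.array([[0.497, -0.138], [0.648, 1.523]])  # sample 0, channel 0
mu = slice_00.mean()                    # ≈ 0.633
sigma = np.sqrt(slice_00.var() + 1e-5)  # ≈ 0.593
normalized = (slice_00 - mu) / sigma
print(np.round(normalized, 3))
```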
Example 2:

Input:
X = [
  [
    [[0.497, -0.138, 0.648], [1.523, -0.234, -0.234], [1.579, 0.767, -0.469]],
    [[0.543, -0.463, -0.466], [0.242, -1.913, -1.725], [-0.562, -1.013, 0.314]]
  ]
]
gamma = [2.0, 0.5]
beta = [1.0, -1.0]

Output:
[
  [
    [[1.164, -0.595, 1.582], [4.006, -0.860, -0.860], [4.161, 1.913, -1.512]],
    [[-0.327, -0.941, -0.942], [-0.510, -1.826, -1.711], [-1.001, -1.276, -0.466]]
  ]
]

This example demonstrates the effect of non-trivial gamma and beta values.
For channel 0: μ ≈ 0.438 and σ ≈ 0.722, and each standardized value is scaled by γ = 2.0 and shifted by β = 1.0.
For channel 1: μ ≈ -0.560 and σ ≈ 0.819, and each standardized value is scaled by γ = 0.5 and shifted by β = -1.0.
The gamma and beta parameters are per-channel, allowing the network to learn optimal scales and shifts for each feature channel independently.
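In code, this per-channel application is usually realized by reshaping gamma and beta so they broadcast over the batch and spatial axes; a sketch with the Example 2 parameters (X_hat is a placeholder all-zeros tensor here):

```python
import numpy as np

B, C, H, W = 1, 2, 3, 3
X_hat = np.zeros((B, C, H, W))   # stand-in for an already-normalized tensor
gamma = np.array([2.0, 0.5])
beta = np.array([1.0, -1.0])

# (C,) -> (1, C, 1, 1): each channel gets its own scale and shift
Y = gamma.reshape(1, C, 1, 1) * X_hat + beta.reshape(1, C, 1, 1)
print(Y[0, 0, 0, 0], Y[0, 1, 0, 0])  # zeros map to beta: 1.0 and -1.0
```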
Example 3:

Input:
X = [
  [
    [[0.257, -0.908, -0.379, -0.535], [0.858, -0.413, 0.498, 2.010], [1.263, -0.439, -0.346, 0.455], [-1.669, -0.862, 0.493, -0.124]]
  ]
]
gamma = [1.5]
beta = [0.5]

Output:
[
  [
    [[0.921, -1.064, -0.161, -0.428], [1.944, -0.220, 1.331, 3.906], [2.633, -0.265, -0.107, 1.258], [-2.358, -0.985, 1.322, 0.271]]
  ]
]

A single-sample, single-channel example with larger spatial dimensions (4×4 = 16 elements).
Process:
- Compute the mean over all 16 spatial values: μ ≈ 0.010.
- Compute the standard deviation: σ ≈ 0.881.
- Standardize each value, then scale by γ = 1.5 and shift by β = 0.5.
The 16 spatial values now have a controlled distribution, scaled by 1.5 and shifted by 0.5, demonstrating how the normalization adapts to arbitrary spatial dimensions while preserving per-channel learnable parameters.
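As a final sanity check, the 4×4 statistics can be recomputed directly (approximate values, using the formulas above):

```python
import numpy as np

x = np.array([[0.257, -0.908, -0.379, -0.535],
              [0.858, -0.413, 0.498, 2.010],
              [1.263, -0.439, -0.346, 0.455],
              [-1.669, -0.862, 0.493, -0.124]])
mu = x.mean()                    # ≈ 0.010
sigma = np.sqrt(x.var() + 1e-5)  # ≈ 0.881
y = 1.5 * (x - mu) / sigma + 0.5
print(np.round(y, 3))
```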
Constraints