Spatial Mean Downsampling (Easy) — Practice with Code Visualizer

Spatial Mean Downsampling is a fundamental dimensionality reduction technique widely employed in modern convolutional neural networks and image processing pipelines. This operation systematically reduces the spatial dimensions of a feature map while preserving the most representative information from localized regions.

Given a 2D numerical matrix (often representing an image or feature map) and a specified window size, the operation divides the input into non-overlapping square regions of the given size. For each region, the arithmetic mean of all contained values is computed, producing a single representative value in the output matrix.

Mathematical Formulation:

For an input matrix X of dimensions H × W and a pooling window of size k × k, the output matrix Y has dimensions (H/k) × (W/k). Each output element is computed as:

$$Y_{i,j} = \frac{1}{k^2} \sum_{p=0}^{k-1} \sum_{q=0}^{k-1} X_{(i \cdot k + p), (j \cdot k + q)}$$

Key Properties:

Dimensionality Reduction: The spatial dimensions are reduced by a factor equal to the pool size in each dimension
Translation Invariance: Small shifts in input produce similar outputs, making models robust to positional variations
Noise Reduction: Averaging naturally smooths out high-frequency noise and minor fluctuations
Feature Abstraction: Converts fine-grained local details into coarser regional summaries

Your Task: Implement a function that performs spatial mean downsampling on a 2D matrix. The function receives the input matrix and the pooling window size, then returns the downsampled result. You may assume that the input dimensions are evenly divisible by the pool size.

The 4×4 input matrix is divided into four non-overlapping 2×2 regions:

• Top-Left Region [1, 2, 5, 6]: Mean = (1 + 2 + 5 + 6) ÷ 4 = 14 ÷ 4 = 3.5 • Top-Right Region [3, 4, 7, 8]: Mean = (3 + 4 + 7 + 8) ÷ 4 = 22 ÷ 4 = 5.5 • Bottom-Left Region [9, 10, 13, 14]: Mean = (9 + 10 + 13 + 14) ÷ 4 = 46 ÷ 4 = 11.5 • Bottom-Right Region [11, 12, 15, 16]: Mean = (11 + 12 + 15 + 16) ÷ 4 = 54 ÷ 4 = 13.5

The resulting 2×2 output matrix is [[3.5, 5.5], [11.5, 13.5]].

The entire 2×2 input matrix forms a single pooling region.

• Single Region [1, 2, 3, 4]: Mean = (1 + 2 + 3 + 4) ÷ 4 = 10 ÷ 4 = 2.5

The output is a 1×1 matrix containing the single value [[2.5]]. This represents the extreme case where the pool size equals the input dimensions, reducing the entire matrix to a single scalar value.

The 6×6 input matrix is divided into four non-overlapping 3×3 regions:

• Top-Left 3×3 Region: Contains values 1,2,3,7,8,9,13,14,15. Mean = (1+2+3+7+8+9+13+14+15) ÷ 9 = 72 ÷ 9 = 8.0 • Top-Right 3×3 Region: Contains values 4,5,6,10,11,12,16,17,18. Mean = (4+5+6+10+11+12+16+17+18) ÷ 9 = 99 ÷ 9 = 11.0 • Bottom-Left 3×3 Region: Contains values 19,20,21,25,26,27,31,32,33. Mean = 234 ÷ 9 = 26.0 • Bottom-Right 3×3 Region: Contains values 22,23,24,28,29,30,34,35,36. Mean = 261 ÷ 9 = 29.0

The resulting 2×2 output matrix is [[8.0, 11.0], [26.0, 29.0]].