Spatial Mean Downsampling is a dimensionality reduction technique widely used in convolutional neural networks and image processing pipelines. The operation reduces the spatial dimensions of a feature map by summarizing each localized region with its average value.
Given a 2D numerical matrix (often representing an image or feature map) and a specified window size, the operation divides the input into non-overlapping square regions of the given size. For each region, the arithmetic mean of all contained values is computed, producing a single representative value in the output matrix.
Mathematical Formulation:
For an input matrix X of dimensions H × W and a pooling window of size k × k, where k divides both H and W, the output matrix Y has dimensions (H/k) × (W/k). Using zero-based indices with 0 ≤ i < H/k and 0 ≤ j < W/k, each output element is computed as:
$$Y_{i,j} = \frac{1}{k^2} \sum_{p=0}^{k-1} \sum_{q=0}^{k-1} X_{(i \cdot k + p), (j \cdot k + q)}$$
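The double sum maps directly onto two nested loops. The following is a minimal sketch of a single output element, assuming the matrix is stored as a list of rows; the helper name `pooled_value` is hypothetical and chosen only for illustration.

```python
def pooled_value(X, k, i, j):
    """Return Y[i][j]: the mean of the k x k block of X starting at row i*k, column j*k."""
    total = 0.0
    for p in range(k):        # row offset inside the window
        for q in range(k):    # column offset inside the window
            total += X[i * k + p][j * k + q]
    return total / (k * k)


# Illustrative input: a 4x4 matrix holding 1..16 in row-major order.
X = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

print(pooled_value(X, 2, 0, 1))  # 5.5, the mean of 3, 4, 7, 8
```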
Key Properties:
Your Task: Implement a function that performs spatial mean downsampling on a 2D matrix. The function receives the input matrix and the pooling window size, then returns the downsampled result. You may assume that the input dimensions are evenly divisible by the pool size.
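One possible way to approach the task is sketched below in plain Python. The function name `mean_downsample` and its parameter names are assumptions, since the statement does not fix a signature. The divisibility check is also an addition: the task allows you to assume divisible dimensions, so the check only guards against accidental misuse.

```python
def mean_downsample(input_matrix, pool_size):
    """Average each non-overlapping pool_size x pool_size block of a 2D list."""
    height = len(input_matrix)
    width = len(input_matrix[0]) if height else 0

    # Defensive check (not required by the task, which guarantees divisibility).
    if pool_size <= 0 or height % pool_size or width % pool_size:
        raise ValueError("matrix dimensions must be divisible by pool_size")

    area = pool_size * pool_size
    output = []
    for top in range(0, height, pool_size):        # top row of each block
        out_row = []
        for left in range(0, width, pool_size):    # left column of each block
            block_sum = sum(
                input_matrix[r][c]
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            )
            out_row.append(block_sum / area)
        output.append(out_row)
    return output


print(mean_downsample([[1, 2], [3, 4]], 2))  # [[2.5]]
```

Stepping the outer loops by `pool_size` visits the top-left corner of every block exactly once, so the output has one entry per block and no window overlaps another.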
input_matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pool_size = 2
Output: [[3.5, 5.5], [11.5, 13.5]]
The 4×4 input matrix is divided into four non-overlapping 2×2 regions:
• Top-Left Region [1, 2, 5, 6]: Mean = (1 + 2 + 5 + 6) ÷ 4 = 14 ÷ 4 = 3.5
• Top-Right Region [3, 4, 7, 8]: Mean = (3 + 4 + 7 + 8) ÷ 4 = 22 ÷ 4 = 5.5
• Bottom-Left Region [9, 10, 13, 14]: Mean = (9 + 10 + 13 + 14) ÷ 4 = 46 ÷ 4 = 11.5
• Bottom-Right Region [11, 12, 15, 16]: Mean = (11 + 12 + 15 + 16) ÷ 4 = 54 ÷ 4 = 13.5
The resulting 2×2 output matrix is [[3.5, 5.5], [11.5, 13.5]].
input_matrix = [[1, 2], [3, 4]]
pool_size = 2
Output: [[2.5]]
The entire 2×2 input matrix forms a single pooling region.
• Single Region [1, 2, 3, 4]: Mean = (1 + 2 + 3 + 4) ÷ 4 = 10 ÷ 4 = 2.5
The output is a 1×1 matrix containing the single value [[2.5]]. This represents the extreme case where the pool size equals the input dimensions, reducing the entire matrix to a single scalar value.
input_matrix = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]]
pool_size = 3
Output: [[8.0, 11.0], [26.0, 29.0]]
The 6×6 input matrix is divided into four non-overlapping 3×3 regions:
• Top-Left 3×3 Region: Contains values 1,2,3,7,8,9,13,14,15. Mean = (1+2+3+7+8+9+13+14+15) ÷ 9 = 72 ÷ 9 = 8.0
• Top-Right 3×3 Region: Contains values 4,5,6,10,11,12,16,17,18. Mean = (4+5+6+10+11+12+16+17+18) ÷ 9 = 99 ÷ 9 = 11.0
• Bottom-Left 3×3 Region: Contains values 19,20,21,25,26,27,31,32,33. Mean = 234 ÷ 9 = 26.0
• Bottom-Right 3×3 Region: Contains values 22,23,24,28,29,30,34,35,36. Mean = 261 ÷ 9 = 29.0
The resulting 2×2 output matrix is [[8.0, 11.0], [26.0, 29.0]].
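For larger inputs, the same block averages can be computed without explicit loops. The sketch below assumes NumPy is available and uses a reshape that splits each axis into a block index and an offset within the block; the name `mean_downsample_np` is hypothetical, and this is an alternative illustration rather than a required solution.

```python
import numpy as np


def mean_downsample_np(input_matrix, pool_size):
    """Block-average a 2D array whose dimensions are divisible by pool_size."""
    x = np.asarray(input_matrix, dtype=float)
    h, w = x.shape
    k = pool_size
    # Axes after reshape: (block row, row offset, block column, column offset).
    blocks = x.reshape(h // k, k, w // k, k)
    # Averaging over both offset axes leaves one value per block.
    return blocks.mean(axis=(1, 3))


x = np.arange(1, 37).reshape(6, 6)        # the 6x6 matrix holding 1..36
print(mean_downsample_np(x, 3).tolist())  # [[8.0, 11.0], [26.0, 29.0]]
```

The reshape works only because the windows do not overlap and tile the input exactly, which is the same divisibility assumption the task states.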
Constraints