In convolutional neural networks, pooling operations serve as critical downsampling mechanisms that reduce spatial dimensionality while preserving the most salient features. Among pooling variants, strided maximum pooling with overlapping regions offers a unique balance between feature preservation and spatial reduction.
Unlike standard max pooling, where adjacent windows do not overlap (stride equals kernel size), overlapping pooling uses a stride smaller than the kernel size, so consecutive pooling windows share common spatial regions. This overlap creates redundancy in the extracted features, which has been reported to modestly reduce overfitting in deep networks; AlexNet, for example, used 3×3 pooling windows with stride 2.
Mathematical Formulation:
The input is a 4D tensor X with shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the spatial height and width.
The overlapping max pooling operation applies a k × k window with stride s (where s < k) across the height and width dimensions. For each window position, the maximum value within that window is selected.
Output Dimension Calculation (Ceiling Mode):
The output spatial dimensions are computed using ceiling mode, which allows partial windows at boundaries:
$$H_{out} = \lceil \frac{H - k}{s} \rceil + 1$$
$$W_{out} = \lceil \frac{W - k}{s} \rceil + 1$$
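As a quick check of the two formulas above, the following minimal sketch evaluates the ceiling-mode output length along one spatial axis. The function name `pooled_size` is a hypothetical helper, not part of the problem statement, and the sketch assumes the input is at least as large as the kernel.

```python
import math


def pooled_size(size: int, kernel_size: int, stride: int) -> int:
    """Ceiling-mode output length along one axis (assumes size >= kernel_size)."""
    return math.ceil((size - kernel_size) / stride) + 1


# H = 4, k = 3, s = 2: ceil(0.5) + 1 = 2
print(pooled_size(4, 3, 2))  # 2
# H = 3, k = 2, s = 1: ceil(1.0) + 1 = 2
print(pooled_size(3, 2, 1))  # 2
```

When (H - k) is not a multiple of s, the ceiling adds one extra window that extends past the boundary and therefore covers fewer than k rows or columns.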
Key Properties of Overlapping Pooling:
Your Task:
Implement a function that performs 2D maximum pooling with overlapping regions on a 4D input tensor. The function should:
x = np.arange(1, 17).reshape(1, 1, 4, 4)
kernel_size = 3
stride = 2
Output: [[[[11.0, 12.0], [15.0, 16.0]]]]
The input is a 4×4 feature map with values 1 through 16 arranged sequentially:
[[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]]
With kernel_size=3 and stride=2, adjacent pooling windows overlap by one row or column. Using ceiling mode, the output is 2×2, and the windows at the bottom and right edges are partial:
• Top-left window (rows 0-2, cols 0-2): Values 1,2,3,5,6,7,9,10,11 → max = 11
• Top-right window (rows 0-2, cols 2-3): Values 3,4,7,8,11,12 → max = 12
• Bottom-left window (rows 2-3, cols 0-2): Values 9,10,11,13,14,15 → max = 15
• Bottom-right window (rows 2-3, cols 2-3): Values 11,12,15,16 → max = 16
Result: [[[[11.0, 12.0], [15.0, 16.0]]]]
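The walkthrough above can be reproduced with a straightforward loop over window positions. This is a minimal sketch, not a reference solution: the function name `overlapping_max_pool2d` and the choice to return float64 are assumptions, and partial windows are handled by clipping each window to the input boundary.

```python
import math

import numpy as np


def overlapping_max_pool2d(x, kernel_size, stride):
    """Max pooling over an (N, C, H, W) array with ceiling-mode partial windows."""
    x = np.asarray(x, dtype=np.float64)
    n, c, h, w = x.shape
    h_out = math.ceil((h - kernel_size) / stride) + 1
    w_out = math.ceil((w - kernel_size) / stride) + 1
    out = np.empty((n, c, h_out, w_out), dtype=np.float64)
    for i in range(h_out):
        r0 = i * stride
        r1 = min(r0 + kernel_size, h)  # clip a partial window at the bottom edge
        for j in range(w_out):
            c0 = j * stride
            c1 = min(c0 + kernel_size, w)  # clip a partial window at the right edge
            # Reduce over the spatial axes only, so batch and channel stay independent.
            out[:, :, i, j] = x[:, :, r0:r1, c0:c1].max(axis=(2, 3))
    return out


x = np.arange(1, 17).reshape(1, 1, 4, 4)
print(overlapping_max_pool2d(x, kernel_size=3, stride=2).tolist())
# [[[[11.0, 12.0], [15.0, 16.0]]]]
```

The loop runs over output positions rather than input pixels, so its cost scales with H_out × W_out while the batch and channel dimensions are handled by NumPy in a single vectorized reduction.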
x = np.arange(1, 10).reshape(1, 1, 3, 3)
kernel_size = 2
stride = 1
Output: [[[[5.0, 6.0], [8.0, 9.0]]]]
The input is a 3×3 feature map:
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
With kernel_size=2 and stride=1, each 2×2 window moves by just 1 position, creating maximum overlap:
• Position (0,0): Values 1,2,4,5 → max = 5
• Position (0,1): Values 2,3,5,6 → max = 6
• Position (1,0): Values 4,5,7,8 → max = 8
• Position (1,1): Values 5,6,8,9 → max = 9
The output preserves more spatial detail due to the high overlap (stride=1).
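When every window fits entirely inside the input, as in this example, the pooling can be expressed without explicit loops. The sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later); the function name `max_pool2d_full_windows` is hypothetical, and the approach deliberately ignores partial windows, so it matches ceiling mode only when (H - k) and (W - k) are multiples of the stride.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool2d_full_windows(x, kernel_size, stride):
    """Max pooling over full windows only (no partial windows at the edges)."""
    x = np.asarray(x, dtype=np.float64)
    # Shape: (N, C, H - k + 1, W - k + 1, k, k), one k x k view per start position.
    windows = sliding_window_view(x, (kernel_size, kernel_size), axis=(2, 3))
    # Keep every stride-th start position, then reduce over each window.
    return windows[:, :, ::stride, ::stride].max(axis=(-2, -1))


x = np.arange(1, 10).reshape(1, 1, 3, 3)
print(max_pool2d_full_windows(x, kernel_size=2, stride=1).tolist())
# [[[[5.0, 6.0], [8.0, 9.0]]]]
```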
x with shape (1, 2, 4, 4) containing two channels with reversed values
kernel_size = 3
stride = 2
Output: [[[[11.0, 12.0], [15.0, 16.0]], [[16.0, 14.0], [8.0, 6.0]]]]
This example demonstrates multi-channel processing. The input has 2 channels:
Channel 1: Values 1-16 in ascending order
Channel 2: Values 16-1 in descending order
Each channel is pooled independently:
Channel 1 Output:
• Follows the same pattern as Example 1: [[11.0, 12.0], [15.0, 16.0]]
Channel 2 Output (descending values):
• Top-left window max: 16.0
• Top-right window max: 14.0
• Bottom-left window max: 8.0
• Bottom-right window max: 6.0
The operation preserves channel independence while applying identical pooling to each.
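The two-channel result can be checked with a vectorized variant that handles partial windows by padding. This is a sketch under stated assumptions: the function name `overlapping_max_pool2d_padded` is hypothetical, the input is converted to float64 so that it can be padded with negative infinity on the bottom and right, and the padding never wins a maximum because every window contains at least one real value.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def overlapping_max_pool2d_padded(x, kernel_size, stride):
    """Ceiling-mode max pooling via -inf padding on the bottom and right edges."""
    x = np.asarray(x, dtype=np.float64)
    h, w = x.shape[2:]
    h_out = math.ceil((h - kernel_size) / stride) + 1
    w_out = math.ceil((w - kernel_size) / stride) + 1
    # Extra rows/columns needed so the last window is a full k x k window.
    pad_h = (h_out - 1) * stride + kernel_size - h
    pad_w = (w_out - 1) * stride + kernel_size - w
    padded = np.pad(
        x, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)), constant_values=-np.inf
    )
    windows = sliding_window_view(padded, (kernel_size, kernel_size), axis=(2, 3))
    return windows[:, :, ::stride, ::stride].max(axis=(-2, -1))


ascending = np.arange(1, 17).reshape(4, 4)       # channel 1: 1..16
descending = np.arange(16, 0, -1).reshape(4, 4)  # channel 2: 16..1
x = np.stack([ascending, descending])[None]      # shape (1, 2, 4, 4)
print(overlapping_max_pool2d_padded(x, kernel_size=3, stride=2).tolist())
# [[[[11.0, 12.0], [15.0, 16.0]], [[16.0, 14.0], [8.0, 6.0]]]]
```

Because the sliding windows are taken only over the last two axes, each channel is reduced separately and no values mix across channels or batch items.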
Constraints