Strided Maximum Pooling with Overlap (Medium)

In convolutional neural networks, pooling operations serve as critical downsampling mechanisms that reduce spatial dimensionality while preserving the most salient features. Among pooling variants, strided maximum pooling with overlapping regions offers a unique balance between feature preservation and spatial reduction.

Unlike standard max pooling, where adjacent windows do not overlap (stride equals kernel size), overlapping pooling uses a stride smaller than the kernel size, so consecutive pooling windows share input elements. The overlap makes neighboring outputs partially redundant; Krizhevsky et al. reported for AlexNet (3 × 3 windows, stride 2) that it slightly lowered error rates and made the network somewhat harder to overfit.

Mathematical Formulation:

Given a 4D input tensor X with shape (N, C, H, W) representing:

N: Batch size (number of samples)
C: Number of channels (feature maps)
H: Height of the spatial dimension
W: Width of the spatial dimension

The overlapping max pooling operation applies a k × k window with stride s (where s < k) across the height and width dimensions. For each window position, the maximum value within that window is selected.
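Written out per output element, the operation might be expressed as follows. This is a sketch under the assumption that a window reaching past the boundary is clipped to the input rather than padded; the symbols Y, i, j, p, and q are notation introduced here (zero-based indices), not part of the original statement.

$$Y_{n,c,i,j} = \max_{\substack{0 \le p < k,\; 0 \le q < k \\ i s + p < H,\; j s + q < W}} X_{n,c,\,i s + p,\,j s + q}$$

Here (i, j) ranges over the output positions, and the two conditions on the second line drop any window element that would fall outside the input.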

Output Dimension Calculation (Ceiling Mode):

The output spatial dimensions are computed using ceiling mode, which allows partial windows at boundaries:

$$H_{out} = \lceil \frac{H - k}{s} \rceil + 1$$

$$W_{out} = \lceil \frac{W - k}{s} \rceil + 1$$
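The formula can be checked with a few lines of Python. This is a minimal sketch: the helper name `pooled_size` is hypothetical, and rejecting inputs smaller than the kernel is an assumption made here, since the problem statement does not cover that case.

```python
def pooled_size(size: int, kernel: int, stride: int) -> int:
    """Ceiling-mode output length along one spatial axis (no padding)."""
    if size < kernel:
        # Assumption: inputs smaller than the kernel are rejected.
        raise ValueError("input extent must be at least the kernel size")
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return -(-(size - kernel) // stride) + 1


print(pooled_size(4, 3, 2))  # 2
print(pooled_size(3, 2, 1))  # 2
print(pooled_size(6, 3, 2))  # 3 (floor mode would give 2)
```

The last call shows where ceiling mode differs from floor mode: the third window starts at index 4 and covers only two of its three positions.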

Key Properties of Overlapping Pooling:

Enhanced Translation Invariance: Overlapping regions capture features across multiple spatial positions
Richer Feature Maps: Adjacent output positions share input regions, preserving more spatial information
Boundary Handling: Ceiling mode ensures all input elements participate in at least one pooling window

Your Task:

Implement a function that performs 2D maximum pooling with overlapping regions on a 4D input tensor. The function should:

Accept a 4D numpy array, kernel size, and stride as inputs
Apply max pooling with the specified kernel and stride
Use ceiling mode for computing output dimensions
Handle partial windows at boundaries correctly
Return the pooled 4D output tensor
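One possible shape for such a function is sketched below. It is a straightforward loop-based version, not a reference solution: the name `overlapping_max_pool2d` is hypothetical, and returning a float64 array is an assumption based on the float values shown in the example results.

```python
import numpy as np


def overlapping_max_pool2d(x: np.ndarray, kernel_size: int, stride: int) -> np.ndarray:
    """Max-pool a (N, C, H, W) array with ceiling-mode output sizes."""
    n, c, h, w = x.shape
    # Ceiling-mode output sizes via integer ceiling division.
    h_out = -(-(h - kernel_size) // stride) + 1
    w_out = -(-(w - kernel_size) // stride) + 1
    out = np.empty((n, c, h_out, w_out), dtype=np.float64)  # assumed float output

    for i in range(h_out):
        r0 = i * stride
        r1 = min(r0 + kernel_size, h)  # clip partial windows at the bottom edge
        for j in range(w_out):
            c0 = j * stride
            c1 = min(c0 + kernel_size, w)  # clip partial windows at the right edge
            # Reduce over the spatial window for every sample and channel at once.
            out[:, :, i, j] = x[:, :, r0:r1, c0:c1].max(axis=(2, 3))
    return out


x = np.arange(1, 17).reshape(1, 1, 4, 4)
print(overlapping_max_pool2d(x, kernel_size=3, stride=2).tolist())
# [[[[11.0, 12.0], [15.0, 16.0]]]]
```

Looping only over output positions keeps the batch and channel dimensions vectorized, so the Python-level loop count is H_out × W_out regardless of N and C.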

Example 1:

The input has shape (1, 1, 4, 4); its single 4×4 feature map holds the values 1 through 16 in row-major order:

[[ 1,  2,  3,  4],
 [ 5,  6,  7,  8],
 [ 9, 10, 11, 12],
 [13, 14, 15, 16]]

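With kernel_size=3 and stride=2, adjacent pooling windows overlap by one row or column (k − s = 1). Using ceiling mode, the output is 2×2, since ⌈(4 − 3)/2⌉ + 1 = 2; the windows on the right and bottom edges are partial: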

• Top-left window (rows 0-2, cols 0-2): Values 1,2,3,5,6,7,9,10,11 → max = 11
• Top-right window (rows 0-2, cols 2-3): Values 3,4,7,8,11,12 → max = 12
• Bottom-left window (rows 2-3, cols 0-2): Values 9,10,11,13,14,15 → max = 15
• Bottom-right window (rows 2-3, cols 2-3): Values 11,12,15,16 → max = 16

Result: [[[[11.0, 12.0], [15.0, 16.0]]]]
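The window boundaries listed for the 4×4 map can be reproduced programmatically. The sketch below uses a hypothetical helper, `describe_windows`, that reports the inclusive row and column range of each window along with its maximum, assuming partial windows are clipped to the input.

```python
import numpy as np


def describe_windows(fmap: np.ndarray, k: int, s: int):
    """Return ((row_start, row_end), (col_start, col_end), max) per window."""
    h, w = fmap.shape
    h_out = -(-(h - k) // s) + 1
    w_out = -(-(w - k) // s) + 1
    report = []
    for i in range(h_out):
        for j in range(w_out):
            r0, c0 = i * s, j * s
            r1, c1 = min(r0 + k, h), min(c0 + k, w)  # exclusive, clipped ends
            report.append(((r0, r1 - 1), (c0, c1 - 1), int(fmap[r0:r1, c0:c1].max())))
    return report


fmap = np.arange(1, 17).reshape(4, 4)
for rows, cols, m in describe_windows(fmap, k=3, s=2):
    print(f"rows {rows[0]}-{rows[1]}, cols {cols[0]}-{cols[1]} -> max {m}")
# rows 0-2, cols 0-2 -> max 11
# rows 0-2, cols 2-3 -> max 12
# rows 2-3, cols 0-2 -> max 15
# rows 2-3, cols 2-3 -> max 16
```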

Example 2:

The input is a 3×3 feature map:

[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]

With kernel_size=2 and stride=1, each 2×2 window moves by just 1 position, creating maximum overlap:

• Position (0,0): Values 1,2,4,5 → max = 5
• Position (0,1): Values 2,3,5,6 → max = 6
• Position (1,0): Values 4,5,7,8 → max = 8
• Position (1,1): Values 5,6,8,9 → max = 9

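Result: [[[[5.0, 6.0], [8.0, 9.0]]]]

With stride=1 the output is only one element smaller than the input along each axis, so more spatial detail survives than with a larger stride.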
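When the stride is 1 and every window fits entirely inside the input, as in this 3×3 case, the pooling can also be sketched without explicit loops using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is an alternative technique shown for illustration; it does not handle partial windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

fmap = np.arange(1, 10, dtype=float).reshape(3, 3)

# All 2x2 windows at stride 1: shape (2, 2, 2, 2) = (out_h, out_w, k, k).
windows = sliding_window_view(fmap, (2, 2))

# Reduce each window to its maximum.
pooled = windows.max(axis=(-2, -1))
print(pooled.tolist())  # [[5.0, 6.0], [8.0, 9.0]]
```

The view shares memory with `fmap`, so no window data is copied before the reduction.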

Example 3:

This example demonstrates multi-channel processing with kernel_size=3 and stride=2. The input has 2 channels, each a 4×4 feature map:

Channel 1: Values 1-16 in ascending order
Channel 2: Values 16-1 in descending order

Each channel is pooled independently:

Channel 1 Output:
• Follows the same pattern as Example 1: [[11.0, 12.0], [15.0, 16.0]]

Channel 2 Output (descending values):
• Top-left window max: 16.0
• Top-right window max: 14.0
• Bottom-left window max: 8.0
• Bottom-right window max: 6.0

Result: [[[[11.0, 12.0], [15.0, 16.0]], [[16.0, 14.0], [8.0, 6.0]]]]

The operation preserves channel independence while applying identical pooling to each.
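Channel independence also makes a fully vectorized variant possible. The sketch below is one way to do it, not the expected solution: it pads the bottom and right edges with negative infinity so that partial windows become full windows, then takes strided slices of a sliding-window view. The name `pool_padded` is hypothetical, and the float conversion is an assumption needed to represent −inf.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pool_padded(x: np.ndarray, k: int, s: int) -> np.ndarray:
    """Ceiling-mode max pooling on (N, C, H, W) via -inf padding."""
    n, c, h, w = x.shape
    h_out = -(-(h - k) // s) + 1
    w_out = -(-(w - k) // s) + 1
    # Extra rows/columns needed so the last window is complete.
    pad_h = (h_out - 1) * s + k - h
    pad_w = (w_out - 1) * s + k - w
    xp = np.pad(
        x.astype(float),
        ((0, 0), (0, 0), (0, pad_h), (0, pad_w)),
        constant_values=-np.inf,  # never wins a max, so padding is ignored
    )
    # Shape (N, C, H_pad - k + 1, W_pad - k + 1, k, k); keep every s-th window.
    windows = sliding_window_view(xp, (k, k), axis=(2, 3))[:, :, ::s, ::s]
    return windows.max(axis=(-2, -1))


ch1 = np.arange(1, 17).reshape(4, 4)       # ascending values
ch2 = np.arange(16, 0, -1).reshape(4, 4)   # descending values
x = np.stack([ch1, ch2])[None]             # shape (1, 2, 4, 4)
print(pool_padded(x, k=3, s=2).tolist())
# [[[[11.0, 12.0], [15.0, 16.0]], [[16.0, 14.0], [8.0, 6.0]]]]
```

Because the padding and the window view act only on the last two axes, each channel is reduced separately, matching the per-channel results listed above.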