In deep learning, volumetric convolution (also known as 3D convolution) extends the concept of traditional 2D convolution to three-dimensional data. This operation is fundamental for processing spatiotemporal data like video sequences, medical imaging scans (CT, MRI), and point cloud representations.
A volumetric convolution works by sliding a 3D kernel (a small cubic or rectangular prism of learnable weights) across an input 3D volume along all three spatial dimensions: depth, height, and width. At each position, the kernel computes a weighted sum (dot product) of the overlapping elements, producing a single output value.
Mathematical Formulation: For an input volume V with dimensions (D × H × W) and a kernel K with dimensions (Kd × Kh × Kw), the output at position (d, h, w) is computed as:
$$O[d, h, w] = \sum_{i=0}^{K_d-1} \sum_{j=0}^{K_h-1} \sum_{k=0}^{K_w-1} V[d \cdot s_d + i, h \cdot s_h + j, w \cdot s_w + k] \cdot K[i, j, k]$$
where $(s_d, s_h, s_w)$ represents the stride in each dimension (the formula above assumes zero padding).
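The triple sum for a single output position translates directly into code. A minimal sketch in plain Python (the function name `conv_at` is illustrative, not part of the problem statement; it assumes the kernel window fits inside the volume at the given position):

```python
# Weighted sum at one output position (d, h, w), zero padding assumed.
def conv_at(V, K, d, h, w, stride=(1, 1, 1)):
    sd, sh, sw = stride
    kd, kh, kw = len(K), len(K[0]), len(K[0][0])
    total = 0.0
    for i in range(kd):
        for j in range(kh):
            for k in range(kw):
                total += V[d * sd + i][h * sh + j][w * sw + k] * K[i][j][k]
    return total
```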
Output Dimensions: Given padding $(p_d, p_h, p_w)$ and stride $(s_d, s_h, s_w)$, the output dimensions are calculated as:
$$D_{out} = \left\lfloor \frac{D + 2p_d - K_d}{s_d} \right\rfloor + 1, \quad H_{out} = \left\lfloor \frac{H + 2p_h - K_h}{s_h} \right\rfloor + 1, \quad W_{out} = \left\lfloor \frac{W + 2p_w - K_w}{s_w} \right\rfloor + 1$$
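The per-dimension output size formula, floor((N + 2p − K) / s) + 1, can be checked with a one-line helper (the name `out_size` is an assumption for illustration):

```python
# Output size along one axis: input size n, kernel size k, stride s, padding p.
def out_size(n, k, s, p):
    return (n + 2 * p - k) // s + 1
```

For example, a size-3 input with a size-2 kernel, stride 1, no padding gives (3 − 2)/1 + 1 = 2, matching the worked examples below.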
Your Task: Implement a function that computes the forward pass of a volumetric convolution. Given an input 3D volume, a 3D kernel, stride values for each dimension, and padding values for each dimension, compute and return the resulting output feature map.
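The whole forward pass can be sketched with nested loops over output positions. This is a plain-Python sketch, not a reference solution: the function name `conv3d` and the use of zero padding are assumptions, and the input is taken as nested lists.

```python
# Forward pass of a single-channel 3D convolution over nested lists.
def conv3d(volume, kernel, stride, padding):
    sd, sh, sw = stride
    pd, ph, pw = padding
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    Kd, Kh, Kw = len(kernel), len(kernel[0]), len(kernel[0][0])

    # Zero-pad the volume on all six faces.
    Dp, Hp, Wp = D + 2 * pd, H + 2 * ph, W + 2 * pw
    padded = [[[0.0] * Wp for _ in range(Hp)] for _ in range(Dp)]
    for d in range(D):
        for h in range(H):
            for w in range(W):
                padded[d + pd][h + ph][w + pw] = volume[d][h][w]

    # Output dimensions: floor((N + 2p - K) / s) + 1 per axis.
    Od = (Dp - Kd) // sd + 1
    Oh = (Hp - Kh) // sh + 1
    Ow = (Wp - Kw) // sw + 1

    out = [[[0.0] * Ow for _ in range(Oh)] for _ in range(Od)]
    for d in range(Od):
        for h in range(Oh):
            for w in range(Ow):
                acc = 0.0
                for i in range(Kd):
                    for j in range(Kh):
                        for k in range(Kw):
                            acc += (padded[d * sd + i][h * sh + j][w * sw + k]
                                    * kernel[i][j][k])
                out[d][h][w] = acc
    return out
```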
Key Concepts:
• Kernel: a small 3D block of learnable weights slid across the input volume.
• Stride: the number of positions the kernel moves along each dimension between placements.
• Padding: zeros added around the input volume before convolving.
• Receptive field: the region of the input that contributes to a single output value.
input_volume = [
[[1.0, 2.0], [3.0, 4.0]],
[[5.0, 6.0], [7.0, 8.0]]
]
kernel = [
[[1.0, 0.0], [0.0, 0.0]],
[[0.0, 0.0], [0.0, 0.0]]
]
stride = [1, 1, 1]
padding = [0, 0, 0]

Expected Output: [[[1.0]]]

Explanation: The input is a 2×2×2 volume (depth=2, height=2, width=2), and the kernel is also 2×2×2.
With no padding and stride of 1 in all dimensions, the kernel can only be placed at exactly one position (the origin).
The kernel has a 1.0 at position [0,0,0] and 0.0 everywhere else. This effectively extracts the value at the top-left-front corner of the input.
Computing the dot product:
• (1.0 × 1.0) + (0.0 × 2.0) + (0.0 × 3.0) + (0.0 × 4.0) + (0.0 × 5.0) + (0.0 × 6.0) + (0.0 × 7.0) + (0.0 × 8.0)
• = 1.0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 1.0
Output: A 1×1×1 volume containing [[[ 1.0 ]]]
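This example can be checked numerically. Since the kernel is exactly the size of the input, the convolution reduces to a single elementwise product and sum (NumPy used here purely for the check):

```python
import numpy as np

V = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[5.0, 6.0], [7.0, 8.0]]])
K = np.zeros((2, 2, 2))
K[0, 0, 0] = 1.0  # single 1.0 at the front-top-left corner

out = np.sum(V * K)  # elementwise product, then sum -> 1.0
```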
input_volume = [
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
[[10.0, 11.0, 12.0], [13.0, 14.0, 15.0], [16.0, 17.0, 18.0]],
[[19.0, 20.0, 21.0], [22.0, 23.0, 24.0], [25.0, 26.0, 27.0]]
]
kernel = [
[[1.0, 0.0], [0.0, 1.0]],
[[0.0, 1.0], [1.0, 0.0]]
]
stride = [1, 1, 1]
padding = [0, 0, 0]

Expected Output: [[[30.0, 34.0], [42.0, 46.0]], [[66.0, 70.0], [78.0, 82.0]]]

Explanation: The input is a 3×3×3 volume, and the kernel is 2×2×2. The diagonal kernel pattern captures anti-diagonal spatial relationships.
Output dimensions: (3-2)/1 + 1 = 2 in each dimension → 2×2×2 output.
For output position [0,0,0]:
• Input patch: V[0:2, 0:2, 0:2] = [[[1,2],[4,5]], [[10,11],[13,14]]]
• Kernel weights: (0,0,0)=1, (0,1,1)=1, (1,0,1)=1, (1,1,0)=1
• Dot product: 1×1 + 5×1 + 11×1 + 13×1 = 1 + 5 + 11 + 13 = 30.0
Similar calculations for all positions yield the 2×2×2 output feature map.
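Those "similar calculations" can be reproduced with explicit loops over the 2×2×2 grid of output positions, slicing out each input patch (a self-contained check using NumPy; stride 1, no padding):

```python
import numpy as np

V = np.arange(1.0, 28.0).reshape(3, 3, 3)  # values 1..27 as in the example
K = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])

out = np.empty((2, 2, 2))
for d in range(2):
    for h in range(2):
        for w in range(2):
            # Patch of the same shape as the kernel, then a full dot product.
            out[d, h, w] = np.sum(V[d:d+2, h:h+2, w:w+2] * K)
```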
input_volume = [
[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]],
[[17.0, 18.0, 19.0, 20.0], [21.0, 22.0, 23.0, 24.0], [25.0, 26.0, 27.0, 28.0], [29.0, 30.0, 31.0, 32.0]],
[[33.0, 34.0, 35.0, 36.0], [37.0, 38.0, 39.0, 40.0], [41.0, 42.0, 43.0, 44.0], [45.0, 46.0, 47.0, 48.0]],
[[49.0, 50.0, 51.0, 52.0], [53.0, 54.0, 55.0, 56.0], [57.0, 58.0, 59.0, 60.0], [61.0, 62.0, 63.0, 64.0]]
]
kernel = [
[[1.0, 1.0], [1.0, 1.0]],
[[1.0, 1.0], [1.0, 1.0]]
]
stride = [2, 2, 2]
padding = [0, 0, 0]

Expected Output: [[[92.0, 108.0], [156.0, 172.0]], [[348.0, 364.0], [412.0, 428.0]]]

Explanation: The input is a 4×4×4 volume with sequential values 1-64. The kernel is a 2×2×2 all-ones kernel that sums every value in its receptive field.
With stride [2,2,2], the kernel jumps 2 positions in each dimension, performing spatial downsampling.
Output dimensions: (4-2)/2 + 1 = 2 in each dimension → 2×2×2 output.
For output position [0,0,0]:
• Input patch: V[0:2, 0:2, 0:2] containing values 1, 2, 5, 6, 17, 18, 21, 22
• Sum: 1+2+5+6+17+18+21+22 = 92.0
For output position [1,1,1] (the last position):
• Input patch: V[2:4, 2:4, 2:4] containing values 43, 44, 47, 48, 59, 60, 63, 64
• Sum: 43+44+47+48+59+60+63+64 = 428.0
This demonstrates how stride-2 convolution halves the spatial dimensions while capturing local sums.
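The strided placement can be checked the same way: each output index is multiplied by the stride before slicing the patch (self-contained NumPy check; stride 2, no padding):

```python
import numpy as np

V = np.arange(1.0, 65.0).reshape(4, 4, 4)  # values 1..64 as in the example
K = np.ones((2, 2, 2))                     # all-ones summing kernel
s = 2                                      # stride in every dimension

out = np.empty((2, 2, 2))
for d in range(2):
    for h in range(2):
        for w in range(2):
            # The kernel origin jumps by s per output step -> downsampling.
            out[d, h, w] = np.sum(V[d*s:d*s+2, h*s:h*s+2, w*s:w*s+2] * K)
```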
Constraints