Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision and pattern recognition. At the heart of training these networks lies backpropagation—an elegant algorithm that computes gradients by propagating error signals backward through the network's layers. This enables the model to learn hierarchical feature representations directly from raw pixel data.
In this challenge, you will implement a complete training pipeline for a simplified CNN architecture that includes:
Convolutional Layer: Applies num_filters learnable filters, each of size kernel_size × kernel_size, to detect local patterns in the input
ReLU Activation: Introduces non-linearity by applying the rectified linear unit function $$\text{ReLU}(x) = \max(0, x)$$
Flattening: Reshapes the 2D feature maps into a 1D vector for the dense layer
Dense Layer: Fully-connected layer that maps features to class scores $$z = W_{\text{dense}} \cdot \text{flatten}(\text{activation}) + b_{\text{dense}}$$
Softmax Output: Converts raw scores to probability distribution $$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
Cross-Entropy Loss: Measures the discrepancy between predictions and true labels $$L = -\sum_{i} y_i \log(\hat{y}_i)$$
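The layers above can be sketched in NumPy as follows. This is a minimal illustration of the forward components only; all function and variable names here are assumptions, not part of the required interface.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Valid convolution of a 2D input with one 2D kernel plus a scalar bias."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y, y_hat):
    return -np.sum(y * np.log(y_hat))

# Tiny forward pass on the 3x3 example input, with stand-in weights.
x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
w = np.full((3, 3), 0.01)            # stand-in for small random weights
feat = relu(conv2d_valid(x, w, 0.0)) # 1x1 feature map
flat = feat.flatten()
W_dense, b_dense = np.zeros((2, 1)), np.zeros(2)
y_hat = softmax(W_dense @ flat + b_dense)
loss = cross_entropy(np.array([1.0, 0.0]), y_hat)
```

With zero dense weights the two class scores tie, so softmax yields a uniform distribution and the loss is $\log 2 \approx 0.693$.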
Your implementation must handle the complete training cycle:
Forward Pass: convolve the input, apply ReLU, flatten the feature maps, compute the dense scores, softmax probabilities, and the cross-entropy loss.
Backward Pass (Backpropagation): propagate the loss gradient back through softmax, the dense layer, the flatten step, the ReLU, and the convolution to obtain $\frac{\partial L}{\partial W}$ for every parameter.
Weight Update (Stochastic Gradient Descent): $$W \leftarrow W - \eta \cdot \frac{\partial L}{\partial W}$$
where $\eta$ is the learning rate.
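As a quick numeric check of the update rule (the weight and gradient values below are purely illustrative):

```python
eta = 0.01                         # learning rate
W = [0.5, -0.2]                    # current weights
grad = [3.0, -1.0]                 # dL/dW for each weight
# W <- W - eta * dL/dW, applied elementwise
W = [w - eta * g for w, g in zip(W, grad)]
```

A positive gradient pushes the weight down and a negative one pushes it up, so W moves to roughly [0.47, -0.19].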
Implement the `train_cnn_with_gradient_descent` function that trains the described network and returns the final trained parameters after completing all epochs.
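One possible shape of the full pipeline is sketched below. The weight layout (filter index first), the seeded Gaussian initialization, and the per-sample update order are all assumptions, so the numbers it produces will differ from the expected outputs in the examples; only the overall forward/backward/update structure is given by the problem.

```python
import numpy as np

def train_cnn_with_gradient_descent(X, y, epochs, learning_rate,
                                    kernel_size, num_filters):
    X, y = np.asarray(X, float), np.asarray(y, float)
    out = X.shape[1] - kernel_size + 1          # valid-conv output side
    num_classes = y.shape[1]
    rng = np.random.default_rng(0)              # assumed initialization
    conv_w = rng.normal(0.0, 0.01, (num_filters, kernel_size, kernel_size))
    conv_b = np.zeros(num_filters)
    dense_w = rng.normal(0.0, 0.01, (num_filters * out * out, num_classes))
    dense_b = np.zeros(num_classes)

    for _ in range(epochs):
        for xi, yi in zip(X, y):                # per-sample SGD (assumed)
            # forward: conv -> ReLU -> flatten -> dense -> softmax
            fmap = np.zeros((num_filters, out, out))
            for f in range(num_filters):
                for i in range(out):
                    for j in range(out):
                        patch = xi[i:i + kernel_size, j:j + kernel_size]
                        fmap[f, i, j] = np.sum(patch * conv_w[f]) + conv_b[f]
            act = np.maximum(fmap, 0.0)
            flat = act.flatten()
            z = flat @ dense_w + dense_b
            e = np.exp(z - z.max())
            y_hat = e / e.sum()
            # backward: dL/dz = y_hat - y for softmax + cross-entropy
            dz = y_hat - yi
            d_dense_w = np.outer(flat, dz)
            d_dense_b = dz
            d_act = (dense_w @ dz).reshape(act.shape) * (fmap > 0)
            d_conv_w = np.zeros_like(conv_w)
            d_conv_b = np.zeros_like(conv_b)
            for f in range(num_filters):
                for i in range(out):
                    for j in range(out):
                        patch = xi[i:i + kernel_size, j:j + kernel_size]
                        d_conv_w[f] += d_act[f, i, j] * patch
                        d_conv_b[f] += d_act[f, i, j]
            # SGD update for every parameter
            conv_w -= learning_rate * d_conv_w
            conv_b -= learning_rate * d_conv_b
            dense_w -= learning_rate * d_dense_w
            dense_b -= learning_rate * d_dense_b
    return conv_w, conv_b, dense_w, dense_b
```

Because the initial scores are near zero, one step on a [1, 0] target pushes the first dense bias up and the second down, the pattern visible in Example 1 below.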
X = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
y = [[1.0, 0.0]]
epochs = 1
learning_rate = 0.01
kernel_size = 3
num_filters = 1

Output:
(
conv_weights: [[[ 0.00501739], [-0.00128214], [ 0.00662764]],
[[ 0.01543131], [-0.00209028], [-0.00203986]],
[[ 0.01614389], [ 0.00807636], [-0.00424248]]],
conv_bias: [5.025e-05],
dense_weights: [[ 0.00635715, -0.00556573]],
dense_bias: [ 0.00499531, -0.00499531]
)
With a 3×3 input image and a 3×3 kernel using valid convolution, the output feature map is 1×1.
Forward Pass: the single convolved value passes through ReLU, is flattened to a one-element vector, and the dense layer plus softmax produce two class probabilities near 0.5 each, since the initial weights are small.
Backward Pass: the cross-entropy gradient $\hat{y} - y$ flows back through the dense layer, the ReLU, and the convolution.
After one epoch with learning rate 0.01, the weights show small adjustments moving toward the correct classification.
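The 1×1 feature-map size follows the valid-convolution rule $n - k + 1$; a quick check (the helper name is illustrative):

```python
def valid_conv_output_size(n, k):
    # a k-wide kernel fits at n - k + 1 positions along each axis
    return n - k + 1

side = valid_conv_output_size(3, 3)   # Example 1: 3x3 input, 3x3 kernel
```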
X = [[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 0.5, 0.3], [0.2, 0.4, 0.6, 0.8]]]
y = [[0.0, 1.0]]
epochs = 1
learning_rate = 0.1
kernel_size = 2
num_filters = 1

Output:
(
conv_weights: [[[ 0.00471968], [-0.00130821]],
[[ 0.00532375], [ 0.01442281]]],
conv_bias: [-0.00083181],
dense_weights: [[-0.00297151, -0.0017114],
[ 0.01503567, 0.00843081],
[-0.00557769, 0.00630854],
[-0.00577009, -0.00352138],
[ 0.00161423, -0.01832741],
[-0.01775821, -0.00511385],
[-0.01065219, 0.00366635],
[-0.00988065, -0.01332263],
[ 0.01374934, -0.00135061]],
dense_bias: [-0.05001055, 0.05001055]
)
With a 4×4 input and 2×2 kernel (valid convolution), the output feature map is 3×3, resulting in 9 values per filter.
The target label [0.0, 1.0] indicates the second class. The larger learning rate (0.1) produces more significant weight updates. Notice how dense_bias moves in opposite directions (negative for class 0, positive for class 1), pushing the network toward correct classification.
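The opposite-sign movement of dense_bias falls directly out of the softmax + cross-entropy gradient $\frac{\partial L}{\partial b} = \hat{y} - y$. A small check, assuming near-zero initial scores (the example's actual internals are not shown):

```python
import numpy as np

z = np.zeros(2)                      # dense scores start near zero
y_hat = np.exp(z) / np.exp(z).sum()  # softmax -> [0.5, 0.5]
y = np.array([0.0, 1.0])             # target: second class

grad_b = y_hat - y                   # dL/db for softmax + cross-entropy
b = np.zeros(2) - 0.1 * grad_b       # one SGD step from zero bias, eta = 0.1
```

One step gives a bias near [-0.05, +0.05], consistent with the dense_bias values above.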
X = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
[[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]]
y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
epochs = 2
learning_rate = 0.01
kernel_size = 2
num_filters = 2

Output:
(
conv_weights: [[[ 0.00854715, -0.00365224], [ 0.00977395, 0.01290805]],
[[ 0.00038962, -0.00476892], [ 0.01824033, 0.00519415]]],
conv_bias: [0.00042807, -0.00034737],
dense_weights: [[-0.00471649, 0.00734749, -0.00653432],
[-0.00411338, 0.0037669, -0.02102399],
[-0.01557395, -0.00542483, -0.01200158],
[ 0.00538336, -0.00945681, -0.01598735],
[ 0.01469215, -0.00133749, -0.00028065],
[-0.01390007, -0.00481346, 0.00013145],
[-0.01053903, 0.00380754, -0.00702785],
[-0.00163429, -0.0062564, 0.01747947]],
dense_bias: [0.00659985, 0.00663734, -0.01323718]
)
This example demonstrates batch training with 2 samples, 3 output classes, and 2 convolutional filters over 2 epochs.
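The dense-layer shape in this example can be verified from the architecture alone (filter-major flattening order is an assumption):

```python
import numpy as np

num_filters, n, k, num_classes = 2, 3, 2, 3
out_side = n - k + 1                        # 3 - 2 + 1 = 2
feature_maps = np.zeros((num_filters, out_side, out_side))
flat = feature_maps.flatten()               # 2 filters x 2x2 maps = 8 features
dense_shape = (flat.size, num_classes)      # matches the 8x3 dense_weights above
```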
Architecture Details: each 2×2 filter slides over the 3×3 inputs to produce a 2×2 feature map; with 2 filters, flattening yields 8 features mapped to 3 classes, which is why dense_weights has shape 8×3.
Training Dynamics:
Constraints