Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision and pattern recognition. At the heart of training these networks lies backpropagation—an elegant algorithm that computes gradients by propagating error signals backward through the network's layers. This enables the model to learn hierarchical feature representations directly from raw pixel data.
In this challenge, you will implement a complete training pipeline for a simplified CNN architecture that includes:
Convolutional Layer: Applies num_filters learnable filters, each of size kernel_size × kernel_size, to detect local patterns in the input
ReLU Activation: Introduces non-linearity by applying the rectified linear unit function $$\text{ReLU}(x) = \max(0, x)$$
Flattening: Reshapes the 2D feature maps into a 1D vector for the dense layer
Dense Layer: Fully-connected layer that maps features to class scores $$z = W_{\text{dense}} \cdot \text{flatten}(\text{activation}) + b_{\text{dense}}$$
Softmax Output: Converts raw scores to probability distribution $$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
Cross-Entropy Loss: Measures the discrepancy between predictions and true labels $$L = -\sum_{i} y_i \log(\hat{y}_i)$$
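The layers above can be sketched in NumPy as follows. This is a minimal illustration of the forward components only; all function and variable names here are assumptions, not part of the required interface.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Valid convolution of a 2D input with one 2D kernel plus a scalar bias."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y, y_hat):
    return -np.sum(y * np.log(y_hat))

# Tiny forward pass on the 3x3 example input, with stand-in weights.
x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
w = np.full((3, 3), 0.01)            # stand-in for small random weights
feat = relu(conv2d_valid(x, w, 0.0)) # 1x1 feature map
flat = feat.flatten()
W_dense, b_dense = np.zeros((2, 1)), np.zeros(2)
y_hat = softmax(W_dense @ flat + b_dense)
loss = cross_entropy(np.array([1.0, 0.0]), y_hat)
```

With zero dense weights the two class scores tie, so softmax yields a uniform distribution and the loss is $\log 2 \approx 0.693$.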
Your implementation must handle the complete training cycle:
Forward Pass: convolve the input, apply ReLU, flatten the feature maps, compute the dense scores, softmax probabilities, and the cross-entropy loss.
Backward Pass (Backpropagation): propagate the loss gradient back through softmax, the dense layer, the flatten step, the ReLU, and the convolution to obtain $\frac{\partial L}{\partial W}$ for every parameter.
Weight Update (Stochastic Gradient Descent): $$W \leftarrow W - \eta \cdot \frac{\partial L}{\partial W}$$
where $\eta$ is the learning rate.
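As a quick numeric check of the update rule (the weight and gradient values below are purely illustrative):

```python
eta = 0.01                         # learning rate
W = [0.5, -0.2]                    # current weights
grad = [3.0, -1.0]                 # dL/dW for each weight
# W <- W - eta * dL/dW, applied elementwise
W = [w - eta * g for w, g in zip(W, grad)]
```

A positive gradient pushes the weight down and a negative one pushes it up, so W moves to roughly [0.47, -0.19].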
Implement the `train_cnn_with_gradient_descent` function that trains the described network and returns the final trained parameters after completing all epochs.
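One possible shape of the full pipeline is sketched below. The weight layout (filter index first), the seeded Gaussian initialization, and the per-sample update order are all assumptions, so the numbers it produces will differ from the expected outputs in the examples; only the overall forward/backward/update structure is given by the problem.

```python
import numpy as np

def train_cnn_with_gradient_descent(X, y, epochs, learning_rate,
                                    kernel_size, num_filters):
    X, y = np.asarray(X, float), np.asarray(y, float)
    out = X.shape[1] - kernel_size + 1          # valid-conv output side
    num_classes = y.shape[1]
    rng = np.random.default_rng(0)              # assumed initialization
    conv_w = rng.normal(0.0, 0.01, (num_filters, kernel_size, kernel_size))
    conv_b = np.zeros(num_filters)
    dense_w = rng.normal(0.0, 0.01, (num_filters * out * out, num_classes))
    dense_b = np.zeros(num_classes)

    for _ in range(epochs):
        for xi, yi in zip(X, y):                # per-sample SGD (assumed)
            # forward: conv -> ReLU -> flatten -> dense -> softmax
            fmap = np.zeros((num_filters, out, out))
            for f in range(num_filters):
                for i in range(out):
                    for j in range(out):
                        patch = xi[i:i + kernel_size, j:j + kernel_size]
                        fmap[f, i, j] = np.sum(patch * conv_w[f]) + conv_b[f]
            act = np.maximum(fmap, 0.0)
            flat = act.flatten()
            z = flat @ dense_w + dense_b
            e = np.exp(z - z.max())
            y_hat = e / e.sum()
            # backward: dL/dz = y_hat - y for softmax + cross-entropy
            dz = y_hat - yi
            d_dense_w = np.outer(flat, dz)
            d_dense_b = dz
            d_act = (dense_w @ dz).reshape(act.shape) * (fmap > 0)
            d_conv_w = np.zeros_like(conv_w)
            d_conv_b = np.zeros_like(conv_b)
            for f in range(num_filters):
                for i in range(out):
                    for j in range(out):
                        patch = xi[i:i + kernel_size, j:j + kernel_size]
                        d_conv_w[f] += d_act[f, i, j] * patch
                        d_conv_b[f] += d_act[f, i, j]
            # SGD update for every parameter
            conv_w -= learning_rate * d_conv_w
            conv_b -= learning_rate * d_conv_b
            dense_w -= learning_rate * d_dense_w
            dense_b -= learning_rate * d_dense_b
    return conv_w, conv_b, dense_w, dense_b
```

Because the initial scores are near zero, one step on a [1, 0] target pushes the first dense bias up and the second down, the pattern visible in Example 1 below.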
X = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
y = [[1.0, 0.0]]
epochs = 1
learning_rate = 0.01
kernel_size = 3
num_filters = 1

Output:
(
conv_weights: [[[ 0.00501739], [-0.00128214], [ 0.00662764]],
[[ 0.01543131], [-0.00209028], [-0.00203986]],
[[ 0.01614389], [ 0.00807636], [-0.00424248]]],
conv_bias: [5.025e-05],
dense_weights: [[ 0.00635715, -0.00556573]],
dense_bias: [ 0.00499531, -0.00499531]
)
With a 3×3 input image and a 3×3 kernel using valid convolution, the output feature map is 1×1.
Forward Pass: the single convolved value passes through ReLU, is flattened to a one-element vector, and the dense layer plus softmax produce two class probabilities near 0.5 each, since the initial weights are small.
Backward Pass: the cross-entropy gradient $\hat{y} - y$ flows back through the dense layer, the ReLU, and the convolution.
After one epoch with learning rate 0.01, the weights show small adjustments moving toward the correct classification.
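The 1×1 feature-map size follows the valid-convolution rule $n - k + 1$; a quick check (the helper name is illustrative):

```python
def valid_conv_output_size(n, k):
    # a k-wide kernel fits at n - k + 1 positions along each axis
    return n - k + 1

side = valid_conv_output_size(3, 3)   # Example 1: 3x3 input, 3x3 kernel
```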
X = [[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 0.5, 0.3], [0.2, 0.4, 0.6, 0.8]]]
y = [[0.0, 1.0]]
epochs = 1
learning_rate = 0.1
kernel_size = 2
num_filters = 1

Output:
(
conv_weights: [[[ 0.00471968], [-0.00130821]],
[[ 0.00532375], [ 0.01442281]]],
conv_bias: [-0.00083181],
dense_weights: [[-0.00297151, -0.0017114],
[ 0.01503567, 0.00843081],
[-0.00557769, 0.00630854],
[-0.00577009, -0.00352138],
[ 0.00161423, -0.01832741],
[-0.01775821, -0.00511385],
[-0.01065219, 0.00366635],
[-0.00988065, -0.01332263],
[ 0.01374934, -0.00135061]],
dense_bias: [-0.05001055, 0.05001055]
)
With a 4×4 input and 2×2 kernel (valid convolution), the output feature map is 3×3, resulting in 9 values per filter.
The target label [0.0, 1.0] indicates the second class. The larger learning rate (0.1) produces more significant weight updates. Notice how dense_bias moves in opposite directions (negative for class 0, positive for class 1), pushing the network toward correct classification.
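The opposite-sign movement of dense_bias falls directly out of the softmax + cross-entropy gradient $\frac{\partial L}{\partial b} = \hat{y} - y$. A small check, assuming near-zero initial scores (the example's actual internals are not shown):

```python
import numpy as np

z = np.zeros(2)                      # dense scores start near zero
y_hat = np.exp(z) / np.exp(z).sum()  # softmax -> [0.5, 0.5]
y = np.array([0.0, 1.0])             # target: second class

grad_b = y_hat - y                   # dL/db for softmax + cross-entropy
b = np.zeros(2) - 0.1 * grad_b       # one SGD step from zero bias, eta = 0.1
```

One step gives a bias near [-0.05, +0.05], consistent with the dense_bias values above.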
X = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
[[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]]
y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
epochs = 2
learning_rate = 0.01
kernel_size = 2
num_filters = 2

Output:
(
conv_weights: [[[ 0.00854715, -0.00365224], [ 0.00977395, 0.01290805]],
[[ 0.00038962, -0.00476892], [ 0.01824033, 0.00519415]]],
conv_bias: [0.00042807, -0.00034737],
dense_weights: [[-0.00471649, 0.00734749, -0.00653432],
[-0.00411338, 0.0037669, -0.02102399],
[-0.01557395, -0.00542483, -0.01200158],
[ 0.00538336, -0.00945681, -0.01598735],
[ 0.01469215, -0.00133749, -0.00028065],
[-0.01390007, -0.00481346, 0.00013145],
[-0.01053903, 0.00380754, -0.00702785],
[-0.00163429, -0.0062564, 0.01747947]],
dense_bias: [0.00659985, 0.00663734, -0.01323718]
)
This example demonstrates batch training with 2 samples, 3 output classes, and 2 convolutional filters over 2 epochs.
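The dense-layer shape in this example can be verified from the architecture alone (filter-major flattening order is an assumption):

```python
import numpy as np

num_filters, n, k, num_classes = 2, 3, 2, 3
out_side = n - k + 1                        # 3 - 2 + 1 = 2
feature_maps = np.zeros((num_filters, out_side, out_side))
flat = feature_maps.flatten()               # 2 filters x 2x2 maps = 8 features
dense_shape = (flat.size, num_classes)      # matches the 8x3 dense_weights above
```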
Architecture Details: each 2×2 filter slides over the 3×3 inputs to produce a 2×2 feature map; with 2 filters, flattening yields 8 features mapped to 3 classes, which is why dense_weights has shape 8×3.
Training Dynamics:
Constraints