In deep learning, the fully connected layer (also known as a dense layer or linear layer) is one of the most fundamental building blocks of neural networks. This layer connects every neuron in the previous layer to every neuron in the current layer, enabling the network to learn complex, non-linear representations of data.
A fully connected layer performs a linear transformation of its inputs followed by an optional non-linear activation. Mathematically, for an input vector x of dimension m, a fully connected layer with n neurons computes:
$$y = x \cdot W + b$$
Where:

- W is the m × n weight matrix,
- b is the bias vector of length n,
- y is the resulting n-dimensional output.
Weight Initialization: Proper initialization is crucial for training stability. A common approach uses a uniform distribution bounded by:
$$\text{limit} = \frac{1}{\sqrt{m}}$$
where m is the number of input features. This prevents vanishing or exploding gradients during early training.
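This initialization scheme can be sketched in NumPy as follows (a minimal sketch; the function name and the `seed` parameter are illustrative, not part of the task's required interface):

```python
import numpy as np

def init_weights(input_size, n_units, seed=None):
    # Uniform initialization bounded by 1/sqrt(input_size),
    # which keeps early activations at a stable scale.
    rng = np.random.default_rng(seed)
    limit = 1.0 / np.sqrt(input_size)
    W = rng.uniform(-limit, limit, size=(input_size, n_units))
    b = np.zeros((1, n_units))  # biases start at zero
    return W, b
```

Every sampled weight then lies in [-1/√m, 1/√m], so layers with more inputs start with proportionally smaller weights.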
Gradient Computation: During backpropagation, the layer must compute gradients with respect to:
- the weights: `X^T @ accumulated_gradient`
- the biases: the sum of `accumulated_gradient` over the batch dimension
- the inputs: `accumulated_gradient @ W^T`, for propagation to previous layers

Your Task:
Implement the FullyConnectedLayer class by extending the provided Layer base class. Your implementation should include:
- `__init__(n_units, input_shape)`: Initialize the layer with the specified number of neurons and optional input shape
- `initialize(optimizer)`: Set up weights using uniform initialization with limit 1/√input_size, zero-initialize biases, and create optimizer copies for both
- `parameters()`: Return the total count of trainable parameters (weights + biases)
- `forward_pass(X, training)`: Compute and return X @ W + b, storing the input for backpropagation
- `backward_pass(accum_grad)`: Calculate gradients, update weights/biases if trainable, and return input gradients
- `output_shape()`: Return the tuple (n_units,)

Example 1:

Input: n_units = 3, input_shape = (2,), X = [[1.0, 2.0]], accum_grad = [[0.1, 0.2, 0.3]], learning_rate = 0.01

Output: {"parameters": 9, "output_shape": [3], "forward_output": [[0.10162127, -0.33551992, -0.64490545]], "backward_output": [[0.20816524, -0.22928937]]}

Setup: A fully connected layer with 3 neurons receiving 2-dimensional input.
Parameter Count: 2 × 3 = 6 weights plus 3 biases, for 9 trainable parameters in total.
Forward Pass Computation: The output is computed as y = X @ W + b, where X is the 1 × 2 input, W is the 2 × 3 weight matrix, and b is the 1 × 3 bias row.
Backward Pass: Given the accumulated gradient [[0.1, 0.2, 0.3]], the layer computes grad_W = X^T @ accum_grad and grad_b as the column-wise sum, applies the optimizer updates to both, and returns accum_grad @ W^T as the input gradient.
The backward pass returns gradients for the previous layer to continue backpropagation.
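Putting the pieces together, here is a minimal sketch of the layer. The `Layer` base class and the optimizer interface (`update(param, grad)`, copied per parameter) are assumptions made for this sketch; the actual provided base class and optimizer API may differ:

```python
import copy
import numpy as np

class Layer:
    """Minimal stand-in for the provided base class (assumed)."""
    pass

class FullyConnectedLayer(Layer):
    def __init__(self, n_units, input_shape=None):
        self.n_units = n_units
        self.input_shape = input_shape
        self.trainable = True
        self.layer_input = None

    def initialize(self, optimizer):
        m = self.input_shape[0]
        limit = 1.0 / np.sqrt(m)
        self.W = np.random.uniform(-limit, limit, (m, self.n_units))
        self.b = np.zeros((1, self.n_units))
        # Separate optimizer state for weights and biases.
        self.W_opt = copy.copy(optimizer)
        self.b_opt = copy.copy(optimizer)

    def parameters(self):
        # Total trainable scalars: m*n weights + n biases.
        return np.prod(self.W.shape) + np.prod(self.b.shape)

    def forward_pass(self, X, training=True):
        self.layer_input = X  # cached for backprop
        return X @ self.W + self.b

    def backward_pass(self, accum_grad):
        W = self.W  # keep pre-update weights for the input gradient
        if self.trainable:
            grad_W = self.layer_input.T @ accum_grad
            grad_b = np.sum(accum_grad, axis=0, keepdims=True)
            self.W = self.W_opt.update(self.W, grad_W)
            self.b = self.b_opt.update(self.b, grad_b)
        return accum_grad @ W.T

    def output_shape(self):
        return (self.n_units,)
```

Note that the input gradient is computed with the weights as they were before the update, so the gradient propagated backward matches the forward pass that produced it.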
Example 2:

Input: n_units = 4, input_shape = (3,), X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], accum_grad = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], learning_rate = 0.01

Output: {"parameters": 16, "output_shape": [4], "forward_output": [[-0.58898684, 0.44677798, -2.41342268, 2.58738407], [-1.86490632, 1.53721275, -4.80135939, 5.82543394]], "backward_output": [[0.21553461, -0.10311902, 0.11069572], [0.51848501, -0.45586945, 0.24911288]]}

Setup: A fully connected layer with 4 neurons processing a batch of 2 samples, each with 3 features.
Parameter Count: 3 × 4 = 12 weights plus 4 biases, for 16 trainable parameters in total.
Batch Processing: The forward pass handles both input samples simultaneously: the 2 × 3 input matrix is multiplied by the 3 × 4 weight matrix, and the bias row is broadcast across both rows of the 2 × 4 result.
Gradient Flow: The backward pass processes gradients for the entire batch: the weight and bias gradients sum contributions across both samples, while each sample receives its own row of input gradients.
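A quick shape check makes the batch behavior concrete (the zero-valued weights here are illustrative only; shapes are what matter):

```python
import numpy as np

# Batch of 2 samples with 3 features flowing through a 4-unit layer.
batch, m, n = 2, 3, 4
X = np.arange(1.0, 7.0).reshape(batch, m)        # (2, 3) input
W = np.zeros((m, n))                             # (3, 4) weights (illustrative)
b = np.zeros((1, n))                             # (1, 4) biases
accum_grad = np.ones((batch, n))                 # (2, 4) from the next layer

out = X @ W + b                                  # (2, 4): one output row per sample
grad_W = X.T @ accum_grad                        # (3, 4): summed over the batch
grad_b = accum_grad.sum(axis=0, keepdims=True)   # (1, 4): summed over the batch
grad_X = accum_grad @ W.T                        # (2, 3): one gradient row per sample
```

The weight and bias gradients collapse the batch dimension, while the input gradient preserves it.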
Example 3:

Input: n_units = 1, input_shape = (5,), X = [[1.0, 2.0, 3.0, 4.0, 5.0]], accum_grad = [[1.0]], learning_rate = 0.01

Output: {"parameters": 6, "output_shape": [1], "forward_output": [[0.13119252]], "backward_output": [[-0.11221473, 0.40313113, 0.20750169, 0.08824283, -0.30766628]]}

Setup: A single-neuron layer (similar to linear regression output) with 5-dimensional input.
Parameter Count: 5 × 1 = 5 weights plus 1 bias, for 6 trainable parameters in total.
Use Case: Single-neuron output layers are common in regression models and in binary classification, where the scalar output is typically passed through a sigmoid activation.
The backward pass propagates the gradient through all 5 input dimensions, enabling the previous layer to continue learning.
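The single-neuron case reduces to a plain linear model, which a short sketch makes visible (the weight values here are illustrative, not the ones from the example above):

```python
import numpy as np

# A 1-unit layer over 5 features is just a linear model: y = x . w + b.
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])   # (1, 5) input
W = np.full((5, 1), 0.1)                     # (5, 1) illustrative weights
b = np.zeros((1, 1))                         # scalar bias

y = X @ W + b                                # (1, 1) scalar prediction
grad_X = np.array([[1.0]]) @ W.T             # (1, 5): each input dimension
                                             # receives its weight as gradient
```

With a scalar upstream gradient of 1.0, the input gradient is simply the weight row, one entry per input feature.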
Constraints