In deep learning, the fully connected layer (also known as a dense layer or linear layer) is one of the most fundamental building blocks of neural networks. This layer connects every neuron in the previous layer to every neuron in the current layer, enabling the network to learn complex, non-linear representations of data.
A fully connected layer performs a linear transformation of its inputs followed by an optional non-linear activation. Mathematically, for an input vector x of dimension m, a fully connected layer with n neurons computes:
$$y = x \cdot W + b$$
Where:

- W is the m × n weight matrix,
- b is the bias vector of length n,
- y is the resulting n-dimensional output.
Weight Initialization: Proper initialization is crucial for training stability. A common approach uses a uniform distribution bounded by:
$$\text{limit} = \frac{1}{\sqrt{m}}$$
where m is the number of input features. This prevents vanishing or exploding gradients during early training.
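This initialization scheme can be sketched in NumPy as follows (a minimal sketch; the function name and the `seed` parameter are illustrative, not part of the task's required interface):

```python
import numpy as np

def init_weights(input_size, n_units, seed=None):
    # Uniform initialization bounded by 1/sqrt(input_size),
    # which keeps early activations at a stable scale.
    rng = np.random.default_rng(seed)
    limit = 1.0 / np.sqrt(input_size)
    W = rng.uniform(-limit, limit, size=(input_size, n_units))
    b = np.zeros((1, n_units))  # biases start at zero
    return W, b
```

Every sampled weight then lies in [-1/√m, 1/√m], so layers with more inputs start with proportionally smaller weights.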
Gradient Computation: During backpropagation, the layer must compute gradients with respect to:
- the weights: `X^T @ accumulated_gradient`
- the biases: the sum of `accumulated_gradient` over the batch dimension
- the inputs: `accumulated_gradient @ W^T`, for propagation to previous layers

Your Task:
Implement the FullyConnectedLayer class by extending the provided Layer base class. Your implementation should include:
- `__init__(n_units, input_shape)`: Initialize the layer with the specified number of neurons and optional input shape
- `initialize(optimizer)`: Set up weights using uniform initialization with limit 1/√input_size, zero-initialize biases, and create optimizer copies for both
- `parameters()`: Return the total count of trainable parameters (weights + biases)
- `forward_pass(X, training)`: Compute and return X @ W + b, storing the input for backpropagation
- `backward_pass(accum_grad)`: Calculate gradients, update weights/biases if trainable, and return input gradients
- `output_shape()`: Return the tuple (n_units,)

Example 1:

Input: n_units = 3, input_shape = (2,), X = [[1.0, 2.0]], accum_grad = [[0.1, 0.2, 0.3]], learning_rate = 0.01

Output: {"parameters": 9, "output_shape": [3], "forward_output": [[0.10162127, -0.33551992, -0.64490545]], "backward_output": [[0.20816524, -0.22928937]]}

Setup: A fully connected layer with 3 neurons receiving 2-dimensional input.
Parameter Count: 2 × 3 = 6 weights plus 3 biases, for 9 trainable parameters in total.
Forward Pass Computation: The output is computed as y = X @ W + b, where X is the 1 × 2 input, W is the 2 × 3 weight matrix, and b is the 1 × 3 bias row.
Backward Pass: Given the accumulated gradient [[0.1, 0.2, 0.3]], the layer computes grad_W = X^T @ accum_grad and grad_b as the column-wise sum, applies the optimizer updates to both, and returns accum_grad @ W^T as the input gradient.
The backward pass returns gradients for the previous layer to continue backpropagation.
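Putting the pieces together, here is a minimal sketch of the layer. The `Layer` base class and the optimizer interface (`update(param, grad)`, copied per parameter) are assumptions made for this sketch; the actual provided base class and optimizer API may differ:

```python
import copy
import numpy as np

class Layer:
    """Minimal stand-in for the provided base class (assumed)."""
    pass

class FullyConnectedLayer(Layer):
    def __init__(self, n_units, input_shape=None):
        self.n_units = n_units
        self.input_shape = input_shape
        self.trainable = True
        self.layer_input = None

    def initialize(self, optimizer):
        m = self.input_shape[0]
        limit = 1.0 / np.sqrt(m)
        self.W = np.random.uniform(-limit, limit, (m, self.n_units))
        self.b = np.zeros((1, self.n_units))
        # Separate optimizer state for weights and biases.
        self.W_opt = copy.copy(optimizer)
        self.b_opt = copy.copy(optimizer)

    def parameters(self):
        # Total trainable scalars: m*n weights + n biases.
        return np.prod(self.W.shape) + np.prod(self.b.shape)

    def forward_pass(self, X, training=True):
        self.layer_input = X  # cached for backprop
        return X @ self.W + self.b

    def backward_pass(self, accum_grad):
        W = self.W  # keep pre-update weights for the input gradient
        if self.trainable:
            grad_W = self.layer_input.T @ accum_grad
            grad_b = np.sum(accum_grad, axis=0, keepdims=True)
            self.W = self.W_opt.update(self.W, grad_W)
            self.b = self.b_opt.update(self.b, grad_b)
        return accum_grad @ W.T

    def output_shape(self):
        return (self.n_units,)
```

Note that the input gradient is computed with the weights as they were before the update, so the gradient propagated backward matches the forward pass that produced it.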
Example 2:

Input: n_units = 4, input_shape = (3,), X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], accum_grad = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]], learning_rate = 0.01

Output: {"parameters": 16, "output_shape": [4], "forward_output": [[-0.58898684, 0.44677798, -2.41342268, 2.58738407], [-1.86490632, 1.53721275, -4.80135939, 5.82543394]], "backward_output": [[0.21553461, -0.10311902, 0.11069572], [0.51848501, -0.45586945, 0.24911288]]}

Setup: A fully connected layer with 4 neurons processing a batch of 2 samples, each with 3 features.
Parameter Count: 3 × 4 = 12 weights plus 4 biases, for 16 trainable parameters in total.
Batch Processing: The forward pass handles both input samples simultaneously: the 2 × 3 input matrix is multiplied by the 3 × 4 weight matrix, and the bias row is broadcast across both rows of the 2 × 4 result.
Gradient Flow: The backward pass processes gradients for the entire batch: the weight and bias gradients sum contributions across both samples, while each sample receives its own row of input gradients.
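A quick shape check makes the batch behavior concrete (the zero-valued weights here are illustrative only; shapes are what matter):

```python
import numpy as np

# Batch of 2 samples with 3 features flowing through a 4-unit layer.
batch, m, n = 2, 3, 4
X = np.arange(1.0, 7.0).reshape(batch, m)        # (2, 3) input
W = np.zeros((m, n))                             # (3, 4) weights (illustrative)
b = np.zeros((1, n))                             # (1, 4) biases
accum_grad = np.ones((batch, n))                 # (2, 4) from the next layer

out = X @ W + b                                  # (2, 4): one output row per sample
grad_W = X.T @ accum_grad                        # (3, 4): summed over the batch
grad_b = accum_grad.sum(axis=0, keepdims=True)   # (1, 4): summed over the batch
grad_X = accum_grad @ W.T                        # (2, 3): one gradient row per sample
```

The weight and bias gradients collapse the batch dimension, while the input gradient preserves it.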
Example 3:

Input: n_units = 1, input_shape = (5,), X = [[1.0, 2.0, 3.0, 4.0, 5.0]], accum_grad = [[1.0]], learning_rate = 0.01

Output: {"parameters": 6, "output_shape": [1], "forward_output": [[0.13119252]], "backward_output": [[-0.11221473, 0.40313113, 0.20750169, 0.08824283, -0.30766628]]}

Setup: A single-neuron layer (similar to linear regression output) with 5-dimensional input.
Parameter Count: 5 × 1 = 5 weights plus 1 bias, for 6 trainable parameters in total.
Use Case: Single-neuron output layers are common in regression models and in binary classification, where the scalar output is typically passed through a sigmoid activation.
The backward pass propagates the gradient through all 5 input dimensions, enabling the previous layer to continue learning.
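The single-neuron case reduces to a plain linear model, which a short sketch makes visible (the weight values here are illustrative, not the ones from the example above):

```python
import numpy as np

# A 1-unit layer over 5 features is just a linear model: y = x . w + b.
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])   # (1, 5) input
W = np.full((5, 1), 0.1)                     # (5, 1) illustrative weights
b = np.zeros((1, 1))                         # scalar bias

y = X @ W + b                                # (1, 1) scalar prediction
grad_X = np.array([[1.0]]) @ W.T             # (1, 5): each input dimension
                                             # receives its weight as gradient
```

With a scalar upstream gradient of 1.0, the input gradient is simply the weight row, one entry per input feature.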
Constraints