In the realm of sequence modeling, Recurrent Neural Networks (RNNs) represent a foundational architecture designed to process sequential data by maintaining an internal memory of previous inputs. Unlike feedforward networks that treat each input independently, RNNs create a temporal dependency structure that allows information to persist across time steps, making them exceptionally well-suited for tasks involving temporal or sequential patterns.
At the heart of an RNN lies the hidden state—a dynamically evolving representation that captures the contextual history of the sequence processed thus far. At each time step t, the network computes a new hidden state by combining the current input with the previous hidden state through learned weight transformations:
$$h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h)$$
Where:
- $h_t$ is the hidden state at time step $t$
- $x_t$ is the input at time step $t$
- $h_{t-1}$ is the hidden state from the previous time step
- $W_{xh}$ and $W_{hh}$ are the input-to-hidden and hidden-to-hidden weight matrices
- $b_h$ is the hidden bias vector
The output at each time step is then computed from the current hidden state, where $W_{hy}$ is the hidden-to-output weight matrix and $b_y$ is the output bias:
$$y_t = W_{hy} \cdot h_t + b_y$$
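The two equations above can be sketched as a single recurrent step in NumPy. This is an illustrative sketch, not the required class: the parameter shapes and the 0.1 scaling of the random weights are assumptions made for the example.

```python
import numpy as np

# Illustrative random parameters (a real implementation would use Xavier init).
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 1, 5, 1

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
b_h = np.zeros((hidden_size, 1))
b_y = np.zeros((output_size, 1))

def step(x_t, h_prev):
    # h_t = tanh(W_xh . x_t + W_hh . h_{t-1} + b_h)
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # y_t = W_hy . h_t + b_y
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Feed a short sequence, carrying the hidden state from step to step.
h = np.zeros((hidden_size, 1))
outputs = []
for x in [[1.0], [2.0], [3.0], [4.0]]:
    h, y = step(np.array(x).reshape(-1, 1), h)
    outputs.append(y)
```

Note how the same weight matrices are reused at every time step; only the hidden state `h` changes as the sequence is consumed.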
Training an RNN requires a specialized gradient computation technique called Backpropagation Through Time (BPTT). Since the network's computations unfold across multiple time steps, gradients must flow backward through the entire sequence to capture temporal dependencies.
The key insight of BPTT is that at each time step, the hidden state depends on all previous hidden states. Therefore, when computing gradients, we must:
- Unroll the network across the full sequence, storing the hidden states from the forward pass.
- Propagate the gradient of each hidden state backward through the recurrent connection to the earlier time steps it depends on.
- Accumulate the gradients for each weight matrix over all time steps, since the same weights are shared at every step.
For this problem, use half Mean Squared Error (MSE) as the loss function:
$$L = \frac{1}{2} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$$
The total loss is the sum of losses at each individual time step.
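The BPTT procedure for this half-MSE loss can be sketched as follows. The variable names and shapes are illustrative assumptions, and the bias gradients are omitted for brevity; the point is how the per-step gradients are accumulated into shared weight matrices and how `dh_next` carries the gradient back through the recurrent connection.

```python
import numpy as np

# Minimal BPTT sketch for the loss L = 1/2 * sum_t (y_t - yhat_t)^2.
T, input_size, hidden_size = 4, 1, 5
rng = np.random.default_rng(42)
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_hy = rng.standard_normal((1, hidden_size)) * 0.1

xs = [np.array([[float(t + 1)]]) for t in range(T)]       # 1, 2, 3, 4
targets = [np.array([[float(t + 2)]]) for t in range(T)]  # 2, 3, 4, 5

# Forward pass, storing every hidden state for reuse in the backward pass.
hs = [np.zeros((hidden_size, 1))]
ys = []
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))
    ys.append(W_hy @ hs[-1])

# Backward pass: gradients are summed over all time steps because
# the same weight matrices are shared across the whole sequence.
dW_xh = np.zeros_like(W_xh)
dW_hh = np.zeros_like(W_hh)
dW_hy = np.zeros_like(W_hy)
dh_next = np.zeros((hidden_size, 1))
for t in reversed(range(T)):
    dy = ys[t] - targets[t]            # dL/dy_t for the half-MSE loss
    dW_hy += dy @ hs[t + 1].T
    dh = W_hy.T @ dy + dh_next         # gradient from output and from step t+1
    dz = dh * (1.0 - hs[t + 1] ** 2)   # backprop through tanh
    dW_xh += dz @ xs[t].T
    dW_hh += dz @ hs[t].T
    dh_next = W_hh.T @ dz              # flows backward to step t-1
```

After the loop, each `dW_*` holds the total gradient over the sequence and a gradient-descent update would subtract `learning_rate * dW_*` from the corresponding weight.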
Implement a complete recurrent neural network class with the following specifications:
Class: `SequentialRecurrentNetwork`
- Constructor: `__init__(self, input_size, hidden_size, output_size)`, which initializes the weight matrices using Xavier initialization with scale `sqrt(2 / (fan_in + fan_out))`
- Forward Pass: `forward(self, x)`
- Backward Pass: `backward(self, x, y, learning_rate)`
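One plausible sketch of the constructor's weight setup is shown below. The matrix shapes and the order of the random-number calls are assumptions; the exact values produced under seed 42 (and therefore the example outputs) depend on the reference implementation's call order.

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Xavier scale: sqrt(2 / (fan_in + fan_out))
    scale = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.randn(fan_out, fan_in) * scale

np.random.seed(42)  # seeded first, per the implementation notes
input_size, hidden_size, output_size = 1, 5, 1
W_xh = xavier_init(input_size, hidden_size)   # input -> hidden
W_hh = xavier_init(hidden_size, hidden_size)  # hidden -> hidden
W_hy = xavier_init(hidden_size, output_size)  # hidden -> output
```
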
Important Implementation Notes:
- Use `np.random.seed(42)` at the start of your weight initialization for reproducibility.
- Use the `tanh` activation function for the hidden layer.

Example 1:

input_sequence = [[1.0], [2.0], [3.0], [4.0]]
expected_output = [[2.0], [3.0], [4.0], [5.0]]
input_size = 1
hidden_size = 5
output_size = 1
learning_rate = 0.01

Output:

[[-0.0002], [-0.0005], [-0.0007], [-0.001]]

Explanation: The network is initialized with a 1-dimensional input, 5 hidden units, and 1-dimensional output. The forward pass processes the sequence [1.0, 2.0, 3.0, 4.0], updating the hidden state at each step. With Xavier-initialized weights (using seed 42), the initial predictions are small values near zero. The backward pass then computes gradients to learn the pattern of predicting the next value in the sequence. The small output values reflect the network's initial state before significant training.
Example 2:

input_sequence = [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5]]
expected_output = [[1.0, 1.0], [1.5, 1.5], [2.0, 2.0]]
input_size = 2
hidden_size = 4
output_size = 2
learning_rate = 0.01

Output:

[[0.0001, 0.0002], [0.0001, 0.0004], [0.0002, 0.0006]]

Explanation: This example uses a 2-dimensional input and output with 4 hidden units. The network processes a sequence of 2D vectors, learning to predict the next vector in the pattern. The multi-dimensional setup demonstrates how the recurrent architecture handles higher-dimensional sequential data while maintaining the temporal gradient flow across all dimensions.
Example 3:

input_sequence = [[0.1], [0.2], [0.3], [0.4], [0.5]]
expected_output = [[0.2], [0.3], [0.4], [0.5], [0.6]]
input_size = 1
hidden_size = 10
output_size = 1
learning_rate = 0.001

Output:

[[0.0], [0.0], [0.0001], [0.0001], [0.0001]]

Explanation: A longer sequence with a smaller learning rate and more hidden units (10) demonstrates the network's behavior with increased capacity and smaller gradient steps. The smaller inputs and lower learning rate result in predictions very close to zero initially. The 5-step sequence shows how the hidden state accumulates information across longer time spans.
Constraints