Recurrent Neural Networks (RNNs) are a powerful class of neural architectures specifically designed to process sequential data by maintaining a notion of "memory" across time steps. Unlike feedforward networks that treat each input independently, RNNs maintain a hidden state that carries information from previous time steps to future ones, enabling them to model temporal dependencies and patterns.
At the heart of every RNN lies the RNN cell—a fundamental computational unit that processes one element of the sequence at a time while updating its internal memory. Understanding this building block is essential for grasping more advanced architectures like LSTMs and GRUs.
Given an input sequence X = [x₁, x₂, ..., xₜ] where each xᵢ is a vector of dimension d_input, the RNN cell maintains a hidden state h of dimension d_hidden. At each time step t, the cell performs the following computation:
$$h_t = \tanh(W_x \cdot x_t + W_h \cdot h_{t-1} + b)$$
Where:
• xₜ is the input vector at time step t (dimension d_input)
• hₜ₋₁ is the hidden state from the previous time step (dimension d_hidden)
• Wₓ is the input-to-hidden weight matrix (shape d_hidden × d_input)
• Wₕ is the hidden-to-hidden weight matrix (shape d_hidden × d_hidden)
• b is the bias vector (dimension d_hidden)
The tanh activation is crucial as it:
• squashes each pre-activation into the range (-1, 1), keeping the hidden state bounded across many time steps
• introduces the non-linearity needed to model complex temporal patterns
• centers activations around zero, which helps stabilize training
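One cell update can be sketched in NumPy (an illustrative snippet; the values are taken from the scalar example later in this problem, and the variable names are ours):

```python
import numpy as np

# One step of the RNN cell update: h_t = tanh(Wx @ x_t + Wh @ h_prev + b)
Wx = np.array([[0.5]])    # input-to-hidden weights (d_hidden x d_input)
Wh = np.array([[0.8]])    # hidden-to-hidden weights (d_hidden x d_hidden)
b = np.array([0.0])       # bias (d_hidden)
h_prev = np.array([0.0])  # previous hidden state
x_t = np.array([1.0])     # current input

h_t = np.tanh(Wx @ x_t + Wh @ h_prev + b)
print(np.round(h_t, 4))   # [0.4621]
```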
Implement a function that simulates a simple RNN cell processing an entire input sequence. Given the input sequence, initial hidden state, weight matrices, and bias vector, compute and return the final hidden state after processing all inputs in the sequence. Round the final output values to 4 decimal places.
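A minimal sketch of such a function, assuming NumPy and list-of-lists inputs as in the test cases below (the name `rnn_forward` is ours):

```python
import numpy as np

def rnn_forward(input_sequence, initial_hidden_state, Wx, Wh, b):
    """Apply h_t = tanh(Wx @ x_t + Wh @ h_prev + b) over the whole
    sequence and return the final hidden state, rounded to 4 decimals."""
    h = np.asarray(initial_hidden_state, dtype=float)
    Wx = np.asarray(Wx, dtype=float)
    Wh = np.asarray(Wh, dtype=float)
    b = np.asarray(b, dtype=float)
    for x_t in input_sequence:
        h = np.tanh(Wx @ np.asarray(x_t, dtype=float) + Wh @ h + b)
    return np.round(h, 4)

# The scalar example from the first test case:
print(rnn_forward([[1.0], [2.0], [3.0]], [0.0], [[0.5]], [[0.8]], [0.0]))  # [0.9759]
```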
For a simple scalar example with Wx = [[0.5]], Wh = [[0.8]], b = [0.0], and initial hidden state h₀ = [0.0]:
Step 1 (t=1): x₁ = [1.0], h₁ = tanh(0.5×1.0 + 0.8×0.0 + 0.0) = tanh(0.5) ≈ 0.4621
Step 2 (t=2): x₂ = [2.0], h₂ = tanh(0.5×2.0 + 0.8×0.4621 + 0.0) = tanh(1.3697) ≈ 0.8785
Step 3 (t=3): x₃ = [3.0], h₃ = tanh(0.5×3.0 + 0.8×0.8785 + 0.0) = tanh(2.2028) ≈ 0.9759
The final hidden state is [0.9759].
input_sequence = [[1.0], [2.0], [3.0]]
initial_hidden_state = [0.0]
Wx = [[0.5]]
Wh = [[0.8]]
b = [0.0]
Output: [0.9759]
Reasoning: Processing a 3-step sequence with scalar inputs and hidden state:
• Step 1: h₁ = tanh(0.5×1.0 + 0.8×0.0 + 0.0) = tanh(0.5) ≈ 0.4621
• Step 2: h₂ = tanh(0.5×2.0 + 0.8×0.4621 + 0.0) = tanh(1.3697) ≈ 0.8785
• Step 3: h₃ = tanh(0.5×3.0 + 0.8×0.8785 + 0.0) = tanh(2.2028) ≈ 0.9759
The final hidden state after processing all inputs is [0.9759].
input_sequence = [[1.0, 0.5]]
initial_hidden_state = [0.0, 0.0]
Wx = [[0.5, 0.5], [0.3, 0.3]]
Wh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
Output: [0.6351, 0.4219]
Reasoning: Processing a single-step sequence with 2D input and hidden state:
• Input x₁ = [1.0, 0.5], initial h₀ = [0.0, 0.0]
• Wx @ x₁ = [0.5×1.0 + 0.5×0.5, 0.3×1.0 + 0.3×0.5] = [0.75, 0.45]
• Wh @ h₀ = [0.0, 0.0]
• h₁ = tanh([0.75 + 0.0 + 0.0, 0.45 + 0.0 + 0.0]) = [tanh(0.75), tanh(0.45)] ≈ [0.6351, 0.4219]
The final hidden state is [0.6351, 0.4219].
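This single step can be checked directly with NumPy (an illustrative sketch):

```python
import numpy as np

Wx = np.array([[0.5, 0.5], [0.3, 0.3]])
Wh = np.array([[0.1, 0.0], [0.0, 0.1]])
b = np.array([0.0, 0.0])
h = np.array([0.0, 0.0])   # initial hidden state
x1 = np.array([1.0, 0.5])

# Wx @ x1 = [0.75, 0.45]; Wh @ h and b contribute nothing here
h = np.tanh(Wx @ x1 + Wh @ h + b)
print(np.round(h, 4))      # [0.6351 0.4219]
```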
input_sequence = [[1.0], [0.5], [-0.5]]
initial_hidden_state = [0.0, 0.0]
Wx = [[0.5], [0.3]]
Wh = [[0.2, 0.1], [0.1, 0.2]]
b = [0.1, -0.1]
Output: [-0.047, -0.1753]
Reasoning: Processing a 3-step sequence with 1D inputs mapped to 2D hidden state:
• Step 1: x₁ = [1.0], h₁ = tanh([0.5×1.0 + 0.1, 0.3×1.0 − 0.1]) = [tanh(0.6), tanh(0.2)] ≈ [0.5370, 0.1974]
• Step 2: x₂ = [0.5], h₂ = tanh([0.25 + 0.1271 + 0.1, 0.15 + 0.0932 − 0.1]) = [tanh(0.4771), tanh(0.1432)] ≈ [0.4440, 0.1422]
• Step 3: x₃ = [-0.5], h₃ = tanh([−0.25 + 0.1030 + 0.1, −0.15 + 0.0728 − 0.1]) = [tanh(−0.0470), tanh(−0.1772)] ≈ [−0.0470, −0.1753]
The final hidden state is [-0.047, -0.1753].
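Note that Wx here is 2×1, mapping each 1-D input into the 2-D hidden space. A quick NumPy check of this test case (an illustrative sketch):

```python
import numpy as np

Wx = np.array([[0.5], [0.3]])            # 2x1: 1-D input -> 2-D hidden
Wh = np.array([[0.2, 0.1], [0.1, 0.2]])
b = np.array([0.1, -0.1])
h = np.array([0.0, 0.0])                 # initial hidden state

for x_t in [[1.0], [0.5], [-0.5]]:
    h = np.tanh(Wx @ np.array(x_t) + Wh @ h + b)

print(np.round(h, 4))  # final hidden state, approximately [-0.047, -0.1753]
```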
Constraints