Recurrent Neural Networks (RNNs) are a powerful class of neural architectures specifically designed to process sequential data by maintaining a notion of "memory" across time steps. Unlike feedforward networks that treat each input independently, RNNs maintain a hidden state that carries information from previous time steps to future ones, enabling them to model temporal dependencies and patterns.
At the heart of every RNN lies the RNN cell—a fundamental computational unit that processes one element of the sequence at a time while updating its internal memory. Understanding this building block is essential for grasping more advanced architectures like LSTMs and GRUs.
Given an input sequence X = [x₁, x₂, ..., xₜ] where each xᵢ is a vector of dimension d_input, the RNN cell maintains a hidden state h of dimension d_hidden. At each time step t, the cell performs the following computation:
$$h_t = \tanh(W_x \cdot x_t + W_h \cdot h_{t-1} + b)$$
Where:
• xₜ is the input vector at time step t (dimension d_input)
• hₜ₋₁ is the hidden state from the previous time step (dimension d_hidden)
• Wₓ is the input-to-hidden weight matrix (shape d_hidden × d_input)
• Wₕ is the hidden-to-hidden weight matrix (shape d_hidden × d_hidden)
• b is the bias vector (dimension d_hidden)
The tanh activation is crucial as it:
• squashes each pre-activation into the range (-1, 1), keeping the hidden state bounded across many time steps
• introduces the non-linearity needed to model complex temporal patterns
• centers activations around zero, which helps stabilize training
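One cell update can be sketched in NumPy (an illustrative snippet; the values are taken from the scalar example later in this problem, and the variable names are ours):

```python
import numpy as np

# One step of the RNN cell update: h_t = tanh(Wx @ x_t + Wh @ h_prev + b)
Wx = np.array([[0.5]])    # input-to-hidden weights (d_hidden x d_input)
Wh = np.array([[0.8]])    # hidden-to-hidden weights (d_hidden x d_hidden)
b = np.array([0.0])       # bias (d_hidden)
h_prev = np.array([0.0])  # previous hidden state
x_t = np.array([1.0])     # current input

h_t = np.tanh(Wx @ x_t + Wh @ h_prev + b)
print(np.round(h_t, 4))   # [0.4621]
```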
Implement a function that simulates a simple RNN cell processing an entire input sequence. Given the input sequence, initial hidden state, weight matrices, and bias vector, compute and return the final hidden state after processing all inputs in the sequence. Round the final output values to 4 decimal places.
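A minimal sketch of such a function, assuming NumPy and list-of-lists inputs as in the test cases below (the name `rnn_forward` is ours):

```python
import numpy as np

def rnn_forward(input_sequence, initial_hidden_state, Wx, Wh, b):
    """Apply h_t = tanh(Wx @ x_t + Wh @ h_prev + b) over the whole
    sequence and return the final hidden state, rounded to 4 decimals."""
    h = np.asarray(initial_hidden_state, dtype=float)
    Wx = np.asarray(Wx, dtype=float)
    Wh = np.asarray(Wh, dtype=float)
    b = np.asarray(b, dtype=float)
    for x_t in input_sequence:
        h = np.tanh(Wx @ np.asarray(x_t, dtype=float) + Wh @ h + b)
    return np.round(h, 4)

# The scalar example from the first test case:
print(rnn_forward([[1.0], [2.0], [3.0]], [0.0], [[0.5]], [[0.8]], [0.0]))  # [0.9759]
```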
For a simple scalar example with Wx = [[0.5]], Wh = [[0.8]], b = [0.0], and initial hidden state h₀ = [0.0]:
Step 1 (t=1): x₁ = [1.0], h₁ = tanh(0.5×1.0 + 0.8×0.0 + 0.0) = tanh(0.5) ≈ 0.4621
Step 2 (t=2): x₂ = [2.0], h₂ = tanh(0.5×2.0 + 0.8×0.4621 + 0.0) = tanh(1.3697) ≈ 0.8785
Step 3 (t=3): x₃ = [3.0], h₃ = tanh(0.5×3.0 + 0.8×0.8785 + 0.0) = tanh(2.2028) ≈ 0.9759
The final hidden state is [0.9759].
input_sequence = [[1.0], [2.0], [3.0]]
initial_hidden_state = [0.0]
Wx = [[0.5]]
Wh = [[0.8]]
b = [0.0]
Output: [0.9759]
Reasoning: Processing a 3-step sequence with scalar inputs and hidden state:
• Step 1: h₁ = tanh(0.5×1.0 + 0.8×0.0 + 0.0) = tanh(0.5) ≈ 0.4621
• Step 2: h₂ = tanh(0.5×2.0 + 0.8×0.4621 + 0.0) = tanh(1.3697) ≈ 0.8785
• Step 3: h₃ = tanh(0.5×3.0 + 0.8×0.8785 + 0.0) = tanh(2.2028) ≈ 0.9759
The final hidden state after processing all inputs is [0.9759].
input_sequence = [[1.0, 0.5]]
initial_hidden_state = [0.0, 0.0]
Wx = [[0.5, 0.5], [0.3, 0.3]]
Wh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
Output: [0.6351, 0.4219]
Reasoning: Processing a single-step sequence with 2D input and hidden state:
• Input x₁ = [1.0, 0.5], initial h₀ = [0.0, 0.0]
• Wx @ x₁ = [0.5×1.0 + 0.5×0.5, 0.3×1.0 + 0.3×0.5] = [0.75, 0.45]
• Wh @ h₀ = [0.0, 0.0]
• h₁ = tanh([0.75 + 0.0 + 0.0, 0.45 + 0.0 + 0.0]) = [tanh(0.75), tanh(0.45)] ≈ [0.6351, 0.4219]
The final hidden state is [0.6351, 0.4219].
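This single step can be checked directly with NumPy (an illustrative sketch):

```python
import numpy as np

Wx = np.array([[0.5, 0.5], [0.3, 0.3]])
Wh = np.array([[0.1, 0.0], [0.0, 0.1]])
b = np.array([0.0, 0.0])
h = np.array([0.0, 0.0])   # initial hidden state
x1 = np.array([1.0, 0.5])

# Wx @ x1 = [0.75, 0.45]; Wh @ h and b contribute nothing here
h = np.tanh(Wx @ x1 + Wh @ h + b)
print(np.round(h, 4))      # [0.6351 0.4219]
```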
input_sequence = [[1.0], [0.5], [-0.5]]
initial_hidden_state = [0.0, 0.0]
Wx = [[0.5], [0.3]]
Wh = [[0.2, 0.1], [0.1, 0.2]]
b = [0.1, -0.1]
Output: [-0.047, -0.1753]
Reasoning: Processing a 3-step sequence with 1D inputs mapped to 2D hidden state:
• Step 1: x₁ = [1.0], h₁ = tanh([0.5×1.0 + 0.1, 0.3×1.0 − 0.1]) = [tanh(0.6), tanh(0.2)] ≈ [0.5370, 0.1974]
• Step 2: x₂ = [0.5], h₂ = tanh([0.25 + 0.1271 + 0.1, 0.15 + 0.0932 − 0.1]) = [tanh(0.4771), tanh(0.1432)] ≈ [0.4440, 0.1422]
• Step 3: x₃ = [-0.5], h₃ = tanh([−0.25 + 0.1030 + 0.1, −0.15 + 0.0728 − 0.1]) = [tanh(−0.0470), tanh(−0.1772)] ≈ [−0.0470, −0.1753]
The final hidden state is [-0.047, -0.1753].
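Note that Wx here is 2×1, mapping each 1-D input into the 2-D hidden space. A quick NumPy check of this test case (an illustrative sketch):

```python
import numpy as np

Wx = np.array([[0.5], [0.3]])            # 2x1: 1-D input -> 2-D hidden
Wh = np.array([[0.2, 0.1], [0.1, 0.2]])
b = np.array([0.1, -0.1])
h = np.array([0.0, 0.0])                 # initial hidden state

for x_t in [[1.0], [0.5], [-0.5]]:
    h = np.tanh(Wx @ np.array(x_t) + Wh @ h + b)

print(np.round(h, 4))  # final hidden state, approximately [-0.047, -0.1753]
```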
Constraints