The Long Short-Term Memory (LSTM) network is a revolutionary architecture in deep learning that addresses one of the most fundamental challenges in processing sequential data: the vanishing gradient problem. Unlike vanilla recurrent neural networks (RNNs) that struggle to maintain relevant information across long sequences, LSTMs employ a sophisticated system of gating mechanisms to selectively remember, forget, and output information.
At the heart of an LSTM lies the cell state—a highway of information that flows through time, modified only by carefully regulated gates. This design allows LSTMs to maintain memories spanning hundreds or even thousands of time steps, making them extraordinarily powerful for tasks like language modeling, speech recognition, and time series forecasting.
An LSTM cell consists of four primary components:
1. Forget gate. Controls what information to discard from the cell state: $$f_t = \sigma(W_f \cdot [x_t, h_{t-1}] + b_f)$$
The forget gate examines the current input $x_t$ and the previous hidden state $h_{t-1}$, producing values between 0 and 1 for each element of the cell state. A value of 0 means "completely forget this," while 1 means "keep this entirely."
2. Input gate. Determines what new information to store in the cell state: $$i_t = \sigma(W_i \cdot [x_t, h_{t-1}] + b_i)$$
3. Candidate values. A vector of candidate values that could be added to the cell state: $$\tilde{C}_t = \tanh(W_c \cdot [x_t, h_{t-1}] + b_c)$$
4. Output gate. Controls what parts of the cell state are revealed as the hidden state: $$o_t = \sigma(W_o \cdot [x_t, h_{t-1}] + b_o)$$
The cell state is updated by first forgetting the selected information, then adding the scaled candidate values: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
The hidden state is computed by passing the cell state through tanh and filtering it with the output gate: $$h_t = o_t \odot \tanh(C_t)$$
Here $\odot$ denotes element-wise multiplication and $\sigma$ is the sigmoid activation function.
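The six equations above can be collected into a single step function. The sketch below uses NumPy column vectors; the per-gate dicts `W` and `b` keyed by `'f'`, `'i'`, `'c'`, `'o'` are an assumed layout for illustration, not part of the problem statement:

```python
import numpy as np

def lstm_step(x, h_prev, C_prev, W, b):
    """One LSTM time step.

    x, h_prev, C_prev are column vectors; W and b are dicts keyed by
    'f', 'i', 'c', 'o', with each W[k] of shape
    (hidden_size, input_size + hidden_size).  Assumed layout for this sketch.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.vstack([x, h_prev])                   # [x_t, h_{t-1}] along the feature dimension
    f = sigmoid(W['f'] @ z + b['f'])             # forget gate
    i = sigmoid(W['i'] @ z + b['i'])             # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])       # candidate values
    o = sigmoid(W['o'] @ z + b['o'])             # output gate
    C = f * C_prev + i * c_tilde                 # C_t = f ⊙ C_{t-1} + i ⊙ C̃_t
    h = o * np.tanh(C)                           # h_t = o ⊙ tanh(C_t)
    return h, C
```

Unrolling an LSTM over a sequence is then just calling this function once per time step, feeding each step's `h` and `C` into the next.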
Implement an LSTMCell class with the following methods:
__init__(self, input_size, hidden_size, weights=None): Initialize the LSTM cell. If weights are provided, use them; otherwise, initialize weights appropriately.
forward(self, input_sequence, initial_hidden_state, initial_cell_state): Process an input sequence through the LSTM cell and return the hidden states at each time step along with the final hidden and cell states.
Note: the concatenation $[x_t, h_{t-1}]$ is performed along the feature dimension, and the weight matrices $W_f$, $W_i$, $W_c$, $W_o$ each have shape (hidden_size, input_size + hidden_size).
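One possible NumPy implementation consistent with these conventions is sketched below. It is not the reference solution: the weight-dict keys (`W_f`, ..., `b_o`) follow the examples in this problem, while the random fallback initialization is an arbitrary choice.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    def __init__(self, input_size, hidden_size, weights=None):
        self.input_size = input_size
        self.hidden_size = hidden_size
        if weights is not None:
            self.weights = {k: np.asarray(v, dtype=float) for k, v in weights.items()}
        else:
            # Arbitrary fallback: small Gaussian weights, zero biases.
            rng = np.random.default_rng(0)
            cols = input_size + hidden_size
            self.weights = {}
            for g in ('f', 'i', 'c', 'o'):
                self.weights[f'W_{g}'] = rng.normal(0.0, 0.1, (hidden_size, cols))
                self.weights[f'b_{g}'] = np.zeros((hidden_size, 1))

    def forward(self, input_sequence, initial_hidden_state, initial_cell_state):
        W = self.weights
        h = np.asarray(initial_hidden_state, dtype=float)
        C = np.asarray(initial_cell_state, dtype=float)
        outputs = []
        for x_t in np.asarray(input_sequence, dtype=float):
            z = np.vstack([x_t.reshape(-1, 1), h])      # [x_t, h_{t-1}] along features
            f = _sigmoid(W['W_f'] @ z + W['b_f'])       # forget gate
            i = _sigmoid(W['W_i'] @ z + W['b_i'])       # input gate
            c_tilde = np.tanh(W['W_c'] @ z + W['b_c'])  # candidate values
            o = _sigmoid(W['W_o'] @ z + W['b_o'])       # output gate
            C = f * C + i * c_tilde                     # cell state update
            h = o * np.tanh(C)                          # hidden state
            outputs.append(h.copy())
        return outputs, h, C
```

Representing states as (hidden_size, 1) column vectors keeps every gate a single matrix-vector product against the (hidden_size, input_size + hidden_size) weight matrix.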
input_sequence = [[1.0], [2.0], [3.0]]
initial_hidden_state = [[0.0]]
initial_cell_state = [[0.0]]
input_size = 1, hidden_size = 1
weights = {
W_f: [[0.4967, -0.1383]], W_i: [[0.6477, 1.523]],
W_c: [[-0.2342, -0.2341]], W_o: [[1.5792, 0.7674]],
b_f: [[0.0]], b_i: [[0.0]], b_c: [[0.0]], b_o: [[0.0]]
}
Output:
{
"outputs": [[[-0.1242]], [[-0.38]], [[-0.6464]]],
"final_hidden_state": [[-0.6464]],
"final_cell_state": [[-0.7822]]
}
This demonstrates a minimal LSTM with a single hidden unit processing a 3-step sequence.
Step 1 (x₁ = 1.0, h₀ = 0.0, C₀ = 0.0):
f₁ = σ(0.4967 · 1.0 - 0.1383 · 0.0) = σ(0.4967) ≈ 0.6217
i₁ = σ(0.6477 · 1.0 + 1.523 · 0.0) = σ(0.6477) ≈ 0.6565
C̃₁ = tanh(-0.2342 · 1.0 - 0.2341 · 0.0) = tanh(-0.2342) ≈ -0.2300
o₁ = σ(1.5792 · 1.0 + 0.7674 · 0.0) = σ(1.5792) ≈ 0.8291
C₁ = f₁ · C₀ + i₁ · C̃₁ = 0.6217 · 0.0 + 0.6565 · (-0.2300) ≈ -0.1510
h₁ = o₁ · tanh(C₁) = 0.8291 · tanh(-0.1510) ≈ -0.1242
The LSTM processes steps 2 and 3 the same way, producing the final hidden state [-0.6464] and cell state [-0.7822].
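The step-1 arithmetic can be checked directly with a few lines of NumPy, using the scalar weights from Example 1 (with a single hidden unit, every gate reduces to a scalar expression):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, h0, C0 = 1.0, 0.0, 0.0                # first input, zero initial states
f1 = sigmoid(0.4967 * x1 - 0.1383 * h0)   # forget gate     ≈ 0.6217
i1 = sigmoid(0.6477 * x1 + 1.523 * h0)    # input gate      ≈ 0.6565
c1 = np.tanh(-0.2342 * x1 - 0.2341 * h0)  # candidate       ≈ -0.2300
o1 = sigmoid(1.5792 * x1 + 0.7674 * h0)   # output gate     ≈ 0.8291
C1 = f1 * C0 + i1 * c1                    # new cell state  ≈ -0.1510
h1 = o1 * np.tanh(C1)                     # new hidden state ≈ -0.1242
```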
input_sequence = [[0.5, 0.5], [1.0, -0.5], [0.0, 0.5]]
initial_hidden_state = [[0.0], [0.0]]
initial_cell_state = [[0.0], [0.0]]
input_size = 2, hidden_size = 2
weights = { ... provided weight matrices ... }
Output:
{
"outputs": [[[0.2054], [-0.0105]], [[0.3034], [0.0464]], [[0.066], [0.1383]]],
"final_hidden_state": [[0.066], [0.1383]],
"final_cell_state": [[0.1092], [0.2798]]
}
This example demonstrates a 2-dimensional LSTM processing 2-dimensional inputs over 3 time steps. The weight matrices are 2 × 4 because they must accommodate the concatenated input and hidden state [x_t, h_{t-1}], which has dimension input_size + hidden_size = 4. Each gate produces a 2-dimensional output, and the final states capture the LSTM's learned representation after seeing the entire sequence.
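The shape bookkeeping in this example can be verified in a few lines (zero placeholder weights, since the example's actual matrices are elided):

```python
import numpy as np

input_size, hidden_size = 2, 2
x_t = np.zeros((input_size, 1))      # one 2-dimensional input, as a column vector
h_prev = np.zeros((hidden_size, 1))  # previous hidden state
z = np.vstack([x_t, h_prev])         # concatenated [x_t, h_{t-1}]: shape (4, 1)
W_f = np.zeros((hidden_size, input_size + hidden_size))  # gate weights: shape (2, 4)
print((W_f @ z).shape)               # each gate output is 2-dimensional: (2, 1)
```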
input_sequence = [[1.0, 2.0, 3.0]]
initial_hidden_state = [[0.0], [0.0]]
initial_cell_state = [[0.0], [0.0]]
input_size = 3, hidden_size = 2
weights = { ... provided weight matrices ... }
Output:
{
"outputs": [[[-0.0064], [-0.0]]],
"final_hidden_state": [[-0.0064], [-0.0]],
"final_cell_state": [[-0.0067], [-0.1193]]
}
This example shows an LSTM with a single time step, where the input has 3 features and the hidden state has 2 dimensions. With only one time step, the output equals the final hidden state. The weight matrices are 2 × 5 to accommodate the concatenated [x_t, h_{t-1}] of dimension 5. This configuration might be used for encoding fixed-size feature vectors through a recurrent cell.
Constraints