The Long Short-Term Memory (LSTM) network is a revolutionary architecture in deep learning that addresses one of the most fundamental challenges in processing sequential data: the vanishing gradient problem. Unlike vanilla recurrent neural networks (RNNs) that struggle to maintain relevant information across long sequences, LSTMs employ a sophisticated system of gating mechanisms to selectively remember, forget, and output information.
At the heart of an LSTM lies the cell state—a highway of information that flows through time, modified only by carefully regulated gates. This design allows LSTMs to maintain memories spanning hundreds or even thousands of time steps, making them extraordinarily powerful for tasks like language modeling, speech recognition, and time series forecasting.
An LSTM cell consists of four primary components:
1. Forget gate. Controls what information to discard from the cell state: $$f_t = \sigma(W_f \cdot [x_t, h_{t-1}] + b_f)$$
The forget gate examines the current input $x_t$ and the previous hidden state $h_{t-1}$, producing values between 0 and 1 for each element of the cell state. A value of 0 means "completely forget this," while 1 means "keep this entirely."
2. Input gate. Determines what new information to store in the cell state: $$i_t = \sigma(W_i \cdot [x_t, h_{t-1}] + b_i)$$
3. Candidate values. A vector of candidate values that could be added to the cell state: $$\tilde{C}_t = \tanh(W_c \cdot [x_t, h_{t-1}] + b_c)$$
4. Output gate. Controls what parts of the cell state are revealed as the hidden state: $$o_t = \sigma(W_o \cdot [x_t, h_{t-1}] + b_o)$$
The cell state is updated by first forgetting the selected information, then adding the scaled candidate values: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
The hidden state is computed by passing the cell state through tanh and filtering it with the output gate: $$h_t = o_t \odot \tanh(C_t)$$
Here $\odot$ denotes element-wise multiplication and $\sigma$ is the sigmoid activation function.
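The six equations above can be collected into a single step function. The sketch below uses NumPy column vectors; the per-gate dicts `W` and `b` keyed by `'f'`, `'i'`, `'c'`, `'o'` are an assumed layout for illustration, not part of the problem statement:

```python
import numpy as np

def lstm_step(x, h_prev, C_prev, W, b):
    """One LSTM time step.

    x, h_prev, C_prev are column vectors; W and b are dicts keyed by
    'f', 'i', 'c', 'o', with each W[k] of shape
    (hidden_size, input_size + hidden_size).  Assumed layout for this sketch.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.vstack([x, h_prev])                   # [x_t, h_{t-1}] along the feature dimension
    f = sigmoid(W['f'] @ z + b['f'])             # forget gate
    i = sigmoid(W['i'] @ z + b['i'])             # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])       # candidate values
    o = sigmoid(W['o'] @ z + b['o'])             # output gate
    C = f * C_prev + i * c_tilde                 # C_t = f ⊙ C_{t-1} + i ⊙ C̃_t
    h = o * np.tanh(C)                           # h_t = o ⊙ tanh(C_t)
    return h, C
```

Unrolling an LSTM over a sequence is then just calling this function once per time step, feeding each step's `h` and `C` into the next.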
Implement an LSTMCell class with the following methods:
__init__(self, input_size, hidden_size, weights=None): Initialize the LSTM cell. If weights are provided, use them; otherwise, initialize weights appropriately.
forward(self, input_sequence, initial_hidden_state, initial_cell_state): Process an input sequence through the LSTM cell and return the hidden states at each time step along with the final hidden and cell states.
Note: the concatenation $[x_t, h_{t-1}]$ is performed along the feature dimension, and the weight matrices $W_f$, $W_i$, $W_c$, $W_o$ each have shape (hidden_size, input_size + hidden_size).
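One possible NumPy implementation consistent with these conventions is sketched below. It is not the reference solution: the weight-dict keys (`W_f`, ..., `b_o`) follow the examples in this problem, while the random fallback initialization is an arbitrary choice.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    def __init__(self, input_size, hidden_size, weights=None):
        self.input_size = input_size
        self.hidden_size = hidden_size
        if weights is not None:
            self.weights = {k: np.asarray(v, dtype=float) for k, v in weights.items()}
        else:
            # Arbitrary fallback: small Gaussian weights, zero biases.
            rng = np.random.default_rng(0)
            cols = input_size + hidden_size
            self.weights = {}
            for g in ('f', 'i', 'c', 'o'):
                self.weights[f'W_{g}'] = rng.normal(0.0, 0.1, (hidden_size, cols))
                self.weights[f'b_{g}'] = np.zeros((hidden_size, 1))

    def forward(self, input_sequence, initial_hidden_state, initial_cell_state):
        W = self.weights
        h = np.asarray(initial_hidden_state, dtype=float)
        C = np.asarray(initial_cell_state, dtype=float)
        outputs = []
        for x_t in np.asarray(input_sequence, dtype=float):
            z = np.vstack([x_t.reshape(-1, 1), h])      # [x_t, h_{t-1}] along features
            f = _sigmoid(W['W_f'] @ z + W['b_f'])       # forget gate
            i = _sigmoid(W['W_i'] @ z + W['b_i'])       # input gate
            c_tilde = np.tanh(W['W_c'] @ z + W['b_c'])  # candidate values
            o = _sigmoid(W['W_o'] @ z + W['b_o'])       # output gate
            C = f * C + i * c_tilde                     # cell state update
            h = o * np.tanh(C)                          # hidden state
            outputs.append(h.copy())
        return outputs, h, C
```

Representing states as (hidden_size, 1) column vectors keeps every gate a single matrix-vector product against the (hidden_size, input_size + hidden_size) weight matrix.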
input_sequence = [[1.0], [2.0], [3.0]]
initial_hidden_state = [[0.0]]
initial_cell_state = [[0.0]]
input_size = 1, hidden_size = 1
weights = {
W_f: [[0.4967, -0.1383]], W_i: [[0.6477, 1.523]],
W_c: [[-0.2342, -0.2341]], W_o: [[1.5792, 0.7674]],
b_f: [[0.0]], b_i: [[0.0]], b_c: [[0.0]], b_o: [[0.0]]
}
Output:
{
"outputs": [[[-0.1242]], [[-0.38]], [[-0.6464]]],
"final_hidden_state": [[-0.6464]],
"final_cell_state": [[-0.7822]]
}
This demonstrates a minimal LSTM with a single hidden unit processing a 3-step sequence.
Step 1 (x₁ = 1.0, h₀ = 0.0, C₀ = 0.0):
f₁ = σ(0.4967 · 1.0 - 0.1383 · 0.0) = σ(0.4967) ≈ 0.6217
i₁ = σ(0.6477 · 1.0 + 1.523 · 0.0) = σ(0.6477) ≈ 0.6565
C̃₁ = tanh(-0.2342 · 1.0 - 0.2341 · 0.0) = tanh(-0.2342) ≈ -0.2300
o₁ = σ(1.5792 · 1.0 + 0.7674 · 0.0) = σ(1.5792) ≈ 0.8291
C₁ = f₁ · C₀ + i₁ · C̃₁ = 0.6217 · 0.0 + 0.6565 · (-0.2300) ≈ -0.1510
h₁ = o₁ · tanh(C₁) = 0.8291 · tanh(-0.1510) ≈ -0.1242
The LSTM processes steps 2 and 3 the same way, producing the final hidden state [-0.6464] and cell state [-0.7822].
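The step-1 arithmetic can be checked directly with a few lines of NumPy, using the scalar weights from Example 1 (with a single hidden unit, every gate reduces to a scalar expression):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, h0, C0 = 1.0, 0.0, 0.0                # first input, zero initial states
f1 = sigmoid(0.4967 * x1 - 0.1383 * h0)   # forget gate     ≈ 0.6217
i1 = sigmoid(0.6477 * x1 + 1.523 * h0)    # input gate      ≈ 0.6565
c1 = np.tanh(-0.2342 * x1 - 0.2341 * h0)  # candidate       ≈ -0.2300
o1 = sigmoid(1.5792 * x1 + 0.7674 * h0)   # output gate     ≈ 0.8291
C1 = f1 * C0 + i1 * c1                    # new cell state  ≈ -0.1510
h1 = o1 * np.tanh(C1)                     # new hidden state ≈ -0.1242
```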
input_sequence = [[0.5, 0.5], [1.0, -0.5], [0.0, 0.5]]
initial_hidden_state = [[0.0], [0.0]]
initial_cell_state = [[0.0], [0.0]]
input_size = 2, hidden_size = 2
weights = { ... provided weight matrices ... }
Output:
{
"outputs": [[[0.2054], [-0.0105]], [[0.3034], [0.0464]], [[0.066], [0.1383]]],
"final_hidden_state": [[0.066], [0.1383]],
"final_cell_state": [[0.1092], [0.2798]]
}
This example demonstrates a 2-dimensional LSTM processing 2-dimensional inputs over 3 time steps. The weight matrices are 2 × 4 because they must accommodate the concatenated input and hidden state [x_t, h_{t-1}], which has dimension input_size + hidden_size = 4. Each gate produces a 2-dimensional output, and the final states capture the LSTM's learned representation after seeing the entire sequence.
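The shape bookkeeping in this example can be verified in a few lines (zero placeholder weights, since the example's actual matrices are elided):

```python
import numpy as np

input_size, hidden_size = 2, 2
x_t = np.zeros((input_size, 1))      # one 2-dimensional input, as a column vector
h_prev = np.zeros((hidden_size, 1))  # previous hidden state
z = np.vstack([x_t, h_prev])         # concatenated [x_t, h_{t-1}]: shape (4, 1)
W_f = np.zeros((hidden_size, input_size + hidden_size))  # gate weights: shape (2, 4)
print((W_f @ z).shape)               # each gate output is 2-dimensional: (2, 1)
```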
input_sequence = [[1.0, 2.0, 3.0]]
initial_hidden_state = [[0.0], [0.0]]
initial_cell_state = [[0.0], [0.0]]
input_size = 3, hidden_size = 2
weights = { ... provided weight matrices ... }
Output:
{
"outputs": [[[-0.0064], [-0.0]]],
"final_hidden_state": [[-0.0064], [-0.0]],
"final_cell_state": [[-0.0067], [-0.1193]]
}
This example shows an LSTM with a single time step, where the input has 3 features and the hidden state has 2 dimensions. With only one time step, the output equals the final hidden state. The weight matrices are 2 × 5 to accommodate the concatenated [x_t, h_{t-1}] of dimension 5. This configuration might be used for encoding fixed-size feature vectors through a recurrent cell.
Constraints