Memory-augmented neural networks have emerged as a powerful paradigm for enabling models to learn and adapt continuously. A critical component of such architectures is the memory update mechanism, which determines how new information is integrated into the memory matrix while preserving relevant prior knowledge.
In this problem, you will implement a surprise-driven memory update rule inspired by modern neural memory architectures. The mechanism combines three components:
Surprise Metric (Prediction Error Gradient): The memory system attempts to predict the value v from the key k using the current memory state M. The "surprise" is the gradient of the associative memory loss ||M @ k - v||² with respect to M. That gradient is 2(M @ k - v)kᵀ; dropping the constant factor 2 (equivalently, using the loss ½||M @ k - v||²) gives the surprise used in this problem:
$$\text{surprise} = (M k - v) \otimes k = (M k - v)\, k^{\top}$$
where ⊗ denotes the outer product, so entry (i, j) of the surprise equals (M @ k - v)[i] · k[j]. Its magnitude grows with the prediction error (and with the size of k), and it is zero when the prediction M @ k matches v exactly.
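A minimal pure-Python sketch of the surprise computation, assuming M is a d × d nested list and k, v are length-d lists. The helper name `compute_surprise` is hypothetical and not part of the required interface.

```python
def compute_surprise(M, k, v):
    """Hypothetical helper: surprise = (M @ k - v) k^T for nested-list inputs."""
    d = len(k)
    # Prediction M @ k (matrix-vector product)
    prediction = [sum(M[i][j] * k[j] for j in range(d)) for i in range(d)]
    # Prediction error e = M @ k - v
    error = [prediction[i] - v[i] for i in range(d)]
    # Outer product e k^T: entry (i, j) is e[i] * k[j]
    return [[error[i] * k[j] for j in range(d)] for i in range(d)]


# Inputs taken from Example 1 below; the result is [[-1, 0], [0, 0]]
s = compute_surprise([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [2.0, 0.0])
```

Because the gradient of the unscaled loss ||M @ k - v||² is exactly twice this matrix, a central finite-difference estimate of that loss's gradient should agree with 2 × compute_surprise(M, k, v) up to rounding error.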
Momentum Accumulation: Rather than applying corrections impulsively, the system maintains a momentum matrix S that accumulates past surprises. The momentum update follows:
$$S_{new} = \eta \cdot S - \theta \cdot \text{surprise}$$
where η (eta) is the momentum decay factor and θ (theta) is the learning rate.
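The momentum rule is elementwise on d × d matrices. Unrolling the recursion from an initial S₀ shows how past surprises are discounted geometrically by η:
$$S_t = \eta^t S_0 - \theta \sum_{\tau=1}^{t} \eta^{t-\tau}\, \text{surprise}_\tau$$
A minimal sketch, assuming nested-list inputs; `momentum_step` is a hypothetical helper name, and the 1 × 1 surprise sequence is made up purely for illustration.

```python
def momentum_step(S, surprise, eta, theta):
    """Hypothetical helper: S_new = eta * S - theta * surprise, elementwise."""
    return [
        [eta * s - theta * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(S, surprise)
    ]


# Feed an illustrative sequence of 1x1 surprises through the recursion
S = [[0.0]]
for g in (1.0, -0.5, 2.0):
    S = momentum_step(S, [[g]], eta=0.9, theta=0.1)
# S[0][0] is now approximately -0.236, matching the unrolled form:
# -0.1 * (0.9**2 * 1.0 + 0.9 * -0.5 + 2.0)
```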
Forgetting Mechanism: To prevent memory saturation and let the system adapt to changing environments, a forget gate controlled by the parameter α (alpha) decays the existing memory (a form of weight decay) before the momentum is added:
$$M_{new} = (1 - \alpha) \cdot M + S_{new}$$
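A matching sketch of the forget-gate step, again assuming nested lists (`forget_step` is a hypothetical helper name). When S_new is zero, n applications scale M by (1 - α)^n, so stored content decays geometrically unless new surprise refreshes it.

```python
def forget_step(M, S_new, alpha):
    """Hypothetical helper: M_new = (1 - alpha) * M + S_new, elementwise."""
    return [
        [(1.0 - alpha) * m + s for m, s in zip(m_row, s_row)]
        for m_row, s_row in zip(M, S_new)
    ]


# With no incoming momentum the memory only decays: three steps scale M by (1 - alpha)**3
M = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(3):
    M = forget_step(M, zero, alpha=0.1)
# M[0][0] is now approximately 0.9**3 = 0.729
```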
Your Task:
Implement the function update_neural_memory that computes the updated memory state M_new and momentum state S_new given:
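- M: memory matrix of shape (d, d)
- S: momentum matrix of shape (d, d)
- k: key vector of length d
- v: value vector of length d
- θ (theta): learning rate
- η (eta): momentum decay factor
- α (alpha): forget gate parameter (weight decay rate)

Return the results as a formatted string with values rounded to 2 decimal places, in the form used in the examples below (M_new = [...], S_new = [...]).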
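One possible pure-Python sketch of the whole update, assuming the argument order listed above, nested-list inputs, and an output string in the same "M_new = ..., S_new = ..." form as the examples. The exact string format a grader expects is an assumption here.

```python
def update_neural_memory(M, S, k, v, theta, eta, alpha):
    """Sketch of the surprise-driven memory update (format of the result is assumed)."""
    d = len(k)
    # 1. Prediction M @ k and error e = M @ k - v
    prediction = [sum(M[i][j] * k[j] for j in range(d)) for i in range(d)]
    error = [prediction[i] - v[i] for i in range(d)]
    # 2. Surprise: outer product e k^T
    surprise = [[error[i] * k[j] for j in range(d)] for i in range(d)]
    # 3. Momentum: S_new = eta * S - theta * surprise
    S_new = [[eta * S[i][j] - theta * surprise[i][j] for j in range(d)]
             for i in range(d)]
    # 4. Forget gate: M_new = (1 - alpha) * M + S_new
    M_new = [[(1.0 - alpha) * M[i][j] + S_new[i][j] for j in range(d)]
             for i in range(d)]

    def fmt(A):
        # Round to 2 decimals; adding 0.0 turns any -0.0 into 0.0
        return [[round(x, 2) + 0.0 for x in row] for row in A]

    return f"M_new = {fmt(M_new)}, S_new = {fmt(S_new)}"


result = update_neural_memory([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]],
                              [1.0, 0.0], [2.0, 0.0], theta=0.1, eta=0.9, alpha=0.01)
```

Adding 0.0 after rounding matters because products such as -1.0 × 0.0 yield negative zero in floating point, which would otherwise print as -0.0.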
Example 1:
Input:
M = [[1.0, 0.0], [0.0, 1.0]]
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]
v = [2.0, 0.0]
theta = 0.1
eta = 0.9
alpha = 0.01
Output: M_new = [[1.09, 0.0], [0.0, 0.99]], S_new = [[0.1, 0.0], [0.0, 0.0]]
Explanation:
Step 1: Compute the Prediction
The memory attempts to predict v from k:
prediction = M @ k = [[1, 0], [0, 1]] @ [1, 0] = [1, 0]
Step 2: Calculate the Prediction Error
error = prediction - v = [1, 0] - [2, 0] = [-1, 0]
Step 3: Compute the Surprise (Gradient)
The surprise is the outer product of the error and the key:
surprise = outer([-1, 0], [1, 0]) = [[-1, 0], [0, 0]]
Step 4: Update Momentum
S_new = η × S - θ × surprise
S_new = 0.9 × [[0, 0], [0, 0]] - 0.1 × [[-1, 0], [0, 0]]
S_new = [[0, 0], [0, 0]] + [[0.1, 0], [0, 0]] = [[0.1, 0], [0, 0]]
Step 5: Update Memory with Forget Gate
M_new = (1 - α) × M + S_new
M_new = 0.99 × [[1, 0], [0, 1]] + [[0.1, 0], [0, 0]]
M_new = [[0.99, 0], [0, 0.99]] + [[0.1, 0], [0, 0]] = [[1.09, 0], [0, 0.99]]
Example 2:
Input:
M = [[1.0, 1.0], [1.0, 1.0]]
S = [[0.1, 0.1], [0.1, 0.1]]
k = [1.0, 1.0]
v = [1.0, 1.0]
theta = 0.1
eta = 0.9
alpha = 0.1
Output: M_new = [[0.89, 0.89], [0.89, 0.89]], S_new = [[-0.01, -0.01], [-0.01, -0.01]]
Explanation:
Step 1: Compute the Prediction
prediction = M @ k = [[1, 1], [1, 1]] @ [1, 1] = [2, 2]
Step 2: Calculate the Prediction Error
error = prediction - v = [2, 2] - [1, 1] = [1, 1]
Step 3: Compute the Surprise
surprise = outer([1, 1], [1, 1]) = [[1, 1], [1, 1]]
Step 4: Update Momentum
S_new = 0.9 × [[0.1, 0.1], [0.1, 0.1]] - 0.1 × [[1, 1], [1, 1]]
S_new = [[0.09, 0.09], [0.09, 0.09]] - [[0.1, 0.1], [0.1, 0.1]]
S_new = [[-0.01, -0.01], [-0.01, -0.01]]
Step 5: Update Memory with Forget Gate
M_new = (1 - 0.1) × [[1, 1], [1, 1]] + [[-0.01, -0.01], [-0.01, -0.01]]
M_new = [[0.9, 0.9], [0.9, 0.9]] + [[-0.01, -0.01], [-0.01, -0.01]]
M_new = [[0.89, 0.89], [0.89, 0.89]]
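Example 2 is the only case with a nonzero incoming momentum S. To see how momentum carries information between calls, one can chain two updates on the inputs of Example 1, feeding each call's M_new and S_new into the next. A minimal sketch; `memory_step` is a hypothetical helper that returns unrounded matrices instead of a string.

```python
def memory_step(M, S, k, v, theta, eta, alpha):
    """Hypothetical helper: one surprise-driven update, returning (M_new, S_new)."""
    d = len(k)
    # Prediction error e = M @ k - v
    error = [sum(M[i][j] * k[j] for j in range(d)) - v[i] for i in range(d)]
    # S_new = eta * S - theta * (e k^T)
    S_new = [[eta * S[i][j] - theta * error[i] * k[j] for j in range(d)] for i in range(d)]
    # M_new = (1 - alpha) * M + S_new
    M_new = [[(1.0 - alpha) * M[i][j] + S_new[i][j] for j in range(d)] for i in range(d)]
    return M_new, S_new


M = [[1.0, 0.0], [0.0, 1.0]]
S = [[0.0, 0.0], [0.0, 0.0]]
k, v = [1.0, 0.0], [2.0, 0.0]
for step in (1, 2):
    M, S = memory_step(M, S, k, v, theta=0.1, eta=0.9, alpha=0.01)
    print(step, round(M[0][0], 4), round(S[0][0], 4))
# prints "1 1.09 0.1", then "2 1.2601 0.181"
```

In the second call the fresh correction θ × 0.91 = 0.091 is slightly smaller than the first one (0.1) because the error has shrunk, but the decayed momentum η × 0.1 = 0.09 is added on top, so M[0][0] moves by about 0.17 instead of 0.09.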
Example 3:
Input:
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
k = [1.0, 0.0, 0.0]
v = [0.5, 0.0, 0.0]
theta = 0.2
eta = 0.8
alpha = 0.05
Output: M_new = [[0.85, 0.0, 0.0], [0.0, 0.95, 0.0], [0.0, 0.0, 0.95]], S_new = [[-0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Explanation:
Step 1: Compute the Prediction
With a 3×3 identity memory matrix:
prediction = M @ k = [1, 0, 0]
Step 2: Calculate the Prediction Error
error = [1, 0, 0] - [0.5, 0, 0] = [0.5, 0, 0]
The memory over-predicted: it output 1 for the first dimension where the target was 0.5.
Step 3: Compute the Surprise
surprise = outer([0.5, 0, 0], [1, 0, 0]) = [[0.5, 0, 0], [0, 0, 0], [0, 0, 0]]
Step 4: Update Momentum
S_new = 0.8 × [[0, 0, 0], [0, 0, 0], [0, 0, 0]] - 0.2 × [[0.5, 0, 0], [0, 0, 0], [0, 0, 0]]
S_new = [[-0.1, 0, 0], [0, 0, 0], [0, 0, 0]]
Step 5: Update Memory with Forget Gate
M_new = 0.95 × identity_3x3 + S_new
M_new = [[0.95, 0, 0], [0, 0.95, 0], [0, 0, 0.95]] + [[-0.1, 0, 0], [0, 0, 0], [0, 0, 0]]
M_new = [[0.85, 0, 0], [0, 0.95, 0], [0, 0, 0.95]]
The update lowered M[0][0] from 1.0 to 0.85, so the prediction for the first dimension moved from 1.0 toward the target 0.5 (M_new @ k = [0.85, 0, 0]).
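In all three examples the update moves the prediction M @ k toward v. One way to check this is to recompute the prediction with the expected M_new from each Output line and compare the error norm before and after. A short sketch; `prediction_error` is a hypothetical helper.

```python
import math


def prediction_error(M, k, v):
    """Hypothetical helper: Euclidean norm of M @ k - v."""
    d = len(k)
    return math.sqrt(sum((sum(M[i][j] * k[j] for j in range(d)) - v[i]) ** 2 for i in range(d)))


# (M, expected M_new, k, v) copied from Examples 1-3
cases = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.09, 0.0], [0.0, 0.99]], [1.0, 0.0], [2.0, 0.0]),
    ([[1.0, 1.0], [1.0, 1.0]], [[0.89, 0.89], [0.89, 0.89]], [1.0, 1.0], [1.0, 1.0]),
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
     [[0.85, 0.0, 0.0], [0.0, 0.95, 0.0], [0.0, 0.0, 0.95]],
     [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]),
]
for M, M_new, k, v in cases:
    before, after = prediction_error(M, k, v), prediction_error(M_new, k, v)
    print(f"error before: {before:.2f}, after: {after:.2f}")
```

The error drops from 1.00 to 0.91, from 1.41 to 1.10, and from 0.50 to 0.35 respectively.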
Constraints