Surprise-Driven Neural Memory Update with Momentum

MEDIUM · 20 pts

Memory-augmented neural networks have emerged as a powerful paradigm for enabling models to learn and adapt continuously. A critical component of such architectures is the memory update mechanism, which determines how new information is integrated into the memory matrix while preserving relevant prior knowledge.

In this problem, you will implement a surprise-driven memory update rule inspired by modern neural memory architectures. The mechanism combines three components:

Surprise Metric (Prediction Error Gradient): The memory system attempts to predict value v from key k using the current memory state M. The "surprise" is the gradient of the associative memory loss ||M @ k - v||² with respect to M. Up to a constant factor of 2, which the learning rate absorbs, this gradient equals:

$$\text{surprise} = (M k - v) \otimes k = (M k - v)\, k^T$$

where ⊗ denotes the outer product. The surprise grows with the gap between the memory's prediction M @ k and the target value v.
Momentum Accumulation: Rather than applying corrections impulsively, the system maintains a momentum matrix S that accumulates past surprises. The momentum update follows:

$$S_{new} = \eta \cdot S - \theta \cdot \text{surprise}$$

where η (eta) is the momentum decay factor and θ (theta) is the learning rate.
Forgetting Mechanism: To prevent memory saturation and allow the system to adapt to changing environments, a forget gate controlled by parameter α (alpha) applies weight decay:

$$M_{new} = (1 - \alpha) \cdot M + S_{new}$$
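The surprise formula can be checked numerically. The sketch below compares it against a central-difference gradient of ||M @ k - v||² on random inputs. The helper names `surprise` and `numerical_grad` are hypothetical and chosen only for this illustration; the exact gradient carries a factor of 2 that the surprise omits.

```python
import numpy as np

def surprise(M, k, v):
    """Outer product of the prediction error (M @ k - v) with the key k."""
    return np.outer(M @ k - v, k)

def numerical_grad(M, k, v, eps=1e-6):
    """Central-difference gradient of ||M @ k - v||^2 with respect to M."""
    loss = lambda A: float(np.sum((A @ k - v) ** 2))
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            step = np.zeros_like(M)
            step[i, j] = eps
            G[i, j] = (loss(M + step) - loss(M - step)) / (2 * eps)
    return G

rng = np.random.default_rng(0)  # fixed seed, arbitrary test data
M = rng.normal(size=(3, 3))
k = rng.normal(size=3)
v = rng.normal(size=3)

# The exact gradient is 2 * (M @ k - v) k^T; the surprise drops the 2.
print(np.allclose(numerical_grad(M, k, v), 2 * surprise(M, k, v), atol=1e-5))
```

Because θ multiplies the surprise in the momentum update, dropping the factor of 2 only rescales the effective learning rate.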

Your Task:

Implement the function update_neural_memory that computes the updated memory state M_new and momentum state S_new given:

Current memory matrix M of shape (d, d)
Current momentum matrix S of shape (d, d)
Key vector k of length d
Value vector v of length d
Learning rate θ (theta)
Momentum decay factor η (eta)
Forget gate coefficient α (alpha)

Return the results as a single string of the form M_new = [[...]], S_new = [[...]], with every entry rounded to 2 decimal places.
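One possible shape for the function is sketched below using NumPy. The output format is inferred from the examples that follow rather than from a formal specification, so treat the string layout (Python list notation, one string holding both matrices) as an assumption.

```python
import numpy as np

def update_neural_memory(M, S, k, v, theta, eta, alpha):
    """Sketch of the surprise-driven update; returns the formatted string."""
    M = np.asarray(M, dtype=float)
    S = np.asarray(S, dtype=float)
    k = np.asarray(k, dtype=float)
    v = np.asarray(v, dtype=float)

    # Surprise: outer product of the prediction error with the key.
    surprise = np.outer(M @ k - v, k)
    # Momentum: decay the old momentum, then step against the surprise.
    S_new = eta * S - theta * surprise
    # Memory: apply the forget gate, then add the new momentum.
    M_new = (1.0 - alpha) * M + S_new

    # Adding 0.0 turns any -0.0 left after rounding into 0.0 for printing.
    M_out = (np.round(M_new, 2) + 0.0).tolist()
    S_out = (np.round(S_new, 2) + 0.0).tolist()
    return f"M_new = {M_out}, S_new = {S_out}"

print(update_neural_memory([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]],
                           [1.0, 0.0], [2.0, 0.0], 0.1, 0.9, 0.01))
```

Rounding happens only when the string is built, so the intermediate matrices keep full precision.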

Example

Input

M = [[1.0, 0.0], [0.0, 1.0]]
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]
v = [2.0, 0.0]
theta = 0.1
eta = 0.9
alpha = 0.01

Output

M_new = [[1.09, 0.0], [0.0, 0.99]], S_new = [[0.1, 0.0], [0.0, 0.0]]

Explanation

Step 1: Compute the Prediction
The memory attempts to predict v from k:
prediction = M @ k = [[1, 0], [0, 1]] @ [1, 0] = [1, 0]

Step 2: Calculate Prediction Error
error = prediction - v = [1, 0] - [2, 0] = [-1, 0]

Step 3: Compute the Surprise (Gradient)
The surprise is the outer product of error and key:
surprise = outer([-1, 0], [1, 0]) = [[-1, 0], [0, 0]]

Step 4: Update Momentum
S_new = η × S - θ × surprise
S_new = 0.9 × [[0, 0], [0, 0]] - 0.1 × [[-1, 0], [0, 0]]
S_new = [[0, 0], [0, 0]] + [[0.1, 0], [0, 0]] = [[0.1, 0], [0, 0]]

Step 5: Update Memory with Forget Gate
M_new = (1 - α) × M + S_new
M_new = 0.99 × [[1, 0], [0, 1]] + [[0.1, 0], [0, 0]]
M_new = [[0.99, 0], [0, 0.99]] + [[0.1, 0], [0, 0]] = [[1.09, 0], [0, 0.99]]

Example

Input

M = [[1.0, 1.0], [1.0, 1.0]]
S = [[0.1, 0.1], [0.1, 0.1]]
k = [1.0, 1.0]
v = [1.0, 1.0]
theta = 0.1
eta = 0.9
alpha = 0.1

Output

M_new = [[0.89, 0.89], [0.89, 0.89]], S_new = [[-0.01, -0.01], [-0.01, -0.01]]

Explanation

Step 1: Compute the Prediction
prediction = M @ k = [[1, 1], [1, 1]] @ [1, 1] = [2, 2]

Step 2: Calculate Prediction Error
error = prediction - v = [2, 2] - [1, 1] = [1, 1]

Step 3: Compute the Surprise
surprise = outer([1, 1], [1, 1]) = [[1, 1], [1, 1]]

Step 4: Update Momentum
S_new = 0.9 × [[0.1, 0.1], [0.1, 0.1]] - 0.1 × [[1, 1], [1, 1]]
S_new = [[0.09, 0.09], [0.09, 0.09]] - [[0.1, 0.1], [0.1, 0.1]]
S_new = [[-0.01, -0.01], [-0.01, -0.01]]

Step 5: Update Memory with Forget Gate
M_new = (1 - 0.1) × [[1, 1], [1, 1]] + [[-0.01, -0.01], [-0.01, -0.01]]
M_new = [[0.9, 0.9], [0.9, 0.9]] + [[-0.01, -0.01], [-0.01, -0.01]]
M_new = [[0.89, 0.89], [0.89, 0.89]]

Example

Input

M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
k = [1.0, 0.0, 0.0]
v = [0.5, 0.0, 0.0]
theta = 0.2
eta = 0.8
alpha = 0.05

Output

M_new = [[0.85, 0.0, 0.0], [0.0, 0.95, 0.0], [0.0, 0.0, 0.95]], S_new = [[-0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

Explanation

Step 1: Compute the Prediction
With a 3×3 identity memory matrix:
prediction = M @ k = [1, 0, 0]

Step 2: Calculate Prediction Error
error = [1, 0, 0] - [0.5, 0, 0] = [0.5, 0, 0]
The memory over-predicted: its output for the first dimension is higher than the target.

Step 3: Compute the Surprise
surprise = outer([0.5, 0, 0], [1, 0, 0]) = [[0.5, 0, 0], [0, 0, 0], [0, 0, 0]]

Step 4: Update Momentum
S_new = 0.8 × [[0, 0, 0], [0, 0, 0], [0, 0, 0]] - 0.2 × [[0.5, 0, 0], [0, 0, 0], [0, 0, 0]]
S_new = [[-0.1, 0, 0], [0, 0, 0], [0, 0, 0]]

Step 5: Update Memory with Forget Gate
M_new = 0.95 × identity_3x3 + S_new
M_new = [[0.95, 0, 0], [0, 0.95, 0], [0, 0, 0.95]] + [[-0.1, 0, 0], [0, 0, 0], [0, 0, 0]]
M_new = [[0.85, 0, 0], [0, 0.95, 0], [0, 0, 0.95]]

The update lowers the memory's prediction along the first dimension from 1 to 0.85, moving it toward the target of 0.5.
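To see how momentum and the forget gate interact over time, the sketch below applies the same update repeatedly to this example's inputs and tracks the first coordinate of the prediction. The helper `step` and the choice of 200 iterations are assumptions made for illustration and are not part of the task. For a unit basis key, setting S_new = S and M_new = M in the two update equations gives a fixed point of θ·v₁ / (θ + α(1 − η)) for that coordinate, which is about 0.476 here rather than the target 0.5.

```python
import numpy as np

def step(M, S, k, v, theta, eta, alpha):
    """One surprise-driven update; returns (M_new, S_new) as arrays."""
    S_new = eta * S - theta * np.outer(M @ k - v, k)
    return (1.0 - alpha) * M + S_new, S_new

M, S = np.eye(3), np.zeros((3, 3))
k = np.array([1.0, 0.0, 0.0])
v = np.array([0.5, 0.0, 0.0])
theta, eta, alpha = 0.2, 0.8, 0.05

history = []  # first coordinate of the prediction after each update
for _ in range(200):
    M, S = step(M, S, k, v, theta, eta, alpha)
    history.append(float((M @ k)[0]))

print([round(p, 3) for p in history[:4]])
print(round(history[-1], 4), round(theta * v[0] / (theta + alpha * (1 - eta)), 4))
```

With these parameters the prediction overshoots the target within the first few steps because of the accumulated momentum, then oscillates with shrinking amplitude toward the fixed point.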


Constraints

1 ≤ d ≤ 50 (dimension of the memory matrix)
M and S are square matrices of shape (d, d)
k and v are vectors of length d
-10³ ≤ M[i][j], S[i][j] ≤ 10³
-10³ ≤ k[i], v[i] ≤ 10³
0 < θ ≤ 1 (learning rate)
0 ≤ η < 1 (momentum decay factor)
0 ≤ α < 1 (forget gate coefficient)
All matrix dimensions are compatible for the specified operations
Output values should be rounded to 2 decimal places
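The constraints above can be turned into a small input check. The sketch below is optional and uses a hypothetical helper name, `check_inputs`; the task itself does not ask for validation.

```python
import numpy as np

def check_inputs(M, S, k, v, theta, eta, alpha):
    """Raise ValueError if the arguments violate the listed constraints."""
    M, S, k, v = (np.asarray(x, dtype=float) for x in (M, S, k, v))
    if k.ndim != 1 or not 1 <= k.shape[0] <= 50:
        raise ValueError("k must be a vector of length d with 1 <= d <= 50")
    d = k.shape[0]
    if M.shape != (d, d) or S.shape != (d, d) or v.shape != (d,):
        raise ValueError("M and S must have shape (d, d) and v length d")
    if any(np.abs(x).max() > 1e3 for x in (M, S, k, v)):
        raise ValueError("all entries must lie in [-1000, 1000]")
    if not 0 < theta <= 1:
        raise ValueError("theta must satisfy 0 < theta <= 1")
    if not 0 <= eta < 1:
        raise ValueError("eta must satisfy 0 <= eta < 1")
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")

# Passes silently for the first example's inputs.
check_inputs([[1, 0], [0, 1]], [[0, 0], [0, 0]], [1, 0], [2, 0], 0.1, 0.9, 0.01)
```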
