In modern machine learning systems, particularly large language models (LLMs), there's often a need to selectively forget or retain specific knowledge without retraining the entire model. This technique, known as Selective Gradient Masking (SGM), enables fine-grained control over which parts of a neural network are updated during training.
Neural network parameters can be partitioned into distinct groups based on their functional role: for example, a 'forget' group whose parameters encode knowledge to be removed, and a 'retain' group whose parameters encode knowledge to be preserved.
The key insight is that by applying binary masks to gradients before the parameter update step, we can control which neurons are affected by each training batch.
Given a parameter vector θ, gradient vector g, a binary forget mask M (where 1 indicates forget parameters), and learning rate η, the update depends on the batch type:
For 'forget' batches (containing data we want the model to unlearn): $$\theta_{new} = \theta - \eta \cdot (M \odot g)$$
For 'retain' batches (containing data we want to preserve): $$\theta_{new} = \theta - \eta \cdot ((1 - M) \odot g)$$
For 'unlabeled' batches (general training data): $$\theta_{new} = \theta - \eta \cdot g$$
Where ⊙ denotes element-wise multiplication (Hadamard product).
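As a concrete illustration of the masking, here is a minimal NumPy sketch of the Hadamard products above (the array values are taken from the worked examples later in this problem):

```python
import numpy as np

g = np.array([0.1, 0.2, 0.3, 0.4])   # gradient vector
M = np.array([1, 1, 0, 0])           # binary forget mask

# 'forget' batch: M ⊙ g keeps gradients only where M = 1,
# so retain-parameter gradients are zeroed
forget_grad = M * g

# 'retain' batch: (1 - M) ⊙ g keeps gradients only where M = 0,
# so forget-parameter gradients are zeroed
retain_grad = (1 - M) * g

print(forget_grad)
print(retain_grad)
```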
Implement the function masked_gradient_step that applies a single gradient descent update to a 1D parameter vector using the selective masking approach described above. Based on the batch_type, your function should:
• For 'forget' batches, update only the parameters where forget_mask[i] = 1 (mask the gradients for retain parameters to zero).
• For 'retain' batches, update only the parameters where forget_mask[i] = 0 (mask the gradients for forget parameters to zero).
• For 'unlabeled' batches, apply the full gradient with no masking.
Return the updated parameter vector after applying the masked gradient step.
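One way to implement this is sketched below. The function name and behavior follow the problem statement; the use of NumPy arrays and the exact input types are assumptions for illustration.

```python
import numpy as np

def masked_gradient_step(params, grad, forget_mask, lr, batch_type):
    """Apply one selectively masked gradient descent update to a 1D vector."""
    params = np.asarray(params, dtype=float)
    grad = np.asarray(grad, dtype=float)
    mask = np.asarray(forget_mask, dtype=float)

    if batch_type == 'forget':
        effective_grad = mask * grad          # update forget parameters only
    elif batch_type == 'retain':
        effective_grad = (1.0 - mask) * grad  # update retain parameters only
    elif batch_type == 'unlabeled':
        effective_grad = grad                 # no masking
    else:
        raise ValueError(f"unknown batch_type: {batch_type!r}")

    return params - lr * effective_grad
```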
params = [1.0, 1.0, 1.0, 1.0]
grad = [0.1, 0.2, 0.3, 0.4]
forget_mask = [1, 1, 0, 0]
lr = 0.1
batch_type = 'forget'
Expected output: [0.99, 0.98, 1.0, 1.0]
The forget_mask indicates that the first two parameters are in the 'forget' group (mask value = 1) and the last two are in the 'retain' group (mask value = 0).
For a 'forget' batch, only forget parameters should be updated. The effective gradients become: • Masked grad = [0.1 × 1, 0.2 × 1, 0.3 × 0, 0.4 × 0] = [0.1, 0.2, 0.0, 0.0]
Applying the update θ_new = θ - lr × masked_grad: • θ₀ = 1.0 - 0.1 × 0.1 = 0.99 • θ₁ = 1.0 - 0.1 × 0.2 = 0.98 • θ₂ = 1.0 - 0.1 × 0.0 = 1.0 (unchanged) • θ₃ = 1.0 - 0.1 × 0.0 = 1.0 (unchanged)
Result: [0.99, 0.98, 1.0, 1.0]
params = [1.0, 1.0, 1.0, 1.0]
grad = [0.1, 0.2, 0.3, 0.4]
forget_mask = [1, 1, 0, 0]
lr = 0.1
batch_type = 'retain'
Expected output: [1.0, 1.0, 0.97, 0.96]
For a 'retain' batch, only retain parameters (where forget_mask = 0) should be updated.
The retain mask is the complement of forget_mask: [0, 0, 1, 1] The effective gradients become: • Masked grad = [0.1 × 0, 0.2 × 0, 0.3 × 1, 0.4 × 1] = [0.0, 0.0, 0.3, 0.4]
Applying the update θ_new = θ - lr × masked_grad: • θ₀ = 1.0 - 0.1 × 0.0 = 1.0 (unchanged) • θ₁ = 1.0 - 0.1 × 0.0 = 1.0 (unchanged) • θ₂ = 1.0 - 0.1 × 0.3 = 0.97 • θ₃ = 1.0 - 0.1 × 0.4 = 0.96
Result: [1.0, 1.0, 0.97, 0.96]
params = [1.0, 1.0, 1.0, 1.0]
grad = [0.1, 0.2, 0.3, 0.4]
forget_mask = [1, 1, 0, 0]
lr = 0.1
batch_type = 'unlabeled'
Expected output: [0.99, 0.98, 0.97, 0.96]
For an 'unlabeled' batch, all parameters are updated normally without any masking.
Applying the standard gradient descent update θ_new = θ - lr × grad: • θ₀ = 1.0 - 0.1 × 0.1 = 0.99 • θ₁ = 1.0 - 0.1 × 0.2 = 0.98 • θ₂ = 1.0 - 0.1 × 0.3 = 0.97 • θ₃ = 1.0 - 0.1 × 0.4 = 0.96
Result: [0.99, 0.98, 0.97, 0.96]
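The three worked examples can be checked end to end with the short, self-contained sketch below (masked_gradient_step here is a minimal reference implementation of the update rules described above, not necessarily the expected submission):

```python
import numpy as np

def masked_gradient_step(params, grad, forget_mask, lr, batch_type):
    params = np.asarray(params, dtype=float)
    grad = np.asarray(grad, dtype=float)
    mask = np.asarray(forget_mask, dtype=float)
    if batch_type == 'forget':
        grad = mask * grad          # keep forget-parameter gradients
    elif batch_type == 'retain':
        grad = (1.0 - mask) * grad  # keep retain-parameter gradients
    # 'unlabeled': gradient is left unmasked
    return params - lr * grad

params = [1.0, 1.0, 1.0, 1.0]
grad = [0.1, 0.2, 0.3, 0.4]
forget_mask = [1, 1, 0, 0]
lr = 0.1

for batch_type in ('forget', 'retain', 'unlabeled'):
    print(batch_type, masked_gradient_step(params, grad, forget_mask, lr, batch_type))
```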
Constraints