In the landscape of optimization algorithms for machine learning, adaptive learning rate methods represent a significant advancement over fixed-rate gradient descent. One of the pioneering techniques in this category adjusts the learning rate individually for each parameter based on the history of gradients observed during training.
The core insight is elegant: parameters that frequently receive large gradients should have their effective learning rate reduced, while parameters with infrequent or small gradients should maintain a larger learning rate. This per-parameter adaptation is achieved by tracking the cumulative sum of squared gradients.
Given the following inputs at iteration t:
- θ: the current parameter value(s)
- g: the gradient of the loss with respect to θ at iteration t
- G: the accumulated sum of squared gradients from all previous iterations
- η: the global learning rate
- ε: a small constant added for numerical stability
The update procedure follows these steps:
Step 1: Update the gradient accumulator $$G_{new} = G + g^2$$
Step 2: Compute the adaptive learning rate and update parameters $$\theta_{new} = \theta - \frac{\eta}{\sqrt{G_{new}} + \epsilon} \cdot g$$
The division by (√G_new + ε) is the key mechanism: as squared gradients accumulate over time, the effective learning rate for each parameter automatically decreases. This provides a form of automatic annealing that can be particularly beneficial for sparse features and non-convex optimization landscapes.
Implement a function that performs a single parameter update step using this adaptive gradient method. Your implementation should:
- Update the gradient accumulator G by adding the squared gradient
- Apply the adaptive update to the parameter(s)
- Return both the updated parameter(s) and the updated accumulator
- Handle scalar and array inputs (element-wise for arrays)
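As a minimal sketch of the two steps above (not a reference solution — the defaults η = 0.01 and ε = 1e-8 are taken from the worked examples, and NumPy is assumed so the same code covers scalars and arrays):

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.01, eps=1e-8):
    """One adaptive-gradient update; works element-wise for arrays."""
    theta, grad, G = (np.asarray(x, dtype=float) for x in (theta, grad, G))
    G_new = G + grad ** 2                                    # Step 1: accumulate squared gradients
    theta_new = theta - lr / (np.sqrt(G_new) + eps) * grad   # Step 2: scaled parameter update
    return theta_new, G_new
```

Returning the new accumulator alongside the parameters keeps the function stateless; the caller threads G through successive iterations.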
Input: parameter = 1.0, grad = 0.1, G = 1.0
Expected output: (0.999, 1.01)
Step-by-step calculation with scalar values:
Given: θ = 1.0, g = 0.1, G = 1.0, η = 0.01, ε = 1e-8
Step 1: Update gradient accumulator G_new = G + g² = 1.0 + (0.1)² = 1.0 + 0.01 = 1.01
Step 2: Calculate adaptive learning rate Adaptive rate = η / (√G_new + ε) = 0.01 / (√1.01 + 1e-8) ≈ 0.01 / 1.005 ≈ 0.00995
Step 3: Update parameter θ_new = θ - adaptive_rate × g = 1.0 - 0.00995 × 0.1 ≈ 1.0 - 0.000995 ≈ 0.999
The function returns (0.999, 1.01) representing the updated parameter and new gradient accumulator.
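The walkthrough above can be reproduced with plain arithmetic (η = 0.01 and ε = 1e-8 as stated in the example):

```python
import math

eta, eps = 0.01, 1e-8
theta, g, G = 1.0, 0.1, 1.0

G_new = G + g ** 2                     # 1.0 + 0.01 = 1.01
rate = eta / (math.sqrt(G_new) + eps)  # ~0.00995
theta_new = theta - rate * g           # ~0.999

print(round(theta_new, 3), G_new)      # 0.999 1.01
```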
Input: parameter = [1.0, 2.0, 3.0], grad = [0.1, 0.2, 0.3], G = [0.0, 0.0, 0.0]
Expected output: ([0.99, 1.99, 2.99], [0.01, 0.04, 0.09])
Element-wise calculation with array inputs (starting from zero accumulation):
Given: θ = [1.0, 2.0, 3.0], g = [0.1, 0.2, 0.3], G = [0.0, 0.0, 0.0], η = 0.01, ε = 1e-8
Step 1: Update gradient accumulator (element-wise) G_new = G + g² = [0.0 + 0.01, 0.0 + 0.04, 0.0 + 0.09] = [0.01, 0.04, 0.09]
Step 2: Calculate element-wise updates
For θ[0] = 1.0: rate = 0.01 / (√0.01 + 1e-8) ≈ 0.1, so θ_new[0] = 1.0 - 0.1 × 0.1 = 0.99
For θ[1] = 2.0: rate = 0.01 / (√0.04 + 1e-8) ≈ 0.05, so θ_new[1] = 2.0 - 0.05 × 0.2 = 1.99
For θ[2] = 3.0: rate = 0.01 / (√0.09 + 1e-8) ≈ 0.0333, so θ_new[2] = 3.0 - 0.0333 × 0.3 ≈ 2.99
Result: Updated parameters = [0.99, 1.99, 2.99], Updated G = [0.01, 0.04, 0.09]
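The element-wise arithmetic can be checked with a few lines of NumPy (η = 0.01 and ε = 1e-8 as in the scalar example):

```python
import numpy as np

eta, eps = 0.01, 1e-8
theta = np.array([1.0, 2.0, 3.0])
g = np.array([0.1, 0.2, 0.3])
G = np.zeros(3)

G_new = G + g ** 2                               # [0.01, 0.04, 0.09]
theta_new = theta - eta / (np.sqrt(G_new) + eps) * g  # ~[0.99, 1.99, 2.99]
```

Note that every parameter moved by the same amount (0.01): when G starts at zero, the update is η · g / (√(g²) + ε) ≈ η · sign(g), so the very first step has magnitude η regardless of the gradient's size.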
Input: parameter = [0.5, -0.5, 1.0], grad = [0.5, -0.3, 0.1], G = [0.5, 0.25, 0.1]
Expected output: ([0.494, -0.495, 0.997], [0.75, 0.34, 0.11])
Calculation with pre-existing gradient history:
Given: θ = [0.5, -0.5, 1.0], g = [0.5, -0.3, 0.1], G = [0.5, 0.25, 0.1], η = 0.01, ε = 1e-8
Step 1: Update gradient accumulator G_new = [0.5 + 0.25, 0.25 + 0.09, 0.1 + 0.01] = [0.75, 0.34, 0.11]
Step 2: Compute updates with accumulated history
For θ[0] = 0.5 (g = 0.5, G_new = 0.75): rate = 0.01 / (√0.75 + 1e-8) ≈ 0.01155, so θ_new[0] = 0.5 - 0.01155 × 0.5 ≈ 0.494
For θ[1] = -0.5 (g = -0.3, G_new = 0.34): rate = 0.01 / (√0.34 + 1e-8) ≈ 0.01715, so θ_new[1] = -0.5 - 0.01715 × (-0.3) ≈ -0.495
For θ[2] = 1.0 (g = 0.1, G_new = 0.11): rate = 0.01 / (√0.11 + 1e-8) ≈ 0.03015, so θ_new[2] = 1.0 - 0.03015 × 0.1 ≈ 0.997
Notice how the negative gradient for θ[1] causes the parameter to move in the positive direction, and how larger accumulated gradients result in smaller effective step sizes.
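Both observations can be confirmed numerically (same assumed constants, η = 0.01 and ε = 1e-8):

```python
import numpy as np

eta, eps = 0.01, 1e-8
theta = np.array([0.5, -0.5, 1.0])
g = np.array([0.5, -0.3, 0.1])
G = np.array([0.5, 0.25, 0.1])

G_new = G + g ** 2                                    # [0.75, 0.34, 0.11]
theta_new = theta - eta / (np.sqrt(G_new) + eps) * g  # ~[0.494, -0.495, 0.997]

# theta[1] increases because its gradient is negative; theta[0], with the
# largest accumulated G_new, takes the smallest effective step per unit gradient.
```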
Constraints