In gradient-based optimization, the Nesterov Accelerated Gradient (NAG) method is a powerful enhancement to classical momentum optimization that provides faster convergence and better theoretical guarantees. Unlike standard momentum, which blindly follows the accumulated gradient direction, Nesterov momentum employs a look-ahead strategy that anticipates where the parameters will move before computing the gradient.
Imagine you're a skier descending a mountain. Standard momentum would have you continue in your current direction of travel while also considering the slope beneath your feet. Nesterov momentum is smarter: it first projects where you'll end up after applying your current momentum, then looks at the slope at that anticipated position to make a more informed adjustment. This "peek ahead" prevents overshooting and enables more responsive corrections.
The Nesterov Accelerated Gradient update consists of three key steps:
Step 1: Look-Ahead Position First, compute the anticipated position by projecting the current velocity: $$\theta_{\text{look-ahead}} = \theta_t + \mu \cdot v_t$$
Step 2: Compute Gradient at Look-Ahead Position Evaluate the gradient at this anticipated position: $$g = \nabla f(\theta_{\text{look-ahead}})$$
Step 3: Update Velocity and Parameters Update the velocity by incorporating both momentum and the gradient at the look-ahead position: $$v_{t+1} = \mu \cdot v_t - \alpha \cdot g$$
Update the parameters using the new velocity: $$\theta_{t+1} = \theta_t + v_{t+1}$$
Where: $\theta_t$ is the current parameter value, $v_t$ is the current velocity, $\mu$ is the momentum coefficient, and $\alpha$ is the learning rate.
Implement the nesterov_momentum_update function that performs a single Nesterov momentum optimization step. Your function should compute the look-ahead position, evaluate the gradient there, update the velocity, and return the updated parameter and velocity.
Round all output values to 4 decimal places for consistency.
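One possible sketch of such a function for scalar inputs (the name nesterov_momentum_update comes from the problem statement; the defaults μ = 0.9 and α = 0.1 are assumptions matching the worked examples below):

```python
def nesterov_momentum_update(parameter, grad_fn, velocity, mu=0.9, alpha=0.1):
    """One Nesterov momentum step for a scalar parameter."""
    # Step 1: look-ahead position
    look_ahead = parameter + mu * velocity
    # Step 2: gradient at the look-ahead position
    g = grad_fn(look_ahead)
    # Step 3: velocity and parameter updates
    new_velocity = mu * velocity - alpha * g
    new_parameter = parameter + new_velocity
    return [round(new_parameter, 4), round(new_velocity, 4)]
```

For example, `nesterov_momentum_update(1.0, lambda x: x, 0.1)` returns `[0.981, -0.019]`, matching the first worked example.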
parameter = 1.0
grad_fn = lambda x: x
velocity = 0.1
Output: [0.981, -0.019]
Step-by-step calculation:
Look-ahead position: θ_lookahead = θ + μ × v = 1.0 + 0.9 × 0.1 = 1.09
Gradient at look-ahead position: g = grad_fn(1.09) = 1.09
Update velocity: v_new = μ × v - α × g = 0.9 × 0.1 - 0.1 × 1.09 = 0.09 - 0.109 = -0.019
Update parameter: θ_new = θ + v_new = 1.0 + (-0.019) = 0.981
Result: [0.981, -0.019]
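The arithmetic above can be checked directly (μ = 0.9 and α = 0.1 are the values used in this example):

```python
mu, alpha = 0.9, 0.1          # momentum coefficient and learning rate
theta, v = 1.0, 0.1           # current parameter and velocity
grad_fn = lambda x: x

look_ahead = theta + mu * v            # 1.09
g = grad_fn(look_ahead)                # 1.09
v_new = mu * v - alpha * g             # -0.019
theta_new = theta + v_new              # 0.981
print(round(theta_new, 4), round(v_new, 4))  # 0.981 -0.019
```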
parameter = 2.0
grad_fn = lambda x: x
velocity = 0.0
Output: [1.8, -0.2]
Step-by-step calculation:
Look-ahead position: θ_lookahead = θ + μ × v = 2.0 + 0.9 × 0.0 = 2.0
Gradient at look-ahead position: g = grad_fn(2.0) = 2.0
Update velocity: v_new = μ × v - α × g = 0.9 × 0.0 - 0.1 × 2.0 = 0 - 0.2 = -0.2
Update parameter: θ_new = θ + v_new = 2.0 + (-0.2) = 1.8
Result: [1.8, -0.2]
Note: With zero initial velocity, the look-ahead position equals the current position, and the update behaves like a standard gradient descent step with momentum initialization.
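This equivalence is easy to verify: with $v = 0$ the look-ahead collapses to the current position, so the step reduces to $\theta - \alpha \cdot \nabla f(\theta)$. A quick check under the same assumed μ = 0.9 and α = 0.1:

```python
mu, alpha = 0.9, 0.1
theta, v = 2.0, 0.0
grad_fn = lambda x: x

# Nesterov step (look-ahead equals theta because v == 0)
v_new = mu * v - alpha * grad_fn(theta + mu * v)
nesterov_result = theta + v_new
# Plain gradient-descent step
gd_result = theta - alpha * grad_fn(theta)

print(nesterov_result, gd_result)  # 1.8 1.8
```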
parameter = [1.0, 2.0, 3.0]
grad_fn = lambda x: x
velocity = [0.1, 0.2, 0.3]
Output: [[0.981, 1.962, 2.943], [-0.019, -0.038, -0.057]]
Vector computation (element-wise):
The same algorithm applies independently to each element:
For element 0 (θ = 1.0, v = 0.1): look-ahead = 1.0 + 0.9 × 0.1 = 1.09, g = 1.09, v_new = 0.09 - 0.109 = -0.019, θ_new = 0.981
For element 1 (θ = 2.0, v = 0.2): look-ahead = 2.0 + 0.9 × 0.2 = 2.18, g = 2.18, v_new = 0.18 - 0.218 = -0.038, θ_new = 1.962
For element 2 (θ = 3.0, v = 0.3): look-ahead = 3.0 + 0.9 × 0.3 = 3.27, g = 3.27, v_new = 0.27 - 0.327 = -0.057, θ_new = 2.943
Result: [[0.981, 1.962, 2.943], [-0.019, -0.038, -0.057]]
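The element-wise computation above can be sketched with NumPy, which applies the same formulas to every element at once (the defaults μ = 0.9 and α = 0.1 are again assumptions drawn from the worked examples):

```python
import numpy as np

def nesterov_momentum_update(parameter, grad_fn, velocity, mu=0.9, alpha=0.1):
    """One Nesterov momentum step; accepts scalars or array-like inputs."""
    theta = np.asarray(parameter, dtype=float)
    v = np.asarray(velocity, dtype=float)
    g = grad_fn(theta + mu * v)      # gradient at the look-ahead position
    v_new = mu * v - alpha * g       # velocity update
    theta_new = theta + v_new        # parameter update
    return [np.round(theta_new, 4).tolist(), np.round(v_new, 4).tolist()]

print(nesterov_momentum_update([1.0, 2.0, 3.0], lambda x: x, [0.1, 0.2, 0.3]))
# [[0.981, 1.962, 2.943], [-0.019, -0.038, -0.057]]
```

Because the update touches each coordinate independently, the scalar and vector versions differ only in whether the arithmetic is broadcast over an array.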
Constraints