In machine learning and optimization, computing gradients correctly is absolutely critical. Whether you're training neural networks through backpropagation, implementing custom loss functions, or developing novel optimization algorithms, incorrect gradients lead to failed convergence, suboptimal solutions, or completely broken models.
The challenge is that analytical gradient computations can be complex, error-prone, and difficult to debug. A single sign error, missing derivative term, or incorrect chain rule application can silently corrupt your entire training process. This is where gradient verification becomes an indispensable debugging technique.
The centered finite difference approximation provides a reliable way to numerically estimate gradients. For a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$, the partial derivative with respect to the $i$-th dimension can be approximated as:
$$\frac{\partial f}{\partial x_i} \approx \frac{f(x + \epsilon \cdot e_i) - f(x - \epsilon \cdot e_i)}{2\epsilon}$$
where $e_i$ is the unit vector in the $i$-th direction and $\epsilon$ is a small perturbation value (typically $10^{-7}$ to $10^{-5}$).
This centered difference formula achieves second-order accuracy, $O(\epsilon^2)$, making it more accurate than the forward difference formula, which has only first-order accuracy, $O(\epsilon)$.
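The accuracy gap is easy to observe empirically. The sketch below (function and evaluation point chosen purely for illustration) compares the two schemes on $\sin(x)$ at $x = 1$, where the true derivative is $\cos(1)$:

```python
import numpy as np

def forward_diff(f, x, eps):
    # First-order accurate: error shrinks like O(eps)
    return (f(x + eps) - f(x)) / eps

def centered_diff(f, x, eps):
    # Second-order accurate: error shrinks like O(eps^2)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

true_grad = np.cos(1.0)  # d/dx sin(x) evaluated at x = 1.0
for eps in (1e-2, 1e-3, 1e-4):
    fwd = abs(forward_diff(np.sin, 1.0, eps) - true_grad)
    ctr = abs(centered_diff(np.sin, 1.0, eps) - true_grad)
    print(f"eps={eps:.0e}  forward error={fwd:.2e}  centered error={ctr:.2e}")
```

Shrinking $\epsilon$ by 10x reduces the forward-difference error by roughly 10x but the centered-difference error by roughly 100x, matching the stated orders of accuracy.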
To compare numerical and analytical gradients robustly, we use the relative error metric that normalizes by the magnitude of both gradients:
$$\text{relative\_error} = \frac{\|\nabla_{\text{numerical}} - \nabla_{\text{analytical}}\|_2}{\|\nabla_{\text{numerical}}\|_2 + \|\nabla_{\text{analytical}}\|_2}$$
This metric ranges from 0 (perfect match) to 1 (completely different gradients). A well-implemented gradient should achieve a relative error below $10^{-5}$, and often as low as $10^{-7}$.
Edge Case Handling: When both gradients are zero vectors, the relative error should be defined as 0 (since identical zero gradients represent a perfect match).
Implement a function that computes the numerical gradient using the centered finite difference method and calculates the relative error between the numerical and provided analytical gradients.
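One way to implement this is sketched below; the function name `check_gradient` and its exact structure are illustrative choices, not prescribed by the problem:

```python
import numpy as np

def check_gradient(f, x, analytical_grad, epsilon=1e-7):
    """Return (numerical_grad, relative_error) for f at the point x."""
    x = np.asarray(x, dtype=float)
    analytical_grad = np.asarray(analytical_grad, dtype=float)
    numerical_grad = np.zeros_like(x)

    # Centered difference, perturbing one coordinate at a time
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i.flat[i] = epsilon
        numerical_grad.flat[i] = (f(x + e_i) - f(x - e_i)) / (2 * epsilon)

    denom = np.linalg.norm(numerical_grad) + np.linalg.norm(analytical_grad)
    if denom == 0.0:
        # Both gradients are zero vectors: perfect match by definition
        return numerical_grad, 0.0
    relative_error = np.linalg.norm(numerical_grad - analytical_grad) / denom
    return numerical_grad, relative_error
```

With `f(x) = x[0]**2 + x[1]**2` at `x = [3.0, 4.0]`, this returns a gradient close to `[6.0, 8.0]` with a relative error near the floating-point noise floor.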
Input Parameters:
- f: A scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$
- x: A numpy array representing the evaluation point
- analytical_grad: A numpy array representing the analytically computed gradient
- epsilon: The perturbation magnitude (default: $10^{-7}$)

Output: Return a tuple containing:

- numerical_grad: The numerically approximated gradient (same shape as x)
- relative_error: A scalar quantifying the difference between gradients

Example 1
Input:
f(x) = x[0]**2 + x[1]**2
x = [3.0, 4.0]
analytical_grad = [6.0, 8.0]
epsilon = 1e-7
Expected Output:
numerical_grad = [6.0, 8.0], relative_error ≈ 1e-9
Reasoning:
For the quadratic function f(x) = x₀² + x₁², the true gradient is ∇f = [2x₀, 2x₁].
At x = [3.0, 4.0], the analytical gradient is correctly given as [6.0, 8.0].
Computing the numerical gradient:
For x₀:
• f([3 + ε, 4]) = (3 + ε)² + 16 = 9 + 6ε + ε² + 16
• f([3 - ε, 4]) = (3 - ε)² + 16 = 9 - 6ε + ε² + 16
• ∂f/∂x₀ ≈ ((9 + 6ε + ε² + 16) - (9 - 6ε + ε² + 16)) / (2ε) = 12ε / (2ε) = 6.0

For x₁:
• f([3, 4 + ε]) = 9 + (4 + ε)² = 9 + 16 + 8ε + ε²
• f([3, 4 - ε]) = 9 + (4 - ε)² = 9 + 16 - 8ε + ε²
• ∂f/∂x₁ ≈ 16ε / (2ε) = 8.0
Since both gradients are [6.0, 8.0], the relative error is essentially 0, limited only by floating-point rounding (approximately 1e-9).
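This hand calculation is easy to reproduce numerically; the short NumPy check below evaluates both centered differences for Example 1 directly:

```python
import numpy as np

f = lambda v: v[0]**2 + v[1]**2
x = np.array([3.0, 4.0])
eps = 1e-7

# One centered difference per coordinate
numerical_grad = np.array([
    (f(x + np.array([eps, 0.0])) - f(x - np.array([eps, 0.0]))) / (2 * eps),
    (f(x + np.array([0.0, eps])) - f(x - np.array([0.0, eps]))) / (2 * eps),
])
print(numerical_grad)  # close to [6.0, 8.0]
```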
Example 2
Input:
f(x) = x[0] + x[1] + x[2]
x = [1.0, 2.0, 3.0]
analytical_grad = [1.0, 1.0, 1.0]
epsilon = 1e-7
Expected Output:
numerical_grad = [1.0, 1.0, 1.0], relative_error ≈ 1.9e-9
Reasoning:
For the linear function f(x) = x₀ + x₁ + x₂, the gradient is constant: ∇f = [1, 1, 1].
Computing the numerical gradient for each dimension:
For any dimension i:
• f(x + ε·eᵢ) - f(x - ε·eᵢ) = (sum of components + ε) - (sum of components - ε) = 2ε
• ∂f/∂xᵢ ≈ 2ε / (2ε) = 1.0
The numerical gradient matches [1.0, 1.0, 1.0] exactly. The relative error is at the noise floor of floating-point computation (approximately 1.9e-9).
Example 3
Input:
f(x) = x[0]**3 + x[1]**3
x = [2.0, -1.0]
analytical_grad = [12.0, 3.0]
epsilon = 1e-7
Expected Output:
numerical_grad = [12.0, 3.0], relative_error ≈ 3e-10
Reasoning:
For the cubic function f(x) = x₀³ + x₁³, the gradient is ∇f = [3x₀², 3x₁²].
At x = [2.0, -1.0]:
• ∂f/∂x₀ = 3 × (2.0)² = 12.0
• ∂f/∂x₁ = 3 × (-1.0)² = 3.0
Numerical verification using centered differences:
For x₀:
• f([2 + ε, -1]) = (2 + ε)³ + (-1)³ = 8 + 12ε + 6ε² + ε³ - 1
• f([2 - ε, -1]) = (2 - ε)³ + (-1)³ = 8 - 12ε + 6ε² - ε³ - 1
• ∂f/∂x₀ ≈ (24ε + 2ε³) / (2ε) = 12 + ε² ≈ 12.0

For x₁:
• f([2, -1 + ε]) = 8 + (-1 + ε)³ = 8 - 1 + 3ε - 3ε² + ε³
• f([2, -1 - ε]) = 8 + (-1 - ε)³ = 8 - 1 - 3ε - 3ε² - ε³
• ∂f/∂x₁ ≈ (6ε + 2ε³) / (2ε) = 3 + ε² ≈ 3.0
Both gradients agree, yielding an extremely small relative error (~3e-10), demonstrating correct analytical gradient implementation.
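To see why the metric is useful for debugging, suppose the analytical gradient of Example 3 had contained a sign error in its second component (a hypothetical bug, not part of the problem). The sketch below shows the relative error jumping from the noise floor to a clearly visible value:

```python
import numpy as np

def centered_grad(f, x, eps=1e-7):
    # Centered finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def rel_err(a, b):
    denom = np.linalg.norm(a) + np.linalg.norm(b)
    return 0.0 if denom == 0 else np.linalg.norm(a - b) / denom

f = lambda v: v[0]**3 + v[1]**3
x = np.array([2.0, -1.0])
num = centered_grad(f, x)

correct = np.array([12.0, 3.0])
buggy = np.array([12.0, -3.0])  # hypothetical sign error in the second component

print(rel_err(num, correct))  # tiny, near the floating-point noise floor
print(rel_err(num, buggy))    # large (around 0.24): the bug is immediately visible
```

A single flipped sign moves the relative error by roughly nine orders of magnitude, which is exactly the kind of unambiguous signal that makes gradient checking effective.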
Constraints