In numerical optimization, finding the minimum of a function efficiently is a fundamental challenge. While first-order methods like gradient descent use only the slope (gradient) to guide the search, second-order methods incorporate curvature information through the Hessian matrix to achieve dramatically faster convergence.
The curvature-based optimization step (commonly known as the Newton step) computes the update direction by considering both the local slope (gradient) and how that slope is changing (curvature). This approach approximates the function locally as a quadratic and jumps directly to the minimum of that quadratic.
The Algorithm:
Given a current position x₀, the gradient vector ∇f(x₀), and the Hessian matrix H(x₀), the optimal next position is computed as:
$$x_{\text{new}} = x_0 - H^{-1} \cdot \nabla f(x_0)$$
where H⁻¹ denotes the inverse of the Hessian H(x₀).
The Update Step:
The update direction Δx is found by solving the linear system:
$$H \cdot \Delta x = -\nabla f$$
Then the new position becomes: x_new = x₀ + Δx
Key Insight:
For quadratic functions, this method converges to the exact minimum in a single step, regardless of the starting point. For more complex functions, it achieves quadratic convergence near the optimum—meaning the number of correct digits roughly doubles with each iteration.
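The one-step claim can be checked on a general quadratic. For f(x) = ½·xᵀAx − bᵀx with A symmetric positive definite, the gradient is Ax − b, the Hessian is A, and the minimizer is A⁻¹b. A small sketch (the matrix A, vector b, and starting point below are arbitrary illustrative values):

```python
import numpy as np

# f(x) = 0.5 * x @ A @ x - b @ x: gradient is A @ x - b, Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x0 = np.array([10.0, -7.0])              # arbitrary starting point

grad = A @ x0 - b
x_new = x0 + np.linalg.solve(A, -grad)   # one Newton step

# The step lands exactly on the minimizer A^{-1} b, wherever we start.
print(np.allclose(x_new, np.linalg.solve(A, b)))  # True
```

Algebraically, x_new = x₀ − A⁻¹(Ax₀ − b) = A⁻¹b: the starting point cancels out entirely, which is why any x₀ reaches the minimum in one step.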
Your Task:
Implement a function that computes one step of this curvature-based optimization algorithm. Given the current position, the gradient at that position, and the Hessian matrix at that position, return the new position after applying the second-order update step.
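A minimal sketch of such a function using NumPy (the name `newton_step` is illustrative). It solves the linear system H·Δx = −∇f rather than forming H⁻¹ explicitly, which is cheaper and numerically more stable:

```python
import numpy as np

def newton_step(x0, gradient_at_x0, hessian_at_x0):
    """Return the new position after one second-order (Newton) update.

    Solves H @ dx = -grad for the step dx, then returns x0 + dx.
    """
    x0 = np.asarray(x0, dtype=float)
    grad = np.asarray(gradient_at_x0, dtype=float)
    hess = np.asarray(hessian_at_x0, dtype=float)
    dx = np.linalg.solve(hess, -grad)
    return x0 + dx

print(newton_step([0.0, 0.0], [-2.0, -4.0],
                  [[2.0, 0.0], [0.0, 2.0]]))  # [1. 2.]
```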
x0 = [0.0, 0.0]
gradient_at_x0 = [-2.0, -4.0]
hessian_at_x0 = [[2.0, 0.0], [0.0, 2.0]]
Expected output: [1.0, 2.0]
Starting at position x₀ = [0, 0], we need to compute the curvature-based step.
Step 1: Compute the inverse of the Hessian
The Hessian H = [[2, 0], [0, 2]] is a diagonal matrix, so its inverse is H⁻¹ = [[0.5, 0], [0, 0.5]].
Step 2: Compute the Newton direction
Δx = -H⁻¹ · ∇f = -[[0.5, 0], [0, 0.5]] · [-2, -4]
Δx = -[0.5×(-2) + 0×(-4), 0×(-2) + 0.5×(-4)]
Δx = -[-1, -2] = [1, 2]
Step 3: Update the position
x_new = x₀ + Δx = [0, 0] + [1, 2] = [1, 2]
Since the underlying function is quadratic (f(x,y) = (x-1)² + (y-2)²), the method converges to the exact minimum [1, 2] in a single step.
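The three steps above can be reproduced directly, using an explicit inverse to mirror the worked arithmetic:

```python
import numpy as np

# Worked example 1: explicit inverse, then the Newton step.
x0 = np.array([0.0, 0.0])
grad = np.array([-2.0, -4.0])
H = np.array([[2.0, 0.0], [0.0, 2.0]])

H_inv = np.linalg.inv(H)    # [[0.5, 0], [0, 0.5]]
dx = -H_inv @ grad          # [1, 2]
x_new = x0 + dx
print(x_new)                # [1. 2.]
```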
x0 = [5.0, 5.0]
gradient_at_x0 = [8.0, 6.0]
hessian_at_x0 = [[2.0, 0.0], [0.0, 2.0]]
Expected output: [1.0, 2.0]
Starting from a different position x₀ = [5, 5], we apply the same curvature-based approach.
Step 1: Invert the Hessian
H⁻¹ = [[0.5, 0], [0, 0.5]] (same diagonal inverse)
Step 2: Compute the update direction
Δx = -H⁻¹ · ∇f = -[[0.5, 0], [0, 0.5]] · [8, 6]
Δx = -[4, 3] = [-4, -3]
Step 3: Apply the update
x_new = [5, 5] + [-4, -3] = [1, 2]
Remarkably, even starting from a completely different location, the algorithm finds the same minimum [1, 2] in one step. This demonstrates the power of using curvature information for quadratic functions.
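The same steps, this time solving the linear system directly instead of forming the inverse:

```python
import numpy as np

# Worked example 2: same quadratic, different starting point.
x0 = np.array([5.0, 5.0])
grad = np.array([8.0, 6.0])
H = np.array([[2.0, 0.0], [0.0, 2.0]])

# Solve H @ dx = -grad; one step still lands on the minimum.
x_new = x0 + np.linalg.solve(H, -grad)
print(x_new)  # [1. 2.]
```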
x0 = [0.0, 0.0, 0.0]
gradient_at_x0 = [-2.0, -4.0, -6.0]
hessian_at_x0 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
Expected output: [1.0, 2.0, 3.0]
This example extends to 3 dimensions, starting at the origin x₀ = [0, 0, 0].
Step 1: Invert the 3×3 Hessian
The Hessian H = [[2,0,0], [0,2,0], [0,0,2]] = 2·I (a scaled identity matrix), so its inverse is H⁻¹ = [[0.5,0,0], [0,0.5,0], [0,0,0.5]].
Step 2: Compute the Newton direction
Δx = -H⁻¹ · ∇f = -[[0.5,0,0], [0,0.5,0], [0,0,0.5]] · [-2, -4, -6]
Δx = -[-1, -2, -3] = [1, 2, 3]
Step 3: Update the position
x_new = [0, 0, 0] + [1, 2, 3] = [1, 2, 3]
The algorithm correctly identifies the minimum of the 3D quadratic function f(x,y,z) = (x-1)² + (y-2)² + (z-3)² in a single iteration, demonstrating that the method scales naturally to higher dimensions.
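Because H = 2·I here, the solve collapses to dividing the gradient by 2, which makes the 3-D step a one-liner:

```python
import numpy as np

# Worked example 3: with H = 2*I, the Newton step is just x0 - grad / 2.
x0 = np.array([0.0, 0.0, 0.0])
grad = np.array([-2.0, -4.0, -6.0])
x_new = x0 - grad / 2.0
print(x_new)  # [1. 2. 3.]
```

This shortcut only applies because the Hessian is a scaled identity; for a general Hessian, the full linear solve is required.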
Constraints