Partial derivatives tell us how a function changes along coordinate axes (the $x$, $y$, $z$ directions). But what if we want to know the rate of change in any direction—say, northeast at a 30° angle?
Directional derivatives answer this question. They generalize partial derivatives to arbitrary directions and reveal the deep connection between the gradient vector and rates of change.
By the end of this page, you'll compute directional derivatives in any direction, understand their relationship to the gradient, and apply this knowledge to optimization intuition.
Formal Definition:
The directional derivative of $f$ at point $\mathbf{a}$ in the direction of unit vector $\mathbf{u}$ is:
$$D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{u}) - f(\mathbf{a})}{h}$$
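As a concrete instance of this limit (a worked example, not from the original text), take $f(x, y) = x^2 + y^2$ at $\mathbf{a} = (1, 0)$ with $\mathbf{u} = (1, 0)$:

$$D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{(1+h)^2 + 0^2 - 1}{h} = \lim_{h \to 0} \frac{2h + h^2}{h} = 2$$

which is exactly $\partial f / \partial x$ at $(1, 0)$, as expected when $\mathbf{u}$ points along the $x$-axis.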
Key Requirements:
- $\mathbf{u}$ must be a unit vector ($|\mathbf{u}| = 1$), so that $h$ measures actual distance traveled in that direction
- The limit must exist for the directional derivative to be defined at $\mathbf{a}$
Connection to Partial Derivatives:
Partial derivatives are special cases, obtained by choosing standard basis vectors as directions:
- $D_{\mathbf{e}_1} f = \partial f / \partial x$, with $\mathbf{e}_1 = (1, 0)$
- $D_{\mathbf{e}_2} f = \partial f / \partial y$, with $\mathbf{e}_2 = (0, 1)$
For differentiable functions, there's a simple formula: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$. This dot product elegantly connects directional derivatives to the gradient vector.
The Fundamental Formula:
$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| |\mathbf{u}| \cos\theta = |\nabla f| \cos\theta$$
where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$.
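To make the formula concrete (a worked example using the same function as the code below), take $f(x, y) = x^2 + xy + y^2$, so $\nabla f = (2x + y,\ x + 2y)$ and $\nabla f(1, 2) = (4, 5)$. In the $+x$ direction:

$$D_{(1,0)} f = (4, 5) \cdot (1, 0) = 4 = \frac{\partial f}{\partial x}$$

and in the gradient's own direction $\mathbf{u} = \nabla f / |\nabla f|$:

$$D_{\mathbf{u}} f = \frac{(4, 5) \cdot (4, 5)}{\sqrt{41}} = \sqrt{41} \approx 6.40$$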
Implications:
| Angle $\theta$ | $\cos\theta$ | $D_{\mathbf{u}}f$ | Meaning |
|---|---|---|---|
| $0°$ | $1$ | $\|\nabla f\|$ | Maximum increase (along gradient) |
| $45°$ | $\approx 0.707$ | $0.707\|\nabla f\|$ | Partial increase |
| $90°$ | $0$ | $0$ | No change (along level set) |
| $180°$ | $-1$ | $-\|\nabla f\|$ | Maximum decrease (against gradient) |
This proves: The gradient points in the direction of steepest increase, with magnitude equal to that steepest rate.
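The one-line argument behind this claim is the Cauchy–Schwarz inequality:

$$|D_{\mathbf{u}} f| = |\nabla f \cdot \mathbf{u}| \le |\nabla f|\,|\mathbf{u}| = |\nabla f|$$

with equality exactly when $\mathbf{u} = \nabla f / |\nabla f|$ (steepest increase) or $\mathbf{u} = -\nabla f / |\nabla f|$ (steepest decrease).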
```python
import numpy as np

def directional_derivative(f, point, direction, h=1e-7):
    """
    Compute directional derivative numerically.
    Direction will be normalized to a unit vector.
    """
    u = np.array(direction, dtype=float)
    u = u / np.linalg.norm(u)  # Normalize
    point = np.array(point, dtype=float)
    return (f(point + h*u) - f(point - h*u)) / (2*h)

def gradient(f, point, h=1e-7):
    """Compute gradient numerically via central differences."""
    n = len(point)
    grad = np.zeros(n)
    point = np.array(point, dtype=float)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1
        grad[i] = (f(point + h*e) - f(point - h*e)) / (2*h)
    return grad

# Example: f(x,y) = x² + xy + y²
def f(p):
    x, y = p[0], p[1]
    return x**2 + x*y + y**2

point = np.array([1.0, 2.0])
grad = gradient(f, point)
print("f(x,y) = x² + xy + y² at point (1, 2)")
print(f"Gradient: {grad}")
print(f"|∇f| = {np.linalg.norm(grad):.4f}")

# Test various directions
directions = [
    ([1, 0], "East (+x)"),
    ([0, 1], "North (+y)"),
    ([1, 1], "Northeast"),
    (grad, "Gradient direction"),
    (-grad, "Anti-gradient"),
]

print("Directional Derivatives:")
for d, name in directions:
    d = np.array(d, dtype=float)
    u = d / np.linalg.norm(d)
    # Numerical
    dd_num = directional_derivative(f, point, d)
    # Via gradient formula
    dd_formula = np.dot(grad, u)
    print(f"  {name:20s}: numerical={dd_num:.4f}, formula={dd_formula:.4f}")
```

Visualizing Directional Derivatives:
Imagine a 3D surface $z = f(x, y)$. At point $(a, b)$:
- Slicing the surface with a vertical plane through $(a, b)$ in direction $\mathbf{u}$ gives a curve; $D_{\mathbf{u}} f(a, b)$ is the slope of that curve.
- Rotating the plane through different directions $\mathbf{u}$ sweeps out all possible slopes at that point, from $-|\nabla f|$ up to $|\nabla f|$.
Physical Analogy:
You're standing on a hill at point $(a, b)$ with elevation $f(a, b)$:
- Walking in the gradient direction takes you uphill as steeply as possible, gaining elevation at rate $|\nabla f|$.
- Walking against the gradient descends as steeply as possible.
If you move tangent to a level set (perpendicular to gradient), the directional derivative is zero—you're moving along a contour where $f$ is constant. This is why gradient is perpendicular to level sets.
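A quick numeric check of this perpendicularity claim (a sketch with a hypothetical function choice, not from the original): for $f(x, y) = x^2 + y^2$ the level sets are circles, and the tangent to the circle at $(a, b)$ is $(-b, a)$. The directional derivative along that tangent should be zero.

```python
import numpy as np

def f(p):
    # f(x, y) = x^2 + y^2: level sets are circles centered at the origin
    return p[0]**2 + p[1]**2

def grad_f(p):
    # Analytic gradient of f
    return np.array([2*p[0], 2*p[1]])

point = np.array([3.0, 4.0])               # lies on the level set f = 25
tangent = np.array([-point[1], point[0]])  # tangent to the circle at this point
tangent = tangent / np.linalg.norm(tangent)

g = grad_f(point)
directional = np.dot(g, tangent)  # D_u f along the contour
print(f"D_u f along the contour: {directional:.6f}")
```

Moving along the contour changes $f$ not at all, so the gradient, which is the direction of maximal change, has no component along it.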
Why Directional Derivatives Matter for ML:
- **Understanding Gradient Descent**: The update $-\eta \nabla f$ is precisely the direction of maximum decrease
- **Momentum**: Uses an exponential average of gradients, creating a direction different from the current gradient
- **Natural Gradient**: Uses Fisher information to define a Riemannian metric, changing what 'direction' means
- **Hessian-Free Optimization**: Computes products $Hv$ (directional derivative of the gradient) without forming the Hessian explicitly
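The Hessian-vector product idea can be sketched numerically: $Hv$ is the directional derivative of $\nabla f$ in direction $v$, so $Hv \approx (\nabla f(x + \epsilon v) - \nabla f(x - \epsilon v)) / (2\epsilon)$. This is a finite-difference illustration under assumed step sizes; real Hessian-free methods compute $Hv$ exactly via automatic differentiation.

```python
import numpy as np

def grad(f, x, h=1e-6):
    # Central-difference gradient (illustrative, not production-grade)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

def hvp(f, x, v, eps=1e-5):
    # Hessian-vector product as the directional derivative of the gradient:
    # Hv ≈ (∇f(x + eps·v) − ∇f(x − eps·v)) / (2·eps)
    return (grad(f, x + eps*v) - grad(f, x - eps*v)) / (2*eps)

def f(p):
    x, y = p
    return x**2 + x*y + y**2   # Hessian is [[2, 1], [1, 2]] everywhere

x = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
print(hvp(f, x, v))  # should approximate the first Hessian column, [2, 1]
```

Note that `hvp` costs only two gradient evaluations, regardless of dimension, which is exactly why Hessian-free methods scale where forming the full $n \times n$ Hessian would not.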
You now understand rates of change in arbitrary directions. Next, we'll explore the Hessian matrix—second-order derivatives that capture curvature and enable faster, smarter optimization.