Partial derivatives tell us how a function changes along coordinate axes (the $x$, $y$, $z$ directions). But what if we want to know the rate of change in any direction—say, northeast at a 30° angle?
Directional derivatives answer this question. They generalize partial derivatives to arbitrary directions and reveal the deep connection between the gradient vector and rates of change.
By the end of this page, you'll compute directional derivatives in any direction, understand their relationship to the gradient, and apply this knowledge to optimization intuition.
Formal Definition:
The directional derivative of $f$ at point $\mathbf{a}$ in the direction of unit vector $\mathbf{u}$ is:
$$D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{u}) - f(\mathbf{a})}{h}$$
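As a concrete instance of this limit (a worked example, not from the original text), take $f(x, y) = x^2 + y^2$ at $\mathbf{a} = (1, 0)$ with $\mathbf{u} = (1, 0)$:

$$D_{\mathbf{u}} f(\mathbf{a}) = \lim_{h \to 0} \frac{(1+h)^2 + 0^2 - 1}{h} = \lim_{h \to 0} \frac{2h + h^2}{h} = 2$$

which is exactly $\partial f / \partial x$ at $(1, 0)$, as expected when $\mathbf{u}$ points along the $x$-axis.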
Key Requirements:
- $\mathbf{u}$ must be a unit vector ($|\mathbf{u}| = 1$), so that $h$ measures actual distance traveled in that direction
- The limit must exist for the directional derivative to be defined at $\mathbf{a}$
Connection to Partial Derivatives:
Partial derivatives are special cases, obtained by choosing standard basis vectors as directions:
- $D_{\mathbf{e}_1} f = \partial f / \partial x$, with $\mathbf{e}_1 = (1, 0)$
- $D_{\mathbf{e}_2} f = \partial f / \partial y$, with $\mathbf{e}_2 = (0, 1)$
For differentiable functions, there's a simple formula: $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$. This dot product elegantly connects directional derivatives to the gradient vector.
The Fundamental Formula:
$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = |\nabla f| |\mathbf{u}| \cos\theta = |\nabla f| \cos\theta$$
where $\theta$ is the angle between $\nabla f$ and $\mathbf{u}$.
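To make the formula concrete (a worked example using the same function as the code below), take $f(x, y) = x^2 + xy + y^2$, so $\nabla f = (2x + y,\ x + 2y)$ and $\nabla f(1, 2) = (4, 5)$. In the $+x$ direction:

$$D_{(1,0)} f = (4, 5) \cdot (1, 0) = 4 = \frac{\partial f}{\partial x}$$

and in the gradient's own direction $\mathbf{u} = \nabla f / |\nabla f|$:

$$D_{\mathbf{u}} f = \frac{(4, 5) \cdot (4, 5)}{\sqrt{41}} = \sqrt{41} \approx 6.40$$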
Implications:
| Angle $\theta$ | $\cos\theta$ | $D_{\mathbf{u}}f$ | Meaning |
|---|---|---|---|
| $0°$ | $1$ | $\|\nabla f\|$ | Maximum increase (along gradient) |
| $45°$ | $\approx 0.707$ | $0.707\|\nabla f\|$ | Partial increase |
| $90°$ | $0$ | $0$ | No change (along level set) |
| $180°$ | $-1$ | $-\|\nabla f\|$ | Maximum decrease (against gradient) |
This proves: The gradient points in the direction of steepest increase, with magnitude equal to that steepest rate.
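The one-line argument behind this claim is the Cauchy–Schwarz inequality:

$$|D_{\mathbf{u}} f| = |\nabla f \cdot \mathbf{u}| \le |\nabla f|\,|\mathbf{u}| = |\nabla f|$$

with equality exactly when $\mathbf{u} = \nabla f / |\nabla f|$ (steepest increase) or $\mathbf{u} = -\nabla f / |\nabla f|$ (steepest decrease).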
```python
import numpy as np

def directional_derivative(f, point, direction, h=1e-7):
    """
    Compute directional derivative numerically.
    Direction will be normalized to a unit vector.
    """
    u = np.array(direction, dtype=float)
    u = u / np.linalg.norm(u)  # Normalize
    point = np.array(point, dtype=float)
    return (f(point + h*u) - f(point - h*u)) / (2*h)

def gradient(f, point, h=1e-7):
    """Compute gradient numerically via central differences."""
    n = len(point)
    grad = np.zeros(n)
    point = np.array(point, dtype=float)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1
        grad[i] = (f(point + h*e) - f(point - h*e)) / (2*h)
    return grad

# Example: f(x,y) = x² + xy + y²
def f(p):
    x, y = p[0], p[1]
    return x**2 + x*y + y**2

point = np.array([1.0, 2.0])
grad = gradient(f, point)
print("f(x,y) = x² + xy + y² at point (1, 2)")
print(f"Gradient: {grad}")
print(f"|∇f| = {np.linalg.norm(grad):.4f}")

# Test various directions
directions = [
    ([1, 0], "East (+x)"),
    ([0, 1], "North (+y)"),
    ([1, 1], "Northeast"),
    (grad, "Gradient direction"),
    (-grad, "Anti-gradient"),
]

print("Directional Derivatives:")
for d, name in directions:
    d = np.array(d, dtype=float)
    u = d / np.linalg.norm(d)
    # Numerical
    dd_num = directional_derivative(f, point, d)
    # Via gradient formula
    dd_formula = np.dot(grad, u)
    print(f"  {name:20s}: numerical={dd_num:.4f}, formula={dd_formula:.4f}")
```

Visualizing Directional Derivatives:
Imagine a 3D surface $z = f(x, y)$. At point $(a, b)$:
- Slicing the surface with a vertical plane through $(a, b)$ in direction $\mathbf{u}$ gives a curve; $D_{\mathbf{u}} f(a, b)$ is the slope of that curve.
- Rotating the plane through different directions $\mathbf{u}$ sweeps out all possible slopes at that point, from $-|\nabla f|$ up to $|\nabla f|$.
Physical Analogy:
You're standing on a hill at point $(a, b)$ with elevation $f(a, b)$:
- Walking in the gradient direction takes you uphill as steeply as possible, gaining elevation at rate $|\nabla f|$.
- Walking against the gradient descends as steeply as possible.
If you move tangent to a level set (perpendicular to gradient), the directional derivative is zero—you're moving along a contour where $f$ is constant. This is why gradient is perpendicular to level sets.
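A quick numeric check of this perpendicularity claim (a sketch with a hypothetical function choice, not from the original): for $f(x, y) = x^2 + y^2$ the level sets are circles, and the tangent to the circle at $(a, b)$ is $(-b, a)$. The directional derivative along that tangent should be zero.

```python
import numpy as np

def f(p):
    # f(x, y) = x^2 + y^2: level sets are circles centered at the origin
    return p[0]**2 + p[1]**2

def grad_f(p):
    # Analytic gradient of f
    return np.array([2*p[0], 2*p[1]])

point = np.array([3.0, 4.0])               # lies on the level set f = 25
tangent = np.array([-point[1], point[0]])  # tangent to the circle at this point
tangent = tangent / np.linalg.norm(tangent)

g = grad_f(point)
directional = np.dot(g, tangent)  # D_u f along the contour
print(f"D_u f along the contour: {directional:.6f}")
```

Moving along the contour changes $f$ not at all, so the gradient, which is the direction of maximal change, has no component along it.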
Why Directional Derivatives Matter for ML:
- **Understanding Gradient Descent**: The update $-\eta \nabla f$ is precisely the direction of maximum decrease
- **Momentum**: Uses an exponential average of gradients, creating a direction different from the current gradient
- **Natural Gradient**: Uses Fisher information to define a Riemannian metric, changing what 'direction' means
- **Hessian-Free Optimization**: Computes products $Hv$ (directional derivative of the gradient) without forming the Hessian explicitly
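The Hessian-vector product idea can be sketched numerically: $Hv$ is the directional derivative of $\nabla f$ in direction $v$, so $Hv \approx (\nabla f(x + \epsilon v) - \nabla f(x - \epsilon v)) / (2\epsilon)$. This is a finite-difference illustration under assumed step sizes; real Hessian-free methods compute $Hv$ exactly via automatic differentiation.

```python
import numpy as np

def grad(f, x, h=1e-6):
    # Central-difference gradient (illustrative, not production-grade)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

def hvp(f, x, v, eps=1e-5):
    # Hessian-vector product as the directional derivative of the gradient:
    # Hv ≈ (∇f(x + eps·v) − ∇f(x − eps·v)) / (2·eps)
    return (grad(f, x + eps*v) - grad(f, x - eps*v)) / (2*eps)

def f(p):
    x, y = p
    return x**2 + x*y + y**2   # Hessian is [[2, 1], [1, 2]] everywhere

x = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
print(hvp(f, x, v))  # should approximate the first Hessian column, [2, 1]
```

Note that `hvp` costs only two gradient evaluations, regardless of dimension, which is exactly why Hessian-free methods scale where forming the full $n \times n$ Hessian would not.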
You now understand rates of change in arbitrary directions. Next, we'll explore the Hessian matrix—second-order derivatives that capture curvature and enable faster, smarter optimization.