Every constrained optimization problem has a hidden twin—a dual problem that approaches the same answer from a completely different direction. While the original (primal) problem minimizes the objective over the decision variables $\mathbf{x}$, the dual problem maximizes a related function over the Lagrange multipliers $\boldsymbol{\alpha}, \boldsymbol{\beta}$.
This isn't merely a mathematical curiosity. Duality is one of the most profound ideas in optimization, with applications ranging from economics to physics to machine learning.
By the end of this page, you will understand Lagrangian duality: how to construct dual problems, the meaning of weak and strong duality, when the duality gap is zero, and how to interpret dual solutions. This foundation is essential for understanding SVMs and many optimization algorithms.
Recall our constrained optimization problem:
$$\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad g_i(\mathbf{x}) \leq 0, \; h_j(\mathbf{x}) = 0$$
with Lagrangian:
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f(\mathbf{x}) + \sum_i \alpha_i g_i(\mathbf{x}) + \sum_j \beta_j h_j(\mathbf{x})$$
where $\boldsymbol{\alpha} \geq 0$.
The Dual Function:
Define the Lagrangian dual function (or just "dual function") as:
$$d(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
This is obtained by minimizing the Lagrangian over $\mathbf{x}$ while treating $\boldsymbol{\alpha}, \boldsymbol{\beta}$ as fixed parameters.
Key Property: The infimum might be $-\infty$ for some choices of $(\boldsymbol{\alpha}, \boldsymbol{\beta})$. The domain of $d$ where it's finite is called the dual feasible region.
The idea is to 'hide' the complexity of x by computing the best x for each fixed choice of multipliers. What remains is a function purely of the multipliers, which captures the essential structure of the constrained problem through a different lens.
Example: For the primal $\min_x x^2$ subject to $x \geq 1$ (i.e., $-x + 1 \leq 0$), the Lagrangian is
$$\mathcal{L}(x, \alpha) = x^2 + \alpha(-x + 1) = x^2 - \alpha x + \alpha, \quad \alpha \geq 0$$
Minimizing over $x$: $\partial \mathcal{L}/\partial x = 2x - \alpha = 0 \Rightarrow x = \alpha/2$. Substituting back:
$$d(\alpha) = \left(\frac{\alpha}{2}\right)^2 - \alpha \cdot \frac{\alpha}{2} + \alpha = \frac{\alpha^2}{4} - \frac{\alpha^2}{2} + \alpha = -\frac{\alpha^2}{4} + \alpha$$
The dual function is a concave quadratic in $\alpha$.
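To sanity-check this algebra numerically, here is a minimal sketch (assuming only numpy) that approximates $\inf_x \mathcal{L}(x, \alpha)$ on a dense grid and compares it with the closed form $-\alpha^2/4 + \alpha$:

```python
import numpy as np

# Primal: min x^2 subject to x >= 1; Lagrangian L(x, a) = x^2 + a(1 - x)
x_grid = np.linspace(-10, 10, 200001)        # dense grid standing in for inf over x

for alpha in [0.0, 0.5, 1.0, 2.0, 4.0]:
    L = x_grid**2 + alpha * (1 - x_grid)     # Lagrangian along the grid
    d_numeric = L.min()                      # approximate inf_x L(x, alpha)
    d_closed = -alpha**2 / 4 + alpha         # closed-form dual function
    print(f"alpha={alpha:4.1f}  grid min={d_numeric:.6f}  closed form={d_closed:.6f}")
```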
Properties of the Dual Function:
- Always concave: even if the primal problem is non-convex, $d$ is concave (it is an infimum of affine functions of $(\boldsymbol{\alpha}, \boldsymbol{\beta})$)
- Provides lower bounds: for any dual-feasible $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ with $\boldsymbol{\alpha} \geq 0$, we have $d(\boldsymbol{\alpha}, \boldsymbol{\beta}) \leq p^*$, where $p^*$ is the primal optimal value
- May be $-\infty$: at points where the Lagrangian is unbounded below in $\mathbf{x}$
Theorem (Weak Duality):
For any primal-feasible $\mathbf{x}$ and any dual-feasible $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ with $\boldsymbol{\alpha} \geq 0$:
$$d(\boldsymbol{\alpha}, \boldsymbol{\beta}) \leq f(\mathbf{x})$$
In particular, if $p^*$ is the optimal primal value and $d^*$ is the optimal dual value:
$$d^* \leq p^*$$
Proof:
Let $\mathbf{x}$ be primal-feasible: $g_i(\mathbf{x}) \leq 0$ and $h_j(\mathbf{x}) = 0$.
Since $\alpha_i \geq 0$ and $g_i(\mathbf{x}) \leq 0$, we have $\alpha_i g_i(\mathbf{x}) \leq 0$.
Since $h_j(\mathbf{x}) = 0$, we have $\beta_j h_j(\mathbf{x}) = 0$.
Therefore: $$\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f(\mathbf{x}) + \underbrace{\sum_i \alpha_i g_i(\mathbf{x})}_{\leq 0} + \underbrace{\sum_j \beta_j h_j(\mathbf{x})}_{= 0} \leq f(\mathbf{x})$$
Taking the infimum over all $\mathbf{x}$ (not just feasible ones) only decreases the left side: $$d(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) \leq \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) \leq f(\mathbf{x})$$
Weak duality requires no assumptions about convexity—it's a universal truth for Lagrangian duality. Every solution to the dual provides a lower bound on the primal optimal value. This is incredibly useful: if you find primal and dual solutions with equal objectives, you've proven both are optimal!
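A quick numerical illustration of this bound, using the one-dimensional example from above ($\min x^2$ s.t. $x \geq 1$, so $p^* = 1$): every sampled dual value stays below every sampled feasible primal value.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: x**2              # primal objective
d = lambda a: -a**2 / 4 + a     # dual function derived earlier

xs = 1 + rng.exponential(size=1000)      # primal-feasible points (x >= 1)
alphas = rng.exponential(size=1000)      # dual-feasible multipliers (alpha >= 0)

# Weak duality: max over dual values <= min over feasible primal values
print(f"best dual bound   = {d(alphas).max():.4f}")   # approaches d* = 1 from below
print(f"best primal value = {f(xs).min():.4f}")       # approaches p* = 1 from above
assert d(alphas).max() <= f(xs).min()
```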
Applications of Weak Duality:
| Application | How Weak Duality Helps |
|---|---|
| Proving optimality | If $f(\mathbf{x}^*) = d(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$, both are optimal |
| Bounding search | Dual solutions bound how good primal can get |
| Early termination | If gap is small enough, stop optimization |
| Infeasibility detection | Dual unbounded → primal infeasible (under conditions) |
Since the dual function provides lower bounds, the best lower bound comes from maximizing it:
The Lagrangian Dual Problem:
$$\max_{\boldsymbol{\alpha}, \boldsymbol{\beta}} d(\boldsymbol{\alpha}, \boldsymbol{\beta}) \quad \text{subject to} \quad \boldsymbol{\alpha} \geq 0$$
Or equivalently:
$$\max_{\boldsymbol{\alpha} \geq 0, \boldsymbol{\beta}} \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
Key Observations:
- The dual problem is a convex optimization problem (maximizing a concave function over a convex set), even when the primal is non-convex
- The optimal value $d^*$ of the dual is always $\leq p^*$ by weak duality
- The difference $p^* - d^*$ is called the duality gap
The primal is min over x of the objective. The dual is max over multipliers of (inf over x of Lagrangian). This 'max-inf' structure creates a game-theoretic interpretation: the dual player (controlling α, β) tries to make the Lagrangian as large as possible, while x is chosen to minimize it.
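Because the dual is concave with only the simple constraint $\alpha \geq 0$, even plain projected gradient ascent solves it. A minimal sketch for the running example, where $d(\alpha) = -\alpha^2/4 + \alpha$ has maximizer $\alpha^* = 2$ (the step size 0.5 is an arbitrary illustrative choice):

```python
# Projected gradient ascent on d(a) = -a^2/4 + a over the set a >= 0
grad_d = lambda a: -a / 2 + 1          # d'(a)

alpha, step = 0.0, 0.5
for _ in range(100):
    alpha = max(0.0, alpha + step * grad_d(alpha))   # ascend, then project onto a >= 0

print(f"alpha* ≈ {alpha:.4f}")                       # converges to 2
print(f"d*     ≈ {-alpha**2 / 4 + alpha:.4f}")       # equals p* = 1 (strong duality)
```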
Primal-Dual Relationship:
| Primal Problem | Dual Problem |
|---|---|
| Minimize $f(\mathbf{x})$ | Maximize $d(\boldsymbol{\alpha}, \boldsymbol{\beta})$ |
| Subject to $g_i(\mathbf{x}) \leq 0$ | Subject to $\alpha_i \geq 0$ |
| Subject to $h_j(\mathbf{x}) = 0$ | $\beta_j$ unrestricted |
| Variables: $\mathbf{x}$ | Variables: $\boldsymbol{\alpha}, \boldsymbol{\beta}$ |
| May be non-convex | Always convex |
| Optimal value: $p^*$ | Optimal value: $d^* \leq p^*$ |
The dual offers a convex perspective on possibly non-convex problems. Even if we can't solve the primal directly, solving the dual gives us bounds.
The most desirable situation is when the duality gap vanishes:
Strong Duality:
$$p^* = d^*$$
When strong duality holds, solving the dual solves the primal: the optimal dual value equals the optimal primal value, the duality gap vanishes, and (for convex problems) the KKT conditions become both necessary and sufficient for optimality.
When Does Strong Duality Hold?
Slater's Condition (sufficient for strong duality):
If the primal problem is convex (convex $f$, convex $g_i$, affine $h_j$) and there exists a strictly feasible point $\bar{\mathbf{x}}$ with
$$g_i(\bar{\mathbf{x}}) < 0 \ \text{for all } i, \qquad h_j(\bar{\mathbf{x}}) = 0 \ \text{for all } j,$$
then strong duality holds.
Slater's condition requires just one strictly feasible point: not near the boundary, but strictly inside every inequality constraint. Constraint sets with non-empty interior (most practical cases) satisfy this. Slater fails only when the feasible region has no 'room inside,' which is pathological.
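As a concrete (if trivial) illustration for the running example with constraint $g(x) = 1 - x \leq 0$: exhibiting any single point with $g < 0$ certifies Slater's condition.

```python
# Slater check for the running example: g(x) = 1 - x
g = lambda x: 1 - x

x_bar = 2.0                                  # candidate strictly feasible point
print(f"g(x_bar) = {g(x_bar)}")              # -1.0 < 0, so x_bar is strictly feasible
print(f"Slater satisfied: {g(x_bar) < 0}")   # convex problem + Slater => strong duality
```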
| Problem Type | Duality Gap | Conditions |
|---|---|---|
| General non-convex | May be positive | Only weak duality guaranteed |
| Convex + Slater | Zero | Strong duality |
| Linear Programming | Zero | Always (when feasible) |
| Quadratic Programming | Zero | Always (when feasible, convex) |
| SVM (convex) | Zero | Slater holds for non-degenerate data |
The Saddle Point Interpretation:
When strong duality holds, the optimal primal and dual solutions form a saddle point of the Lagrangian:
$$\mathcal{L}(\mathbf{x}^*, \boldsymbol{\alpha}, \boldsymbol{\beta}) \leq \mathcal{L}(\mathbf{x}^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) \leq \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$$
for all feasible $\mathbf{x}$ and $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ with $\boldsymbol{\alpha} \geq 0$.
This saddle point is simultaneously a minimum of the Lagrangian over $\mathbf{x}$ (at fixed $\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*$) and a maximum over $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ (at fixed $\mathbf{x}^*$).
Let's work through the process of deriving and solving a dual problem.
Systematic Approach:
1. Write the primal in standard form: min $f$ s.t. $g_i \leq 0$, $h_j = 0$.
2. Form the Lagrangian: $\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f + \sum \alpha_i g_i + \sum \beta_j h_j$.
3. Minimize over $\mathbf{x}$: find $\inf_{\mathbf{x}} \mathcal{L}$ (may require taking gradients and solving).
4. Express the result as $d(\boldsymbol{\alpha}, \boldsymbol{\beta})$, a function of the multipliers only.
5. State the dual problem: max $d$ s.t. $\boldsymbol{\alpha} \geq 0$.
6. Solve the dual (often easier; it is always convex).
The following example applies these six steps to a small quadratic program:

```python
import numpy as np

def dual_problem_derivation():
    """
    Example: derive and solve the dual of a simple QP.

    Primal:  min (1/2)x^2 + (1/2)y^2   s.t.  x + y >= 1
             (equivalently: 1 - x - y <= 0)

    Step 1: Lagrangian
        L(x, y, a) = (1/2)x^2 + (1/2)y^2 + a(1 - x - y),  a >= 0

    Step 2: Minimize over (x, y)
        dL/dx = x - a = 0  ->  x = a
        dL/dy = y - a = 0  ->  y = a

    Step 3: Substitute back to get the dual function
        d(a) = (1/2)a^2 + (1/2)a^2 + a(1 - 2a)
             = a^2 + a - 2a^2
             = -a^2 + a

    Step 4: Dual problem
        max d(a) = -a^2 + a   s.t.  a >= 0

    Step 5: Solve the dual
        d'(a) = -2a + 1 = 0  ->  a* = 1/2
        d''(a) = -2 < 0  (concave, so this is the maximum)

    Step 6: Recover the primal solution
        x* = y* = a* = 1/2
        p* = f(1/2, 1/2) = (1/2)(1/4) + (1/2)(1/4) = 1/4
        d* = d(1/2) = -(1/4) + 1/2 = 1/4
        Strong duality: p* = d* = 1/4
    """
    print("Dual Problem Derivation Example")
    print("=" * 60)
    print()
    print("Primal: min (1/2)x² + (1/2)y²  s.t.  x + y ≥ 1")
    print()

    # Dual function: d(α) = -α² + α
    def dual_function(alpha):
        return -alpha**2 + alpha

    # Optimal α from d'(α) = 0
    alpha_opt = 0.5

    # Primal solution from the stationarity conditions x = y = α
    x_opt = alpha_opt
    y_opt = alpha_opt

    # Optimal values
    primal_opt = 0.5 * x_opt**2 + 0.5 * y_opt**2
    dual_opt = dual_function(alpha_opt)

    print("Dual function: d(α) = -α² + α")
    print(f"Optimal dual variable: α* = {alpha_opt}")
    print(f"Dual optimal value: d* = {dual_opt}")
    print()
    print(f"Primal solution: x* = y* = {x_opt}")
    print(f"Primal optimal value: p* = {primal_opt}")
    print()
    print(f"Duality gap: p* - d* = {primal_opt - dual_opt}")
    print(f"Strong duality holds: {np.isclose(primal_opt, dual_opt)}")
    print()

    # Verify feasibility and complementary slackness
    print(f"Constraint check: x* + y* = {x_opt + y_opt} ≥ 1 ✓")
    print(f"Complementary slackness: α*(1-x*-y*) = {alpha_opt * (1 - x_opt - y_opt)} ✓")

    return x_opt, y_opt, alpha_opt

dual_problem_derivation()
```

If the primal and dual have the same optimal value (under strong duality), why bother with the dual? Several compelling reasons:
1. Computational Advantages
Sometimes the dual has fewer variables or simpler structure: it has one variable per primal constraint, its own constraints often reduce to simple bounds such as $\boldsymbol{\alpha} \geq 0$, and for separable problems it can decompose into independent subproblems.
2. The Kernel Trick Connection
For SVMs, the dual reveals that data only appears through inner products $\mathbf{x}_i^T \mathbf{x}_j$. This enables the kernel trick: replace inner products with kernel evaluations $k(\mathbf{x}_i, \mathbf{x}_j)$ to implicitly work in high-dimensional feature spaces.
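A minimal sketch of that substitution (assuming numpy; the RBF bandwidth $\gamma = 0.5$ is an arbitrary illustrative choice). The dual only ever touches the data through the $n \times n$ Gram matrix, so swapping in a kernel matrix leaves the optimization unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))      # 5 data points in 3 dimensions

# Linear Gram matrix: G[i, j] = <x_i, x_j>, all the dual ever sees of the data
G_linear = X @ X.T

# Kernel trick: replace inner products with k(x_i, x_j), here an RBF kernel
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
G_rbf = np.exp(-0.5 * sq_dists)  # gamma = 0.5

print(G_linear.shape, G_rbf.shape)   # both (5, 5): the dual problem is unchanged
```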
3. Structural Insights
Dual variables have natural interpretations: in economics, $\alpha_i$ is a shadow price measuring how much the optimal value improves when constraint $i$ is relaxed; in SVMs, $\alpha_i$ measures how strongly training point $i$ shapes the decision boundary.
If the primal has $n$ variables and $m$ constraints, the dual has $m$ variables and (often) $n$ constraints. When $m \ll n$ (many variables, few constraints), the dual is much smaller. This is exactly the situation in SVMs with many features but moderate data: the dual has one variable per data point.
| Situation | Prefer | Reason |
|---|---|---|
| Few variables, many constraints | Primal | Fewer variables in primal |
| Many variables, few constraints | Dual | Fewer variables in dual |
| Non-convex primal | Dual | Dual is always convex (gives bounds) |
| Need kernel trick | Dual | Data appears as inner products |
| Interior point method | Both | Naturally computes both |
If we solve the dual, how do we find the primal solution?
Method 1: Through the Lagrangian
If $(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$ is the dual optimal solution, the primal solution $\mathbf{x}^*$ minimizes $\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$.
For many problems (especially quadratic objectives), this minimizer can be expressed in closed form:
$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$$
Method 2: KKT Stationarity
From $\nabla_{\mathbf{x}} \mathcal{L} = 0$, solve for $\mathbf{x}$ in terms of $(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$.
Example (SVM):
In SVMs, the optimal weight vector is recovered as: $$\mathbf{w}^* = \sum_i \alpha_i^* y_i \mathbf{x}_i$$
Only support vectors ($\alpha_i > 0$) contribute to $\mathbf{w}^*$.
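A sketch of this recovery step on toy data; the dual variables `alpha` below are made-up values standing in for the output of an SVM dual solver:

```python
import numpy as np

# Toy labeled data with y in {-1, +1}
X = np.array([[2.0, 2.0], [1.5, 2.5], [0.0, 0.0], [-0.5, 1.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical dual solution (in practice, produced by solving the SVM dual)
alpha = np.array([0.4, 0.0, 0.4, 0.0])

# Primal recovery: w* = sum_i alpha_i * y_i * x_i; only alpha_i > 0 contribute
w = (alpha * y) @ X
print("w* =", w)
print("support vectors:", np.where(alpha > 0)[0])
```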
When strong duality holds, the optimal value is unique, but optimal solutions may not be: multiple $\mathbf{x}^*$ can achieve the same $f^*$. Solving the dual identifies the optimal value and (typically) pins $\mathbf{x}^*$ down to a specific solution or set of solutions.
The Recovery Process:
1. Solve the dual to get $(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$.
2. Substitute into the stationarity condition: $\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = 0$.
3. Solve for $\mathbf{x}^*$, often a linear system for quadratic objectives.
4. Verify feasibility: ensure $\mathbf{x}^*$ satisfies the primal constraints (this should be automatic if done correctly).
This recovery process is central to algorithms like SMO (Sequential Minimal Optimization) for SVMs.
Duality has a beautiful geometric interpretation in terms of supporting hyperplanes.
The G-Graph:
Consider the set of achievable (constraint value, objective value) pairs:
$$\mathcal{G} = \{(g(\mathbf{x}), f(\mathbf{x})) : \mathbf{x} \in \mathbb{R}^n\} \subset \mathbb{R}^{m+1}$$
The primal asks: what's the lowest objective value ($f$) we can achieve when all constraint values ($g_i$) are non-positive?
Geometrically: project $\mathcal{G}$ onto the "feasible" region where $g_i \leq 0$, and find the minimum $f$ coordinate.
Dual as Supporting Hyperplane:
The dual finds a hyperplane (parameterized by $\boldsymbol{\alpha}$) that lies on or below every point of $\mathcal{G}$, supporting the set from below, with its slope in the constraint coordinates fixed by $-\boldsymbol{\alpha}$; among all such supporting hyperplanes, the dual selects the one crossing the objective axis as high as possible.
The intercept of this optimal supporting hyperplane with the objective axis equals the dual optimal value.
The duality gap corresponds to points of G that 'bulge below' the supporting hyperplane. For convex problems, G is convex, so supporting hyperplanes touch the boundary—no gap. For non-convex G, the hyperplane might not touch, creating a gap.
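To make this concrete for the running one-dimensional example ($f(x) = x^2$, $g(x) = 1 - x$, dual optimum $\alpha^* = 2$, $d^* = 1$), the sketch below samples $\mathcal{G}$ and verifies that the line $t = d^* - \alpha^* u$ supports it from below, touching exactly at the optimum:

```python
import numpy as np

xs = np.linspace(-5, 5, 1001)
u = 1 - xs            # constraint coordinate g(x)
t = xs**2             # objective coordinate f(x)

alpha_star, d_star = 2.0, 1.0

# Supporting hyperplane in the (u, t) plane: t = d* - alpha* * u
line = d_star - alpha_star * u
print("G lies above the hyperplane:", bool(np.all(t >= line - 1e-12)))
print("minimum slack:", (t - line).min())   # 0 at x* = 1, where the plane touches G
```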
Convexity Closes the Gap:
When the primal is convex, the region on or above $\mathcal{G}$ is convex, so the optimal supporting hyperplane actually touches the boundary at the optimum; combined with a constraint qualification such as Slater's condition, this forces $p^* = d^*$.
This geometric view makes strong duality intuitive: for convex problems, you can "separate" the feasible region from strictly better objectives using a hyperplane, and this hyperplane corresponds to the dual solution.
Duality provides a profound alternative view of optimization. To consolidate: the dual function is always concave and always lower-bounds the primal optimum (weak duality); for convex problems satisfying Slater's condition the bound is tight (strong duality); and a dual solution both certifies optimality and, via the KKT stationarity condition, lets us recover the primal solution.
What's Next:
We've developed the theory of duality. The next page shows how to explicitly construct the dual problem for important cases, focusing on the dual problem formulation techniques that lead directly to practical algorithms like the SVM dual.
You now understand Lagrangian duality: the dual function, weak and strong duality, conditions for zero duality gap, and the geometric interpretation. This forms the theoretical foundation for understanding how dual formulations reveal problem structure and enable powerful algorithms like kernel SVMs.