Lagrange multipliers elegantly handle equality constraints like $g(\mathbf{x}) = 0$. But most real-world optimization problems involve inequalities: budgets that cannot be exceeded ($\text{cost} \leq \text{budget}$), physical limits that must be respected ($\text{pressure} \leq \text{max}$), or mathematical constraints that define feasibility ($y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1$ in SVMs).
The Karush-Kuhn-Tucker (KKT) conditions generalize Lagrange multipliers to this broader setting. Named after Harold Kuhn and Albert Tucker (with earlier work by William Karush), these conditions form the theoretical backbone of modern constrained optimization—from interior point methods to the mathematical foundations of Support Vector Machines.
By the end of this page, you will understand KKT conditions completely: what they are, why they're necessary for optimality, how to apply them, and what complementary slackness means. You'll see how these conditions directly connect to the sparsity of support vectors in SVMs and the geometry of constrained optimization.
The general nonlinear programming problem can be written as:
$$\min_{\mathbf{x}} f(\mathbf{x})$$ $$\text{subject to:}$$ $$g_i(\mathbf{x}) \leq 0, \quad i = 1, \ldots, m \quad \text{(inequality constraints)}$$ $$h_j(\mathbf{x}) = 0, \quad j = 1, \ldots, p \quad \text{(equality constraints)}$$
where:
- $f(\mathbf{x})$ is the objective function to minimize
- $g_i(\mathbf{x}) \leq 0$ are the $m$ inequality constraints
- $h_j(\mathbf{x}) = 0$ are the $p$ equality constraints
Why $g_i(\mathbf{x}) \leq 0$ Convention?
This form is standard. Any constraint can be rewritten to fit:
- $g(\mathbf{x}) \geq 0$ becomes $-g(\mathbf{x}) \leq 0$
- $g(\mathbf{x}) \leq c$ becomes $g(\mathbf{x}) - c \leq 0$
- $a \leq g(\mathbf{x}) \leq b$ becomes the pair $a - g(\mathbf{x}) \leq 0$ and $g(\mathbf{x}) - b \leq 0$
An inequality constraint $g_i(\mathbf{x}) \leq 0$ is active (or binding) at a point $\mathbf{x}^*$ if $g_i(\mathbf{x}^*) = 0$. It's inactive (or slack) if $g_i(\mathbf{x}^*) < 0$. This distinction is crucial: active constraints restrict the solution; inactive constraints don't affect it at all.
Example with two constraints:
- Constraint 1: $x^2 + y^2 - 1 \leq 0$ (unit disk)
- Constraint 2: $-x \leq 0$ (right half-plane, i.e., $x \geq 0$)

Without the half-plane constraint, the optimum would be $(-1/\sqrt{2}, -1/\sqrt{2})$; with $x \geq 0$, the optimum moves to $(0, -1)$. At $(0, -1)$: the disk constraint has $g_1 = 0 + 1 - 1 = 0$ (active), and the $x \geq 0$ constraint has $g_2 = -0 = 0$ (also active). Both constraints are binding at the solution.
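The example above can be checked numerically. The objective was not preserved in the text, so the sketch below assumes $f(x, y) = x + y$, which is consistent with both optima quoted above:

```python
import numpy as np
from scipy.optimize import minimize

# Objective assumed for illustration: f(x, y) = x + y
# (consistent with the unconstrained optimum (-1/sqrt(2), -1/sqrt(2))
#  and the constrained optimum (0, -1) quoted above)
objective = lambda xy: xy[0] + xy[1]

# scipy's 'ineq' convention is fun(x) >= 0, so we negate our g_i <= 0 forms
constraints = [
    {'type': 'ineq', 'fun': lambda xy: 1 - xy[0]**2 - xy[1]**2},  # x^2 + y^2 - 1 <= 0
    {'type': 'ineq', 'fun': lambda xy: xy[0]},                    # -x <= 0
]

res = minimize(objective, x0=[0.5, 0.0], constraints=constraints, method='SLSQP')
x, y = res.x
g1 = x**2 + y**2 - 1   # ~0 means active
g2 = -x                # ~0 means active
print(f"optimum: ({x:.4f}, {y:.4f}),  g1 = {g1:.2e},  g2 = {g2:.2e}")
```

Both constraint values come out numerically zero at the solution, confirming that both are active.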
For the general problem with both inequality and equality constraints, we define the generalized Lagrangian:
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f(\mathbf{x}) + \sum_{i=1}^{m} \alpha_i g_i(\mathbf{x}) + \sum_{j=1}^{p} \beta_j h_j(\mathbf{x})$$
where:
- $\alpha_i \geq 0$ are the multipliers for the inequality constraints $g_i(\mathbf{x}) \leq 0$
- $\beta_j$ (unrestricted in sign) are the multipliers for the equality constraints $h_j(\mathbf{x}) = 0$
Critical Difference from Pure Lagrangian:
For inequality constraints, the multipliers $\alpha_i$ must be non-negative. This sign restriction captures the one-sided nature of inequalities and leads to complementary slackness.
With an equality $h = 0$, the constraint 'pushes back' symmetrically in both directions, so β can be positive or negative. With an inequality $g ≤ 0$, the constraint only 'pushes' when you hit the boundary, and always pushes inward—hence α ≥ 0. If α were negative, the constraint would 'pull' you out of the feasible region, violating the inequality.
The Lagrangian Dual Function:
Define the dual function as:
$$d(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
This function is obtained by minimizing the Lagrangian over $\mathbf{x}$ for fixed multipliers. The dual function provides lower bounds on the optimal value of the original (primal) problem when $\boldsymbol{\alpha} \geq 0$.
This duality perspective—optimizing over multipliers rather than original variables—is fundamental to understanding SVMs and will be developed fully in later pages.
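To make the lower-bound property concrete, here is a small sketch on a toy problem of our own choosing: minimize $x^2$ subject to $x \geq 1$, i.e. $g(x) = 1 - x \leq 0$. Minimizing the Lagrangian $\mathcal{L}(x, \alpha) = x^2 + \alpha(1 - x)$ over $x$ gives $x = \alpha/2$, so the dual function has the closed form $d(\alpha) = \alpha - \alpha^2/4$:

```python
# Toy problem: min x^2  s.t.  g(x) = 1 - x <= 0  (primal optimum: x* = 1, f* = 1)
# Lagrangian: L(x, a) = x^2 + a*(1 - x); argmin_x L is x = a/2.
def dual(a):
    x_min = a / 2.0                      # minimizer of L(x, a) over x
    return x_min**2 + a * (1 - x_min)    # d(a) = a - a^2/4

f_star = 1.0
for a in [0.0, 0.5, 1.0, 2.0, 3.0]:
    print(f"alpha = {a:3.1f}:  d(alpha) = {dual(a):6.4f}  <=  f* = {f_star}")
# Every d(alpha) with alpha >= 0 is a lower bound on f*; the bound is tight at alpha = 2.
```

Maximizing $d(\alpha)$ over $\alpha \geq 0$ recovers the primal optimal value exactly, a preview of the strong duality discussed later on this page.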
At a local minimum $\mathbf{x}^*$ (subject to constraint qualification), there exist multipliers $\boldsymbol{\alpha}^* \geq 0$ and $\boldsymbol{\beta}^*$ such that:
1. Stationarity: $$\nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = 0$$ $$\Leftrightarrow \quad \nabla f(\mathbf{x}^*) + \sum_{i=1}^{m} \alpha_i^* \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{p} \beta_j^* \nabla h_j(\mathbf{x}^*) = 0$$
2. Primal Feasibility: $$g_i(\mathbf{x}^*) \leq 0 \quad \forall i = 1, \ldots, m$$ $$h_j(\mathbf{x}^*) = 0 \quad \forall j = 1, \ldots, p$$
3. Dual Feasibility: $$\alpha_i^* \geq 0 \quad \forall i = 1, \ldots, m$$
4. Complementary Slackness: $$\alpha_i^* g_i(\mathbf{x}^*) = 0 \quad \forall i = 1, \ldots, m$$
| Condition | Mathematical Form | Interpretation |
|---|---|---|
| Stationarity | $\nabla_{\mathbf{x}} \mathcal{L} = 0$ | Gradient of Lagrangian vanishes at optimum |
| Primal Feasibility | $g_i \leq 0, h_j = 0$ | Solution satisfies all constraints |
| Dual Feasibility | $\alpha_i \geq 0$ | Inequality multipliers are non-negative |
| Complementary Slackness | $\alpha_i g_i = 0$ | Either constraint is active OR multiplier is zero |
A point satisfying KKT is a candidate for optimality, not necessarily an actual optimum (could be a saddle point). For convex problems (convex f, convex g's, affine h's), KKT conditions become both necessary AND sufficient—this is why convexity is so powerful.
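The four conditions in the table can be checked mechanically for any candidate point. Below is a minimal sketch of such a checker (a hypothetical helper, restricted to inequality constraints for brevity), applied to the worked example solved later on this page, $\min x^2 + y^2$ subject to $1 - x - y \leq 0$:

```python
import numpy as np

def check_kkt(grad_f, gs, grad_gs, x, alphas, tol=1e-6):
    """Numerically check the four KKT conditions (inequality constraints only).

    grad_f:  callable returning the gradient of the objective
    gs:      list of callables g_i (constraints in the form g_i(x) <= 0)
    grad_gs: list of callables returning the gradients of the g_i
    """
    x = np.asarray(x, dtype=float)
    stat = grad_f(x) + sum(a * dg(x) for a, dg in zip(alphas, grad_gs))
    return {
        'stationarity':    np.linalg.norm(stat) < tol,
        'primal_feasible': all(g(x) <= tol for g in gs),
        'dual_feasible':   all(a >= -tol for a in alphas),
        'comp_slackness':  all(abs(a * g(x)) < tol for a, g in zip(alphas, gs)),
    }

# Candidate: x = y = 1/2 with alpha = 1 for min x^2 + y^2 s.t. 1 - x - y <= 0
report = check_kkt(
    grad_f=lambda x: 2 * x,
    gs=[lambda x: 1 - x[0] - x[1]],
    grad_gs=[lambda x: np.array([-1.0, -1.0])],
    x=[0.5, 0.5],
    alphas=[1.0],
)
print(report)
```

All four conditions come back `True` for this candidate; for a non-convex problem, this would still only certify it as a candidate, not a global optimum.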
Complementary slackness is perhaps the most subtle and important of the KKT conditions. It states:
$$\alpha_i^* g_i(\mathbf{x}^*) = 0 \quad \text{for each } i$$
Since $\alpha_i \geq 0$ and $g_i \leq 0$, the product $\alpha_i g_i$ is always non-positive; complementary slackness forces it to equal zero exactly. But how it's zero reveals crucial information:
Case 1: Constraint is inactive ($g_i(\mathbf{x}^*) < 0$). Then complementary slackness forces $\alpha_i^* = 0$: the constraint exerts no force on the solution and could be deleted without changing the optimum.

Case 2: Constraint is active ($g_i(\mathbf{x}^*) = 0$). Then $\alpha_i^*$ may be positive: the constraint is pushing against the objective, and the size of $\alpha_i^*$ measures how hard.
The name comes from the complementary relationship: either the constraint is active (g = 0) or the multiplier is zero (α = 0), but never both non-zero. They 'complement' each other like binary switches—when one is on, the other is off. We can also have both zero, but having both non-zero is impossible.
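The two branches can be seen side by side on a pair of toy problems of our own construction, sharing the constraint $x \geq 1$ (i.e. $g(x) = 1 - x \leq 0$) but with different objectives:

```python
import numpy as np
from scipy.optimize import minimize

# Same constraint for both problems; scipy's 'ineq' convention is fun(x) >= 0.
cons = {'type': 'ineq', 'fun': lambda x: x[0] - 1}   # x >= 1

# Case 1: min (x - 2)^2. The unconstrained optimum x = 2 already satisfies
# x >= 1, so the constraint is inactive (g < 0) and alpha must be 0.
r1 = minimize(lambda x: (x[0] - 2) ** 2, x0=[0.0], constraints=cons, method='SLSQP')
g_1 = 1 - r1.x[0]
alpha_1 = 2 * (r1.x[0] - 2)    # stationarity: 2(x - 2) - alpha = 0, since g'(x) = -1

# Case 2: min x^2. The unconstrained optimum x = 0 violates x >= 1, so the
# solution sits on the boundary: g = 0 is active and alpha > 0.
r2 = minimize(lambda x: x[0] ** 2, x0=[2.0], constraints=cons, method='SLSQP')
g_2 = 1 - r2.x[0]
alpha_2 = 2 * r2.x[0]          # stationarity: 2x - alpha = 0

print(f"Case 1: x = {r1.x[0]:.3f}, g = {g_1:+.3f} (inactive), alpha = {alpha_1:+.3f}")
print(f"Case 2: x = {r2.x[0]:.3f}, g = {g_2:+.3f} (active),   alpha = {alpha_2:+.3f}")
print(f"alpha * g: {alpha_1 * g_1:.2e} and {alpha_2 * g_2:.2e}")
```

In both cases the product $\alpha \cdot g$ is numerically zero, but for opposite reasons: in Case 1 the multiplier vanishes, in Case 2 the constraint does.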
SVM Connection:
In Support Vector Machines, complementary slackness explains support vectors:
- Points with $\alpha_i = 0$: the margin constraint is inactive; these points lie strictly outside the margin and have no influence on the decision boundary.
- Points with $\alpha_i > 0$: the constraint $y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1$ is active; these points lie exactly on the margin. They are the support vectors.
This sparsity isn't an algorithmic trick—it's a direct consequence of KKT conditions!
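You can observe this sparsity directly by training a linear SVM on well-separated data; the sketch below assumes scikit-learn is available (its `support_` attribute lists exactly the training points with $\alpha_i > 0$):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs, 50 points each
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(f"training points: {len(X)}")
print(f"support vectors (alpha_i > 0): {len(clf.support_)}")
# Most alpha_i are exactly zero -- complementary slackness in action.
```

Only a handful of the 100 points end up with nonzero multipliers; removing any of the others would leave the decision boundary unchanged.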
The stationarity condition has a beautiful geometric meaning. Let's visualize it.
At a Constrained Minimum:
The stationarity condition says:
$$-\nabla f(\mathbf{x}^*) = \sum_i \alpha_i^* \nabla g_i(\mathbf{x}^*) + \sum_j \beta_j^* \nabla h_j(\mathbf{x}^*)$$
Interpretation: the direction of steepest descent, $-\nabla f$, is a non-negative combination of the active inequality constraint gradients (plus an arbitrary combination of the equality constraint gradients). The constraints "push back" with exactly the force needed to cancel the objective's pull.
Why This Must Be True:
If $-\nabla f$ had any component along the constraint surface (tangent to active constraints), we could move in that direction, stay feasible, and decrease $f$—contradicting optimality.
So at the optimum, every direction that decreases $f$ must violate some constraint. The descent direction is "blocked" by the active constraints.
The active constraint gradients define a 'cone' of forbidden directions. At a constrained minimum, the negative gradient of the objective must lie inside this cone. If it pointed outside, there would be a feasible descent direction, and we wouldn't be at a minimum.
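This balance can be verified numerically. Using the problem $\min x^2 + y^2$ subject to $g(x, y) = 1 - x - y \leq 0$ (solved in full later on this page), with optimum $(1/2, 1/2)$ and $\alpha^* = 1$:

```python
import numpy as np

# Gradient balance at the optimum of: min x^2 + y^2  s.t.  g = 1 - x - y <= 0
x_star = np.array([0.5, 0.5])
alpha_star = 1.0

grad_f = 2 * x_star                  # gradient of x^2 + y^2: [1, 1]
grad_g = np.array([-1.0, -1.0])      # gradient of g = 1 - x - y

lhs = -grad_f                        # steepest descent direction
rhs = alpha_star * grad_g            # the constraint's "force"
print("-grad f =", lhs, "  alpha * grad g =", rhs)
print("balanced:", np.allclose(lhs, rhs))
```

The descent direction is exactly a non-negative multiple of the single active constraint gradient, so no feasible descent direction remains.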
Comparison with Unconstrained Optimization:
| Unconstrained Minimum | Constrained Minimum (KKT) |
|---|---|
| $\nabla f = 0$ | $\nabla f = -\sum \alpha_i \nabla g_i - \sum \beta_j \nabla h_j$ |
| Gradient vanishes | Gradient balanced by constraint forces |
| Stationary in all directions | Stationary only in feasible directions |
| No constraint "pushback" | Active constraints "push" against descent |
The key insight: at a constrained optimum, you're not at a critical point of $f$ — rather, the gradient of $f$ is exactly balanced by the "forces" from active constraints.
Let's develop a systematic approach to solving constrained optimization problems using KKT conditions.
General Strategy:
1. Write the Lagrangian with multipliers $\alpha_i \geq 0$ for inequalities and $\beta_j$ for equalities
2. Write all four KKT conditions: stationarity, primal feasibility, dual feasibility, and complementary slackness
3. Case analysis on complementary slackness: for each inequality constraint, consider both cases ($\alpha_i = 0$ or $g_i = 0$) and solve the resulting system
4. Check solutions: verify primal and dual feasibility, discard invalid cases
5. Compare objectives: evaluate $f$ at all valid KKT points to find the global optimum
Worked example: minimize $f(x, y) = x^2 + y^2$ subject to $x + y \geq 1$.

- Rewrite the constraint as $g(x, y) = 1 - x - y \leq 0$
- Optimal solution: $x = y = 1/2$ with $\alpha = 1$

We'll solve this step-by-step in the code below, showing how complementary slackness identifies whether the constraint is active.
```python
import numpy as np
from scipy.optimize import minimize


def kkt_worked_example():
    """
    Minimize f(x,y) = x² + y² subject to x + y ≥ 1

    Standard form: min x² + y²  s.t.  g(x,y) = 1 - x - y ≤ 0
    Lagrangian:    L(x,y,α) = x² + y² + α(1 - x - y)

    KKT Conditions:
    1. Stationarity:            ∂L/∂x = 2x - α = 0 → x = α/2
                                ∂L/∂y = 2y - α = 0 → y = α/2
    2. Primal Feasibility:      1 - x - y ≤ 0 → x + y ≥ 1
    3. Dual Feasibility:        α ≥ 0
    4. Complementary Slackness: α(1 - x - y) = 0

    Case Analysis:
    Case 1: α = 0
        Then x = y = 0
        Check primal: 0 + 0 = 0 ≥ 1? NO!
        This case violates primal feasibility. INVALID.
    Case 2: α > 0 (complementary slackness forces g = 0)
        Then 1 - x - y = 0 → x + y = 1
        From stationarity: x = y = α/2
        So: α/2 + α/2 = 1 → α = 1
        Thus: x = y = 1/2, α = 1
        Check all conditions:
        - Stationarity: 2(1/2) - 1 = 0 ✓
        - Primal:       1/2 + 1/2 = 1 ✓
        - Dual:         α = 1 ≥ 0 ✓
        - Comp. Slack.: 1 · 0 = 0 ✓
        ALL CONDITIONS SATISFIED!
    """
    print("KKT Worked Example")
    print("=" * 60)
    print("Problem: min x² + y² subject to x + y ≥ 1")
    print()

    # Analytical solution from KKT
    x_opt, y_opt = 0.5, 0.5
    alpha_opt = 1.0
    f_opt = x_opt**2 + y_opt**2

    print("Analytical Solution (KKT):")
    print(f"  Optimal point: ({x_opt}, {y_opt})")
    print(f"  Optimal value: f* = {f_opt}")
    print(f"  KKT multiplier: α* = {alpha_opt}")
    print()

    # Verify KKT conditions
    print("KKT Condition Verification:")
    print(f"  1. Stationarity: 2x - α = 2({x_opt}) - {alpha_opt} = {2*x_opt - alpha_opt}")
    print(f"                   2y - α = 2({y_opt}) - {alpha_opt} = {2*y_opt - alpha_opt}")
    print(f"  2. Primal Feas:  x + y = {x_opt + y_opt} ≥ 1 ✓")
    print(f"  3. Dual Feas:    α = {alpha_opt} ≥ 0 ✓")
    constraint_value = 1 - x_opt - y_opt
    print(f"  4. Comp. Slack:  α·g = {alpha_opt}·{constraint_value} = {alpha_opt * constraint_value} ✓")
    print()

    # Numerical verification
    def objective(xy):
        return xy[0]**2 + xy[1]**2

    def constraint(xy):
        return xy[0] + xy[1] - 1  # x + y >= 1

    result = minimize(
        objective,
        x0=[0, 0],
        constraints={'type': 'ineq', 'fun': constraint},
        method='SLSQP'
    )
    print("Numerical Verification (scipy):")
    print(f"  Optimal point: ({result.x[0]:.6f}, {result.x[1]:.6f})")
    print(f"  Optimal value: {result.fun:.6f}")

    return x_opt, y_opt, alpha_opt


kkt_worked_example()
```

For KKT conditions to be necessary at a local minimum, a constraint qualification must hold. Several constraint qualifications exist, each with different strengths:
Linear Independence Constraint Qualification (LICQ):
The gradients of all active constraints are linearly independent at $\mathbf{x}^*$.
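LICQ is a rank condition, so it can be checked mechanically. A minimal sketch at the point $(0, -1)$ from the earlier two-constraint example ($g_1 = x^2 + y^2 - 1 \leq 0$ and $g_2 = -x \leq 0$, both active there):

```python
import numpy as np

# LICQ check at (0, -1): stack the gradients of the active constraints
# and test whether they are linearly independent (full row rank).
x_star = np.array([0.0, -1.0])

grad_g1 = np.array([2 * x_star[0], 2 * x_star[1]])   # grad of x^2 + y^2 - 1: [0, -2]
grad_g2 = np.array([-1.0, 0.0])                      # grad of -x

A = np.vstack([grad_g1, grad_g2])
licq_holds = np.linalg.matrix_rank(A) == A.shape[0]
print(f"active-constraint gradients:\n{A}\nLICQ holds: {licq_holds}")
```

Here the two gradients point along different axes, so they are independent and LICQ holds; the KKT multipliers at this point are therefore unique.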
Slater's Condition (for convex problems):
There exists a strictly feasible point $\bar{\mathbf{x}}$ such that:
- $g_i(\bar{\mathbf{x}}) < 0$ for all $i = 1, \ldots, m$ (every inequality holds strictly)
- $h_j(\bar{\mathbf{x}}) = 0$ for all $j = 1, \ldots, p$
Slater's condition is particularly important because it's easy to verify and commonly satisfied in practice.
| CQ Name | Requirement | When Used |
|---|---|---|
| LICQ | Active constraint gradients linearly independent | Most general; ensures unique multipliers |
| MFCQ | Constraint gradients span feasible directions | Weaker than LICQ; allows non-unique multipliers |
| Slater | Strictly feasible interior point exists | Convex problems; key for strong duality |
| Linearity | All constraints are affine | LP and QP; always satisfied if feasible |
In machine learning, constraints are often linear (e.g., sum-to-one, budget constraints) or nice convex functions (e.g., norm bounds). These almost always satisfy Slater's condition—a strictly interior point exists. So for practical ML optimization, you can usually trust that KKT conditions are both necessary and identify the global optimum.
The power of KKT conditions is amplified dramatically for convex optimization problems.
Definition: Convex Optimization Problem
A problem is convex if:
- the objective $f$ is convex
- all inequality constraint functions $g_i$ are convex
- all equality constraint functions $h_j$ are affine
The Fundamental Theorem:
For convex problems satisfying Slater's condition:
KKT conditions are both necessary AND sufficient for global optimality.
This means:
- any point satisfying the KKT conditions is a global minimum
- conversely, every global minimum satisfies the KKT conditions
- there is no gap between "candidate" and "solution"
Why This Matters:
For convex problems, finding a KKT point = solving the optimization problem. There's no need to check second-order conditions or compare multiple local minima. This is what makes convex optimization so tractable.
Linear regression, logistic regression, SVMs (both primal and dual), regularized least squares, and many other core ML methods are convex optimization problems. This convexity guarantees that gradient-based methods find global optima, and KKT analysis reveals the structure of solutions.
Strong Duality and KKT:
For convex problems with Slater's condition:
$$f(\mathbf{x}^*) = \mathcal{L}(\mathbf{x}^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = d(\boldsymbol{\alpha}^*, \boldsymbol{\beta}^*)$$
where $d$ is the dual function. The primal optimal value equals the dual optimal value—there's no "duality gap." This is called strong duality, and it enables us to solve the dual problem instead of the primal, often with significant computational benefits.
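Strong duality can be checked by hand for the worked example on this page, $\min x^2 + y^2$ subject to $x + y \geq 1$. Minimizing the Lagrangian $\mathcal{L} = x^2 + y^2 + \alpha(1 - x - y)$ over $(x, y)$ gives $x = y = \alpha/2$, so the dual function is $d(\alpha) = \alpha - \alpha^2/2$ in closed form:

```python
import numpy as np

# Dual function of: min x^2 + y^2  s.t.  1 - x - y <= 0.
# With x = y = a/2 minimizing the Lagrangian, d(a) = a - a^2/2.
dual = lambda a: a - a**2 / 2

alphas = np.linspace(0, 3, 301)          # grid search over alpha >= 0
a_star = alphas[np.argmax(dual(alphas))]
d_star = dual(a_star)
f_star = 0.5**2 + 0.5**2                 # primal optimum at (1/2, 1/2)

print(f"dual optimum:   d* = {d_star:.4f} at alpha = {a_star:.2f}")
print(f"primal optimum: f* = {f_star:.4f}")
print(f"duality gap:    {abs(f_star - d_star):.2e}")
```

The dual maximum $d^* = 1/2$ at $\alpha^* = 1$ matches the primal optimum exactly: zero duality gap, as strong duality promises.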
We've developed the complete KKT framework for handling inequality constraints. Let's consolidate:
- The generalized Lagrangian adds multipliers $\alpha_i \geq 0$ for inequalities and $\beta_j$ for equalities
- The four KKT conditions: stationarity, primal feasibility, dual feasibility, and complementary slackness
- Complementary slackness: each inequality constraint is either active or has a zero multiplier, never both nonzero
- Constraint qualifications (LICQ, Slater's condition) guarantee the conditions are necessary
- For convex problems satisfying Slater's condition, KKT is both necessary and sufficient for global optimality
What's Next:
KKT conditions establish when a point is optimal. But they also hint at a profound perspective: instead of optimizing over $\mathbf{x}$, what if we optimize over the multipliers $\boldsymbol{\alpha}, \boldsymbol{\beta}$? This leads to duality theory—the subject of our next page. Duality reveals hidden structure in optimization problems and is the key to understanding why SVMs are so elegant.
You now understand the KKT conditions—the generalization of Lagrange multipliers to inequality-constrained problems. You can formulate KKT conditions, solve problems using case analysis on complementary slackness, and appreciate their special power for convex optimization. Next, we explore the dual perspective that makes this framework even more powerful.