At its core, classification partitions feature space into distinct regions—one for each class. The boundaries separating these regions are decision boundaries, and understanding their geometry is essential for interpreting classifier behavior, diagnosing problems, and building intuition about model assumptions.
In LDA and QDA, decision boundaries have precise mathematical forms: hyperplanes for LDA, quadric surfaces for QDA. These aren't arbitrary shapes but directly reflect the underlying Gaussian assumptions. A classifier's boundary reveals what it 'believes' about the data—where it's confident, where it's uncertain, and where assumptions may be violated.
This page develops a deep understanding of decision boundaries: their mathematical characterization, geometric properties, how to visualize and interpret them, and what they tell us about model behavior at different points in feature space.
By the end of this page, you will understand: the mathematical equations defining LDA and QDA boundaries, geometric properties of linear and quadratic boundaries, how priors shift boundaries, uncertainty near boundaries, visualization techniques, and how to interpret boundary shapes in relation to model assumptions.
A decision boundary is the set of points where two or more classes have equal posterior probability. For a two-class problem, the boundary is:
$$\mathcal{B} = \{x : P(Y = 1 | X = x) = P(Y = 2 | X = x)\}$$
Equivalently, using discriminant functions:
$$\mathcal{B} = \{x : \delta_1(x) = \delta_2(x)\}$$
LDA boundary equation:
For LDA with shared covariance $\Sigma$:
$$\delta_1(x) - \delta_2(x) = (\mu_1 - \mu_2)^T\Sigma^{-1}x - \frac{1}{2}(\mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2) + \log\frac{\pi_1}{\pi_2} = 0$$
This can be written as:
$$w^T x + b = 0$$
where $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $b = -\frac{1}{2}(\mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2) + \log\frac{\pi_1}{\pi_2}$.
This is a hyperplane in $\mathbb{R}^p$.
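The hyperplane parameters can be assembled directly from the class statistics. A minimal sketch with hypothetical means, covariance, and priors (illustrative values, not from any dataset in this page):

```python
import numpy as np

# Hypothetical class parameters (illustrative only)
mu1 = np.array([1.0, 2.0])
mu2 = np.array([3.0, 0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
pi1 = pi2 = 0.5

Sigma_inv = np.linalg.inv(Sigma)

# Hyperplane parameters for w^T x + b = 0
w = Sigma_inv @ (mu1 - mu2)
b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu2 @ Sigma_inv @ mu2) + np.log(pi1 / pi2)

def lda_side(x):
    """Positive: class 1 side; negative: class 2 side; zero: on the boundary."""
    return w @ x + b

# With equal priors, the midpoint of the means lies on the boundary
midpoint = 0.5 * (mu1 + mu2)
print(abs(lda_side(midpoint)) < 1e-9)  # True
```

The midpoint check works because the cross terms in $w^T \frac{\mu_1 + \mu_2}{2}$ cancel exactly against $b$ when $\pi_1 = \pi_2$.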
The normal vector $w = \Sigma^{-1}(\mu_1 - \mu_2)$ is not simply the direction between class means. The covariance inverse 'rotates and stretches' this direction based on feature correlations: directions of high variance are down-weighted, and correlations tilt the boundary away from the naive perpendicular to the mean difference.
QDA boundary equation:
For QDA with class-specific covariances $\Sigma_1, \Sigma_2$:
$$\delta_1(x) - \delta_2(x) = 0$$
Expanding:
$$-\frac{1}{2}x^T(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1^T\Sigma_1^{-1} - \mu_2^T\Sigma_2^{-1})x + c = 0$$
where $c$ absorbs constant terms (means, determinants, priors).
This is a quadratic equation in $x$:
$$x^T A x + b^T x + c = 0$$
where $A = -\frac{1}{2}(\Sigma_1^{-1} - \Sigma_2^{-1})$ and $b = \Sigma_1^{-1}\mu_1 - \Sigma_2^{-1}\mu_2$.
This defines a quadric surface (conic section in 2D, quadric in higher dimensions).
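These coefficients can be computed directly from the class parameters. A sketch with hypothetical means and covariances; `qda_side` evaluates $\delta_1(x) - \delta_2(x)$, so its sign gives the predicted class:

```python
import numpy as np

# Hypothetical class parameters
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma1 = np.eye(2)
Sigma2 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
pi1 = pi2 = 0.5

S1inv, S2inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)

# Coefficients of the boundary x^T A x + b^T x + c = 0
A = -0.5 * (S1inv - S2inv)
bvec = S1inv @ mu1 - S2inv @ mu2
c = (-0.5 * (mu1 @ S1inv @ mu1 - mu2 @ S2inv @ mu2)
     - 0.5 * np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma2))
     + np.log(pi1 / pi2))

def qda_side(x):
    """delta_1(x) - delta_2(x): positive means class 1, negative class 2."""
    return x @ A @ x + bvec @ x + c

print(qda_side(mu1) > 0)  # True: class 1's own mean is on its side
print(qda_side(mu2) < 0)  # True: likewise for class 2
```

The constant $c$ collects the mean, log-determinant, and prior terms absorbed in the text above.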
| Eigenvalues of A | Boundary Type | Geometric Shape |
|---|---|---|
| Both positive or both negative | Ellipse | Closed curve, one class inside |
| One positive, one negative | Hyperbola | Two branches, open curve |
| One zero, one nonzero | Parabola | Single open curve |
| Both zero (A = 0) | Line (degenerate) | Reduces to LDA case |
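The table above can be checked numerically by inspecting the eigenvalues of $A$. A small sketch (the helper name `conic_type` and the tolerance `tol` are arbitrary choices):

```python
import numpy as np

def conic_type(A, tol=1e-10):
    """Classify the 2D QDA boundary type from the eigenvalues of A."""
    ev = np.linalg.eigvalsh(A)  # A is symmetric
    nonzero = ev[np.abs(ev) > tol]
    if len(nonzero) == 0:
        return "line"      # A ~ 0: degenerate, LDA-like boundary
    if len(nonzero) == 1:
        return "parabola"  # one zero, one nonzero eigenvalue
    return "ellipse" if nonzero[0] * nonzero[1] > 0 else "hyperbola"

print(conic_type(np.array([[1.0, 0.0], [0.0, 2.0]])))   # ellipse
print(conic_type(np.array([[1.0, 0.0], [0.0, -1.0]])))  # hyperbola
print(conic_type(np.diag([1.0, 0.0])))                  # parabola
```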
LDA's linear boundaries have elegant geometric interpretations that provide deep insight into the classifier's behavior.
The boundary as a perpendicular bisector:
With equal priors ($\pi_1 = \pi_2$), the LDA boundary is the perpendicular bisector of the line segment connecting $\mu_1$ and $\mu_2$—but in Mahalanobis space, not Euclidean space.
In Mahalanobis space (after whitening by $\Sigma^{-1/2}$), the whitened means are separated symmetrically and the boundary is exactly the perpendicular bisector of the segment joining them. In original space, this manifests as a hyperplane that may not appear perpendicular to $\mu_1 - \mu_2$ if features are correlated.
Distance interpretation:
Classification is based on Mahalanobis distance to class means. For a point $x$:
$$d_M(x, \mu_k) = \sqrt{(x - \mu_k)^T\Sigma^{-1}(x - \mu_k)}$$
With equal priors, classification reduces to: assign to the class with smaller Mahalanobis distance. The boundary is the locus of points equidistant (in Mahalanobis terms) from both means.
Mahalanobis distance accounts for feature correlations and different variances. In Euclidean distance, an outlier in a high-variance direction seems far from both means. In Mahalanobis distance, such a point is appropriately recognized as still within the 'typical' range and classified based on mean proximity.
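To make this concrete, the sketch below (hypothetical shared covariance with high variance along the first feature) shows a point that is Euclidean-far from both means yet is still confidently assigned by Mahalanobis distance:

```python
import numpy as np

def mahalanobis(x, mu, Sigma_inv):
    """Mahalanobis distance from x to mean mu."""
    d = x - mu
    return np.sqrt(d @ Sigma_inv @ d)

# Hypothetical shared covariance: high variance along the first feature
Sigma_inv = np.linalg.inv(np.array([[4.0, 0.0],
                                    [0.0, 1.0]]))
mu1 = np.array([0.0, 0.0])
mu2 = np.array([0.0, 3.0])

# Euclidean-far along the high-variance axis, but still 'typical' for class 1
x = np.array([5.0, 0.2])
d1 = mahalanobis(x, mu1, Sigma_inv)
d2 = mahalanobis(x, mu2, Sigma_inv)
print(d1 < d2)  # True: assigned to class 1 despite the large Euclidean distance
```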
Effect of priors:
When priors are unequal, the term $\log\frac{\pi_1}{\pi_2}$ enters the discriminant difference and translates the hyperplane along its normal by a distance $|\log(\pi_1/\pi_2)| / \|w\|$.
Geometrically, the boundary no longer passes through the midpoint; it moves closer to the less common class. This implements Bayesian reasoning: without evidence from features, bet on the common class.
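The size of this shift is easy to compute: changing the priors adds $\log(\pi_1/\pi_2)$ to the discriminant difference, moving the hyperplane $|\log(\pi_1/\pi_2)|/\|w\|$ along its normal. A sketch with hypothetical parameters (identity covariance, means 4 units apart):

```python
import numpy as np

# Hypothetical LDA setup: identity covariance, means 4 units apart
mu1 = np.array([0.0, 0.0])
mu2 = np.array([4.0, 0.0])
w = np.linalg.solve(np.eye(2), mu1 - mu2)  # normal vector of the boundary

def boundary_shift(pi1, pi2):
    """Distance the boundary moves toward the rarer class, vs equal priors."""
    return abs(np.log(pi1 / pi2)) / np.linalg.norm(w)

print(boundary_shift(0.5, 0.5))  # 0.0: boundary stays at the midpoint
print(boundary_shift(0.9, 0.1))  # ~0.55: boundary pushed toward the rare class
```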
Multi-class boundaries:
With $K > 2$ classes, each pair of classes is separated by a hyperplane, and each class's decision region is the intersection of the half-spaces where its discriminant wins.
The decision regions are convex polytopes—a key property ensuring no 'islands' of one class inside another.
The discriminant direction:
For two classes, the most informative direction is $w = \Sigma^{-1}(\mu_1 - \mu_2)$. Projecting data onto this direction compresses the classification problem to 1D.
The projection $z = w^T x$ maximizes the ratio of between-class to within-class variance among all linear projections.
This is exactly Fisher's Linear Discriminant—the supervised projection that best separates classes.
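A quick simulation illustrates the 1D compression. The class parameters below are hypothetical; the printed ratio compares the gap between projected class means to the within-class spread along $w$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: two Gaussian classes with a shared covariance
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 2.0])
X1 = rng.multivariate_normal(mu1, Sigma, size=200)
X2 = rng.multivariate_normal(mu2, Sigma, size=200)

# Fisher / LDA discriminant direction w = Sigma^{-1} (mu1 - mu2)
w = np.linalg.solve(Sigma, mu1 - mu2)

# Project each class onto w: the problem collapses to one dimension
z1, z2 = X1 @ w, X2 @ w

# Between-class gap relative to within-class spread along w
gap = abs(z1.mean() - z2.mean())
spread = np.sqrt(0.5 * (z1.var() + z2.var()))
print(round(gap / spread, 2))
```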
Boundary margin:
Define the margin as the distance from a class mean to the boundary:
$$\text{margin}_k = \frac{|w^T\mu_k + b|}{\|w\|}$$
With equal priors, both classes have the same margin (the boundary is equidistant). The total 'gap' between class means projected onto the discriminant direction is:
$$\text{separation} = \frac{|w^T(\mu_1 - \mu_2)|}{\|w\|} = \frac{(\mu_1 - \mu_2)^T\Sigma^{-1}(\mu_1 - \mu_2)}{\|\Sigma^{-1}(\mu_1 - \mu_2)\|}$$
Larger separation means better class discrimination.
QDA boundaries are more complex than LDA's hyperplanes, taking the form of quadric surfaces. Understanding these geometries helps interpret QDA behavior.
Quadric surfaces in 2D (conic sections):
In two dimensions, the QDA boundary $x^T A x + b^T x + c = 0$ is a conic section:
Ellipse (when $\det(A) > 0$, same-sign eigenvalues): a closed curve with one class's region inside.
Hyperbola (when $\det(A) < 0$, opposite-sign eigenvalues): two open branches.
Parabola (when $\det(A) = 0$ but $A \neq 0$): a single open curve, a borderline case.
Line (when $A \approx 0$): the covariances are nearly equal and the boundary degenerates to the LDA case.
The matrix $A = \frac{1}{2}(\Sigma_2^{-1} - \Sigma_1^{-1})$ captures how covariances differ. Its eigenvectors show the directions of maximum difference; its eigenvalues show the magnitude. Large eigenvalues mean strongly curved boundaries in those directions.
Quadric surfaces in higher dimensions:
In $p$ dimensions, the boundary $x^T A x + b^T x + c = 0$ is a quadric hypersurface: an ellipsoid, hyperboloid, paraboloid, or a degenerate form, depending on the signs of the eigenvalues of $A$.
Non-convexity and disconnected regions:
Unlike LDA, QDA decision regions can be non-convex, and a single class's region can even be disconnected.
This happens when one class's covariance is small in some directions—the boundary curves around the tight cluster, potentially isolating it or creating disjoint regions.
Effect of priors in QDA:
Priors enter through the constant term $c$. Changing $c$ selects a different level set of the same quadratic form: an elliptical boundary keeps its center and orientation but grows or shrinks, expanding the favored class's region.
For hyperbolic boundaries, prior changes shift the balance between the two branches.
| Shape | Data Characteristic | Classification Behavior |
|---|---|---|
| Ellipse (small inside) | One very compact class | Points inside belong to compact class |
| Ellipse (large inside) | One very diffuse class | Points inside belong to diffuse class |
| Hyperbola | Different correlation structures | Classes separated by curved front |
| Near-linear | Similar covariances | Behaves like LDA |
| Complex curved | Varied covariance differences | Captures nuanced structure |
The decision boundary is where classification is most uncertain—where posterior probabilities are exactly equal. Understanding how confidence varies near boundaries is crucial for reliability.
Posterior probabilities near the boundary:
The posterior probability for the two-class case follows a sigmoid curve as you move perpendicular to the boundary:
$$P(Y = 1 | X = x) = \frac{1}{1 + \exp(-(\delta_1(x) - \delta_2(x)))}$$
The rate of transition from 0.5 to high confidence depends on class separation. Well-separated classes have sharp transitions; overlapping classes have gradual transitions.
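The sigmoid relationship is easy to verify in one dimension. The sketch below uses hypothetical parameters; at the midpoint between the means (with equal priors) the posterior is exactly 0.5:

```python
import numpy as np

# Hypothetical 1D LDA: shared variance, equal priors
mu1, mu2, sigma2 = 0.0, 2.0, 1.0
pi1 = pi2 = 0.5

def discriminant_diff(x):
    """delta_1(x) - delta_2(x) for a shared-variance 1D model."""
    return ((mu1 - mu2) / sigma2) * x \
        - (mu1**2 - mu2**2) / (2 * sigma2) + np.log(pi1 / pi2)

def posterior_class1(x):
    """Sigmoid of the discriminant difference gives P(Y = 1 | X = x)."""
    return 1.0 / (1.0 + np.exp(-discriminant_diff(x)))

print(posterior_class1(1.0))   # 0.5 exactly at the midpoint of the means
print(posterior_class1(-1.0))  # near 1: deep inside class 1's region
```

Shrinking `sigma2` (better-separated classes) steepens the transition; growing it flattens the curve.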
A point near the boundary has nearly equal posterior probability for the two adjacent classes, so its prediction carries low confidence. But uncertainty can also arise far from boundaries—in regions of low density where the model has seen little data. LDA/QDA posterior probabilities don't account for this epistemic uncertainty; they assume the model is correct everywhere.
The confidence gradient:
Define the confidence score $s(x) = \max_k P(Y = k | X = x)$. Its gradient in feature space points away from the nearest boundary, in the direction of fastest-increasing confidence.
For LDA, isoclines of constant confidence are parallel to the decision boundaries (since discriminants are linear). For QDA, isoclines are more complex curved surfaces.
Practical implications: predictions for points near a boundary carry low confidence and are natural candidates for abstention or manual review.
Margin-based approaches:
Define the margin as the distance to the nearest decision boundary:
$$\text{margin}(x) = \min_{\text{boundaries } \mathcal{B}} d(x, \mathcal{B})$$
For LDA, this is easy to compute (perpendicular distance to hyperplane). For QDA, finding the closest point on a quadric is more involved.
Larger margins correlate with higher confidence and more reliable predictions.
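For LDA the margin is just the perpendicular point-to-hyperplane distance. A sketch with a hypothetical boundary $x_1 + x_2 - 2 = 0$:

```python
import numpy as np

def lda_margin(x, w, b):
    """Perpendicular distance from x to the hyperplane w^T x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# Hypothetical boundary: x1 + x2 - 2 = 0
w = np.array([1.0, 1.0])
b = -2.0

print(lda_margin(np.array([0.0, 0.0]), w, b))  # sqrt(2), about 1.414
print(lda_margin(np.array([1.0, 1.0]), w, b))  # 0.0: the point lies on the boundary
```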
Visualization brings abstract boundaries to life, revealing how classifiers partition feature space. Different techniques suit different purposes.
2D contour plots:
For two features, evaluate the classifier on a dense grid and color each grid point by its predicted class.
This shows the boundary as the interface between colored regions.
Decision surface plots (3D):
For two features, plot $P(Y = 1 | X = x)$ as a surface over the feature plane; the decision boundary is the level curve at height 0.5.
Projection methods for high dimensions:
When $p > 2$, directly visualizing boundaries is impossible. Common strategies include projecting onto the leading discriminant directions and plotting 2D slices with the remaining features held at representative values.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
)

def plot_decision_boundary(X, y, model, ax=None, resolution=200):
    """
    Visualize decision boundary for a 2D classification problem.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )

    # Get predictions on the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision regions
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    ax.contour(xx, yy, Z, colors='k', linewidths=2, linestyles='solid')

    # Plot data points
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm',
               edgecolors='white', s=100)
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    return ax

def plot_posterior_surface(X, y, model, ax=None):
    """
    3D visualization of posterior probability surface.
    """
    if ax is None:
        fig = plt.figure(figsize=(12, 8))
        ax = fig.add_subplot(111, projection='3d')

    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 100),
        np.linspace(y_min, y_max, 100)
    )

    # Get posterior probabilities
    proba = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])
    Z = proba[:, 1].reshape(xx.shape)  # P(Y = class 1)

    # Plot surface
    ax.plot_surface(xx, yy, Z, cmap='coolwarm', alpha=0.7)

    # Add decision boundary contour at P = 0.5
    ax.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)

    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_zlabel('P(Y=1|X)')
    return ax

def compare_lda_qda_boundaries(X, y):
    """
    Compare LDA and QDA decision boundaries side by side.
    """
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))

    # Fit models
    lda = LinearDiscriminantAnalysis()
    qda = QuadraticDiscriminantAnalysis()
    lda.fit(X, y)
    qda.fit(X, y)

    # Plot LDA
    plot_decision_boundary(X, y, lda, ax=axes[0])
    axes[0].set_title('LDA: Linear Boundary', fontsize=14)

    # Plot QDA
    plot_decision_boundary(X, y, qda, ax=axes[1])
    axes[1].set_title('QDA: Quadratic Boundary', fontsize=14)

    plt.tight_layout()
    return fig
```

Always plot training data points on top of decision regions to assess fit quality. Look for: (1) misclassified training points near the boundary (expected), (2) misclassified points far from the boundary (potential problems), (3) boundary shape matching class distributions, (4) regions with no training data (low-confidence extrapolation).
Decision boundaries reveal much about both the data and the model. Learning to 'read' boundaries provides diagnostic insight.
Signs of good fit (LDA):
Signs of LDA misspecification:
Signs of good QDA fit:
With limited data, QDA boundaries can become overly complex—fitting noise rather than signal. Signs include: a highly convoluted boundary, isolated 'islands' of one class inside another, and perfect training accuracy but poor test accuracy. If these appear, consider switching to LDA or applying regularization.
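One practical check is to compare cross-validated accuracy of LDA and QDA: when QDA's extra flexibility is fitting noise, its CV score tends to lag. A sketch on small simulated data whose classes genuinely share a covariance (all parameters are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Small sample with a genuinely shared (spherical) covariance:
# QDA's extra parameters buy nothing here and can fit noise
n = 30
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(n, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)

lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
qda_acc = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"LDA CV accuracy: {lda_acc:.3f}, QDA CV accuracy: {qda_acc:.3f}")
```

On any single draw either model may win; the diagnostic signal is QDA consistently underperforming across repeated splits.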
Feature importance from boundaries:
The boundary normal vector reveals feature importance. For LDA with weights $w = \Sigma^{-1}(\mu_1 - \mu_2)$, the magnitude of $w_j$ indicates how strongly feature $j$ moves a point across the boundary. However, interpretation requires care: the weights depend on feature scales, and the correlations absorbed through $\Sigma^{-1}$ mean a large weight does not always correspond to a feature that is informative on its own.
Boundary sensitivity analysis:
How robust is the boundary to perturbations?
Methods like influence functions can quantify how each training point affects the boundary position.
With $K > 2$ classes, decision boundaries form a network of surfaces partitioning feature space into $K$ regions. Understanding this network structure adds insight.
Pairwise boundaries:
There are $\binom{K}{2} = \frac{K(K-1)}{2}$ pairwise boundaries between classes. Each boundary $\mathcal{B}_{kl} = \{x : \delta_k(x) = \delta_l(x)\}$ separates classes $k$ and $l$.
The overall class $k$ region is:
$$\mathcal{R}_k = \{x : \delta_k(x) > \delta_l(x) \text{ for all } l \neq k\}$$
This is the intersection of $K-1$ half-spaces (LDA) or $K-1$ quadric constraints (QDA).
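The argmax structure can be sketched directly: with hypothetical linear discriminants for three classes, membership in region $k$ means class $k$'s score beats the other $K - 1$ scores:

```python
import numpy as np

# Hypothetical linear discriminants for K = 3 classes: row k holds w_k
W = np.array([[ 1.0,  0.0],   # class 0
              [ 0.0,  1.0],   # class 1
              [-1.0, -1.0]])  # class 2
b = np.zeros(3)

def region(x):
    """Region membership: the class whose discriminant beats the other K-1."""
    return int(np.argmax(W @ x + b))

print(region(np.array([3.0, 1.0])))    # 0
print(region(np.array([1.0, 3.0])))    # 1
print(region(np.array([-2.0, -2.0])))  # 2
```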
Vertices, edges, and faces:
For LDA in $p$ dimensions, the faces of the regions are $(p-1)$-dimensional pieces of the pairwise hyperplanes, edges are $(p-2)$-dimensional intersections of two boundaries, and vertices arise where three or more boundaries meet.
Vertices are points where multiple classes have equal posterior—typically where 3+ class regions meet.
For LDA with equal priors and classes in 'general position,' there exists a point equidistant (in Mahalanobis terms) from all class means—the centroid of the constellation in whitened space. All boundary hyperplanes pass through regions near this centroid.
Adjacency structure:
Not all class pairs share a 'visible' boundary. If class $m$ is 'between' classes $k$ and $l$, the $k$-$l$ boundary might not be reachable—you'd pass through class $m$ first.
The adjacency graph has one node per class and an edge between two classes whenever their regions share a boundary face.
This graph depends on the class configuration and may not be complete.
Probability transition along paths:
Moving from the center of class $k$'s region toward class $l$:
$P(Y = k | X)$ decreases while $P(Y = l | X)$ increases.
At the boundary: $P(Y = k | X) = P(Y = l | X)$.
If another class $m$ intervenes, you cross the $k$-$m$ boundary first, then the $m$-$l$ boundary.
Implications for classification: ambiguity is highest near vertices where three or more regions meet, since several posteriors are nearly equal there.
What's next:
We've now deeply explored LDA with shared covariance, QDA with class-specific covariances, and the geometry of their decision boundaries. The final topic in this module addresses a critical practical issue: Regularized Discriminant Analysis—methods that interpolate between LDA and QDA to handle high-dimensional settings where both pure methods struggle.
You now have a comprehensive understanding of decision boundaries in LDA and QDA: their mathematical forms, geometric properties, relationship to uncertainty, visualization techniques, and diagnostic interpretation. Next, we'll explore regularized discriminant analysis for robust classification in challenging settings.