At its core, classification partitions feature space into distinct regions—one for each class. The boundaries separating these regions are decision boundaries, and understanding their geometry is essential for interpreting classifier behavior, diagnosing problems, and building intuition about model assumptions.
In LDA and QDA, decision boundaries have precise mathematical forms: hyperplanes for LDA, quadric surfaces for QDA. These aren't arbitrary shapes but directly reflect the underlying Gaussian assumptions. A classifier's boundary reveals what it 'believes' about the data—where it's confident, where it's uncertain, and where assumptions may be violated.
This page develops a deep understanding of decision boundaries: their mathematical characterization, geometric properties, how to visualize and interpret them, and what they tell us about model behavior at different points in feature space.
By the end of this page, you will understand: the mathematical equations defining LDA and QDA boundaries, geometric properties of linear and quadratic boundaries, how priors shift boundaries, uncertainty near boundaries, visualization techniques, and how to interpret boundary shapes in relation to model assumptions.
A decision boundary is the set of points where two or more classes have equal posterior probability. For a two-class problem, the boundary is:
$$\mathcal{B} = \{x : P(Y = 1 | X = x) = P(Y = 2 | X = x)\}$$
Equivalently, using discriminant functions:
$$\mathcal{B} = \{x : \delta_1(x) = \delta_2(x)\}$$
LDA boundary equation:
For LDA with shared covariance $\Sigma$:
$$\delta_1(x) - \delta_2(x) = (\mu_1 - \mu_2)^T\Sigma^{-1}x - \frac{1}{2}(\mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2) + \log\frac{\pi_1}{\pi_2} = 0$$
This can be written as:
$$w^T x + b = 0$$
where $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $b = -\frac{1}{2}(\mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2) + \log\frac{\pi_1}{\pi_2}$.
This is a hyperplane in $\mathbb{R}^p$.
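The hyperplane parameters can be assembled directly from the class statistics. A minimal sketch with hypothetical means, covariance, and priors (illustrative values, not from any dataset in this page):

```python
import numpy as np

# Hypothetical class parameters (illustrative only)
mu1 = np.array([1.0, 2.0])
mu2 = np.array([3.0, 0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])
pi1 = pi2 = 0.5

Sigma_inv = np.linalg.inv(Sigma)

# Hyperplane parameters for w^T x + b = 0
w = Sigma_inv @ (mu1 - mu2)
b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu2 @ Sigma_inv @ mu2) + np.log(pi1 / pi2)

def lda_side(x):
    """Positive: class 1 side; negative: class 2 side; zero: on the boundary."""
    return w @ x + b

# With equal priors, the midpoint of the means lies on the boundary
midpoint = 0.5 * (mu1 + mu2)
print(abs(lda_side(midpoint)) < 1e-9)  # True
```

The midpoint check works because the cross terms in $w^T \frac{\mu_1 + \mu_2}{2}$ cancel exactly against $b$ when $\pi_1 = \pi_2$.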
The normal vector $w = \Sigma^{-1}(\mu_1 - \mu_2)$ is not simply the direction between class means. The covariance inverse 'rotates and stretches' this direction based on feature correlations: directions of high variance are down-weighted, and correlations tilt the boundary away from the naive perpendicular to the mean difference.
QDA boundary equation:
For QDA with class-specific covariances $\Sigma_1, \Sigma_2$:
$$\delta_1(x) - \delta_2(x) = 0$$
Expanding:
$$-\frac{1}{2}x^T(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1^T\Sigma_1^{-1} - \mu_2^T\Sigma_2^{-1})x + c = 0$$
where $c$ absorbs constant terms (means, determinants, priors).
This is a quadratic equation in $x$:
$$x^T A x + b^T x + c = 0$$
where $A = -\frac{1}{2}(\Sigma_1^{-1} - \Sigma_2^{-1})$ and $b = \Sigma_1^{-1}\mu_1 - \Sigma_2^{-1}\mu_2$.
This defines a quadric surface (conic section in 2D, quadric in higher dimensions).
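These coefficients can be computed directly from the class parameters. A sketch with hypothetical means and covariances; `qda_side` evaluates $\delta_1(x) - \delta_2(x)$, so its sign gives the predicted class:

```python
import numpy as np

# Hypothetical class parameters
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma1 = np.eye(2)
Sigma2 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
pi1 = pi2 = 0.5

S1inv, S2inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)

# Coefficients of the boundary x^T A x + b^T x + c = 0
A = -0.5 * (S1inv - S2inv)
bvec = S1inv @ mu1 - S2inv @ mu2
c = (-0.5 * (mu1 @ S1inv @ mu1 - mu2 @ S2inv @ mu2)
     - 0.5 * np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma2))
     + np.log(pi1 / pi2))

def qda_side(x):
    """delta_1(x) - delta_2(x): positive means class 1, negative class 2."""
    return x @ A @ x + bvec @ x + c

print(qda_side(mu1) > 0)  # True: class 1's own mean is on its side
print(qda_side(mu2) < 0)  # True: likewise for class 2
```

The constant $c$ collects the mean, log-determinant, and prior terms absorbed in the text above.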
| Eigenvalues of A | Boundary Type | Geometric Shape |
|---|---|---|
| Both positive or both negative | Ellipse | Closed curve, one class inside |
| One positive, one negative | Hyperbola | Two branches, open curve |
| One zero, one nonzero | Parabola | Single open curve |
| Both zero (A = 0) | Line (degenerate) | Reduces to LDA case |
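The table above can be checked numerically by inspecting the eigenvalues of $A$. A small sketch (the helper name `conic_type` and the tolerance `tol` are arbitrary choices):

```python
import numpy as np

def conic_type(A, tol=1e-10):
    """Classify the 2D QDA boundary type from the eigenvalues of A."""
    ev = np.linalg.eigvalsh(A)  # A is symmetric
    nonzero = ev[np.abs(ev) > tol]
    if len(nonzero) == 0:
        return "line"      # A ~ 0: degenerate, LDA-like boundary
    if len(nonzero) == 1:
        return "parabola"  # one zero, one nonzero eigenvalue
    return "ellipse" if nonzero[0] * nonzero[1] > 0 else "hyperbola"

print(conic_type(np.array([[1.0, 0.0], [0.0, 2.0]])))   # ellipse
print(conic_type(np.array([[1.0, 0.0], [0.0, -1.0]])))  # hyperbola
print(conic_type(np.diag([1.0, 0.0])))                  # parabola
```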
LDA's linear boundaries have elegant geometric interpretations that provide deep insight into the classifier's behavior.
The boundary as a perpendicular bisector:
With equal priors ($\pi_1 = \pi_2$), the LDA boundary is the perpendicular bisector of the line segment connecting $\mu_1$ and $\mu_2$—but in Mahalanobis space, not Euclidean space.
In Mahalanobis space (after whitening by $\Sigma^{-1/2}$), the whitened means are separated symmetrically and the boundary is exactly the perpendicular bisector of the segment joining them. In original space, this manifests as a hyperplane that may not appear perpendicular to $\mu_1 - \mu_2$ if features are correlated.
Distance interpretation:
Classification is based on Mahalanobis distance to class means. For a point $x$:
$$d_M(x, \mu_k) = \sqrt{(x - \mu_k)^T\Sigma^{-1}(x - \mu_k)}$$
With equal priors, classification reduces to: assign to the class with smaller Mahalanobis distance. The boundary is the locus of points equidistant (in Mahalanobis terms) from both means.
Mahalanobis distance accounts for feature correlations and different variances. In Euclidean distance, an outlier in a high-variance direction seems far from both means. In Mahalanobis distance, such a point is appropriately recognized as still within the 'typical' range and classified based on mean proximity.
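To make this concrete, the sketch below (hypothetical shared covariance with high variance along the first feature) shows a point that is Euclidean-far from both means yet is still confidently assigned by Mahalanobis distance:

```python
import numpy as np

def mahalanobis(x, mu, Sigma_inv):
    """Mahalanobis distance from x to mean mu."""
    d = x - mu
    return np.sqrt(d @ Sigma_inv @ d)

# Hypothetical shared covariance: high variance along the first feature
Sigma_inv = np.linalg.inv(np.array([[4.0, 0.0],
                                    [0.0, 1.0]]))
mu1 = np.array([0.0, 0.0])
mu2 = np.array([0.0, 3.0])

# Euclidean-far along the high-variance axis, but still 'typical' for class 1
x = np.array([5.0, 0.2])
d1 = mahalanobis(x, mu1, Sigma_inv)
d2 = mahalanobis(x, mu2, Sigma_inv)
print(d1 < d2)  # True: assigned to class 1 despite the large Euclidean distance
```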
Effect of priors:
When priors are unequal, the term $\log\frac{\pi_1}{\pi_2}$ enters the discriminant difference and translates the hyperplane along its normal by a distance $|\log(\pi_1/\pi_2)| / \|w\|$.
Geometrically, the boundary no longer passes through the midpoint; it moves closer to the less common class. This implements Bayesian reasoning: without evidence from features, bet on the common class.
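The size of this shift is easy to compute: changing the priors adds $\log(\pi_1/\pi_2)$ to the discriminant difference, moving the hyperplane $|\log(\pi_1/\pi_2)|/\|w\|$ along its normal. A sketch with hypothetical parameters (identity covariance, means 4 units apart):

```python
import numpy as np

# Hypothetical LDA setup: identity covariance, means 4 units apart
mu1 = np.array([0.0, 0.0])
mu2 = np.array([4.0, 0.0])
w = np.linalg.solve(np.eye(2), mu1 - mu2)  # normal vector of the boundary

def boundary_shift(pi1, pi2):
    """Distance the boundary moves toward the rarer class, vs equal priors."""
    return abs(np.log(pi1 / pi2)) / np.linalg.norm(w)

print(boundary_shift(0.5, 0.5))  # 0.0: boundary stays at the midpoint
print(boundary_shift(0.9, 0.1))  # ~0.55: boundary pushed toward the rare class
```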
Multi-class boundaries:
With $K > 2$ classes, each pair of classes is separated by a hyperplane, and each class's decision region is the intersection of the half-spaces where its discriminant wins.
The decision regions are convex polytopes—a key property ensuring no 'islands' of one class inside another.
The discriminant direction:
For two classes, the most informative direction is $w = \Sigma^{-1}(\mu_1 - \mu_2)$. Projecting data onto this direction compresses the classification problem to 1D.
The projection $z = w^T x$ maximizes the ratio of between-class to within-class variance among all linear projections.
This is exactly Fisher's Linear Discriminant—the supervised projection that best separates classes.
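A quick simulation illustrates the 1D compression. The class parameters below are hypothetical; the printed ratio compares the gap between projected class means to the within-class spread along $w$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: two Gaussian classes with a shared covariance
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 2.0])
X1 = rng.multivariate_normal(mu1, Sigma, size=200)
X2 = rng.multivariate_normal(mu2, Sigma, size=200)

# Fisher / LDA discriminant direction w = Sigma^{-1} (mu1 - mu2)
w = np.linalg.solve(Sigma, mu1 - mu2)

# Project each class onto w: the problem collapses to one dimension
z1, z2 = X1 @ w, X2 @ w

# Between-class gap relative to within-class spread along w
gap = abs(z1.mean() - z2.mean())
spread = np.sqrt(0.5 * (z1.var() + z2.var()))
print(round(gap / spread, 2))
```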
Boundary margin:
Define the margin as the distance from a class mean to the boundary:
$$\text{margin}_k = \frac{|w^T\mu_k + b|}{\|w\|}$$
With equal priors, both classes have the same margin (the boundary is equidistant). The total 'gap' between class means projected onto the discriminant direction is:
$$\text{separation} = \frac{|w^T(\mu_1 - \mu_2)|}{\|w\|} = \frac{(\mu_1 - \mu_2)^T\Sigma^{-1}(\mu_1 - \mu_2)}{\|\Sigma^{-1}(\mu_1 - \mu_2)\|}$$
Larger separation means better class discrimination.
QDA boundaries are more complex than LDA's hyperplanes, taking the form of quadric surfaces. Understanding these geometries helps interpret QDA behavior.
Quadric surfaces in 2D (conic sections):
In two dimensions, the QDA boundary $x^T A x + b^T x + c = 0$ is a conic section:
Ellipse (when $\det(A) > 0$, same-sign eigenvalues): a closed curve with one class's region inside.
Hyperbola (when $\det(A) < 0$, opposite-sign eigenvalues): two open branches.
Parabola (when $\det(A) = 0$ but $A \neq 0$): a single open curve, a borderline case.
Line (when $A \approx 0$): the covariances are nearly equal and the boundary degenerates to the LDA case.
The matrix $A = \frac{1}{2}(\Sigma_2^{-1} - \Sigma_1^{-1})$ captures how covariances differ. Its eigenvectors show the directions of maximum difference; its eigenvalues show the magnitude. Large eigenvalues mean strongly curved boundaries in those directions.
Quadric surfaces in higher dimensions:
In $p$ dimensions, the boundary $x^T A x + b^T x + c = 0$ is a quadric hypersurface: an ellipsoid, hyperboloid, paraboloid, or a degenerate form, depending on the signs of the eigenvalues of $A$.
Non-convexity and disconnected regions:
Unlike LDA, QDA decision regions can be non-convex, and a single class's region can even be disconnected.
This happens when one class's covariance is small in some directions—the boundary curves around the tight cluster, potentially isolating it or creating disjoint regions.
Effect of priors in QDA:
Priors enter through the constant term $c$. Changing $c$ selects a different level set of the same quadratic form: an elliptical boundary keeps its center and orientation but grows or shrinks, expanding the favored class's region.
For hyperbolic boundaries, prior changes shift the balance between the two branches.
| Shape | Data Characteristic | Classification Behavior |
|---|---|---|
| Ellipse (small inside) | One very compact class | Points inside belong to compact class |
| Ellipse (large inside) | One very diffuse class | Points inside belong to diffuse class |
| Hyperbola | Different correlation structures | Classes separated by curved front |
| Near-linear | Similar covariances | Behaves like LDA |
| Complex curved | Varied covariance differences | Captures nuanced structure |
The decision boundary is where classification is most uncertain—where posterior probabilities are exactly equal. Understanding how confidence varies near boundaries is crucial for reliability.
Posterior probabilities near the boundary:
The posterior probability for the two-class case follows a sigmoid curve as you move perpendicular to the boundary:
$$P(Y = 1 | X = x) = \frac{1}{1 + \exp(-(\delta_1(x) - \delta_2(x)))}$$
The rate of transition from 0.5 to high confidence depends on class separation. Well-separated classes have sharp transitions; overlapping classes have gradual transitions.
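The sigmoid relationship is easy to verify in one dimension. The sketch below uses hypothetical parameters; at the midpoint between the means (with equal priors) the posterior is exactly 0.5:

```python
import numpy as np

# Hypothetical 1D LDA: shared variance, equal priors
mu1, mu2, sigma2 = 0.0, 2.0, 1.0
pi1 = pi2 = 0.5

def discriminant_diff(x):
    """delta_1(x) - delta_2(x) for a shared-variance 1D model."""
    return ((mu1 - mu2) / sigma2) * x \
        - (mu1**2 - mu2**2) / (2 * sigma2) + np.log(pi1 / pi2)

def posterior_class1(x):
    """Sigmoid of the discriminant difference gives P(Y = 1 | X = x)."""
    return 1.0 / (1.0 + np.exp(-discriminant_diff(x)))

print(posterior_class1(1.0))   # 0.5 exactly at the midpoint of the means
print(posterior_class1(-1.0))  # near 1: deep inside class 1's region
```

Shrinking `sigma2` (better-separated classes) steepens the transition; growing it flattens the curve.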
A point near the boundary has nearly equal posterior probability for the two adjacent classes, so its prediction carries low confidence. But uncertainty can also arise far from boundaries—in regions of low density where the model has seen little data. LDA/QDA posterior probabilities don't account for this epistemic uncertainty; they assume the model is correct everywhere.
The confidence gradient:
Define the confidence score $s(x) = \max_k P(Y = k | X = x)$. Its gradient in feature space points away from the nearest boundary, in the direction of fastest-increasing confidence.
For LDA, isoclines of constant confidence are parallel to the decision boundaries (since discriminants are linear). For QDA, isoclines are more complex curved surfaces.
Practical implications: predictions for points near a boundary carry low confidence and are natural candidates for abstention or manual review.
Margin-based approaches:
Define the margin as the distance to the nearest decision boundary:
$$\text{margin}(x) = \min_{\text{boundaries } \mathcal{B}} d(x, \mathcal{B})$$
For LDA, this is easy to compute (perpendicular distance to hyperplane). For QDA, finding the closest point on a quadric is more involved.
Larger margins correlate with higher confidence and more reliable predictions.
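For LDA the margin is just the perpendicular point-to-hyperplane distance. A sketch with a hypothetical boundary $x_1 + x_2 - 2 = 0$:

```python
import numpy as np

def lda_margin(x, w, b):
    """Perpendicular distance from x to the hyperplane w^T x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# Hypothetical boundary: x1 + x2 - 2 = 0
w = np.array([1.0, 1.0])
b = -2.0

print(lda_margin(np.array([0.0, 0.0]), w, b))  # sqrt(2), about 1.414
print(lda_margin(np.array([1.0, 1.0]), w, b))  # 0.0: the point lies on the boundary
```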
Visualization brings abstract boundaries to life, revealing how classifiers partition feature space. Different techniques suit different purposes.
2D contour plots:
For two features, evaluate the classifier on a dense grid and color each grid point by its predicted class.
This shows the boundary as the interface between colored regions.
Decision surface plots (3D):
For two features, plot $P(Y = 1 | X = x)$ as a surface over the feature plane; the decision boundary is the level curve at height 0.5.
Projection methods for high dimensions:
When $p > 2$, directly visualizing boundaries is impossible. Common strategies include projecting onto the leading discriminant directions and plotting 2D slices with the remaining features held at representative values.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
)

def plot_decision_boundary(X, y, model, ax=None, resolution=200):
    """
    Visualize decision boundary for a 2D classification problem.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )

    # Get predictions on the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision regions
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
    ax.contour(xx, yy, Z, colors='k', linewidths=2, linestyles='solid')

    # Plot data points
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm',
               edgecolors='white', s=100)
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    return ax

def plot_posterior_surface(X, y, model, ax=None):
    """
    3D visualization of posterior probability surface.
    """
    if ax is None:
        fig = plt.figure(figsize=(12, 8))
        ax = fig.add_subplot(111, projection='3d')

    # Create mesh grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 100),
        np.linspace(y_min, y_max, 100)
    )

    # Get posterior probabilities
    proba = model.predict_proba(np.c_[xx.ravel(), yy.ravel()])
    Z = proba[:, 1].reshape(xx.shape)  # P(Y = class 1)

    # Plot surface
    ax.plot_surface(xx, yy, Z, cmap='coolwarm', alpha=0.7)

    # Add decision boundary contour at P = 0.5
    ax.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)

    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_zlabel('P(Y=1|X)')
    return ax

def compare_lda_qda_boundaries(X, y):
    """
    Compare LDA and QDA decision boundaries side by side.
    """
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))

    # Fit models
    lda = LinearDiscriminantAnalysis()
    qda = QuadraticDiscriminantAnalysis()
    lda.fit(X, y)
    qda.fit(X, y)

    # Plot LDA
    plot_decision_boundary(X, y, lda, ax=axes[0])
    axes[0].set_title('LDA: Linear Boundary', fontsize=14)

    # Plot QDA
    plot_decision_boundary(X, y, qda, ax=axes[1])
    axes[1].set_title('QDA: Quadratic Boundary', fontsize=14)

    plt.tight_layout()
    return fig
```

Always plot training data points on top of decision regions to assess fit quality. Look for: (1) misclassified training points near the boundary (expected), (2) misclassified points far from the boundary (potential problems), (3) boundary shape matching class distributions, (4) regions with no training data (low-confidence extrapolation).
Decision boundaries reveal much about both the data and the model. Learning to 'read' boundaries provides diagnostic insight.
Signs of good fit (LDA):
Signs of LDA misspecification:
Signs of good QDA fit:
With limited data, QDA boundaries can become overly complex—fitting noise rather than signal. Signs include: a highly convoluted boundary, isolated 'islands' of one class inside another, and perfect training accuracy but poor test accuracy. If these appear, consider switching to LDA or applying regularization.
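One practical check is to compare cross-validated accuracy of LDA and QDA: when QDA's extra flexibility is fitting noise, its CV score tends to lag. A sketch on small simulated data whose classes genuinely share a covariance (all parameters are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Small sample with a genuinely shared (spherical) covariance:
# QDA's extra parameters buy nothing here and can fit noise
n = 30
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(n, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(n, 2))])
y = np.array([0] * n + [1] * n)

lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
qda_acc = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"LDA CV accuracy: {lda_acc:.3f}, QDA CV accuracy: {qda_acc:.3f}")
```

On any single draw either model may win; the diagnostic signal is QDA consistently underperforming across repeated splits.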
Feature importance from boundaries:
The boundary normal vector reveals feature importance. For LDA with weights $w = \Sigma^{-1}(\mu_1 - \mu_2)$, the magnitude of $w_j$ indicates how strongly feature $j$ moves a point across the boundary. However, interpretation requires care: the weights depend on feature scales, and the correlations absorbed through $\Sigma^{-1}$ mean a large weight does not always correspond to a feature that is informative on its own.
Boundary sensitivity analysis:
How robust is the boundary to perturbations?
Methods like influence functions can quantify how each training point affects the boundary position.
With $K > 2$ classes, decision boundaries form a network of surfaces partitioning feature space into $K$ regions. Understanding this network structure adds insight.
Pairwise boundaries:
There are $\binom{K}{2} = \frac{K(K-1)}{2}$ pairwise boundaries between classes. Each boundary $\mathcal{B}_{kl} = \{x : \delta_k(x) = \delta_l(x)\}$ separates classes $k$ and $l$.
The overall class $k$ region is:
$$\mathcal{R}_k = \{x : \delta_k(x) > \delta_l(x) \text{ for all } l \neq k\}$$
This is the intersection of $K-1$ half-spaces (LDA) or $K-1$ quadric constraints (QDA).
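The argmax structure can be sketched directly: with hypothetical linear discriminants for three classes, membership in region $k$ means class $k$'s score beats the other $K - 1$ scores:

```python
import numpy as np

# Hypothetical linear discriminants for K = 3 classes: row k holds w_k
W = np.array([[ 1.0,  0.0],   # class 0
              [ 0.0,  1.0],   # class 1
              [-1.0, -1.0]])  # class 2
b = np.zeros(3)

def region(x):
    """Region membership: the class whose discriminant beats the other K-1."""
    return int(np.argmax(W @ x + b))

print(region(np.array([3.0, 1.0])))    # 0
print(region(np.array([1.0, 3.0])))    # 1
print(region(np.array([-2.0, -2.0])))  # 2
```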
Vertices, edges, and faces:
For LDA in $p$ dimensions, the faces of the regions are $(p-1)$-dimensional pieces of the pairwise hyperplanes, edges are $(p-2)$-dimensional intersections of two boundaries, and vertices arise where three or more boundaries meet.
Vertices are points where multiple classes have equal posterior—typically where 3+ class regions meet.
For LDA with equal priors and classes in 'general position,' there exists a point equidistant (in Mahalanobis terms) from all class means—the centroid of the constellation in whitened space. All boundary hyperplanes pass through regions near this centroid.
Adjacency structure:
Not all class pairs share a 'visible' boundary. If class $m$ is 'between' classes $k$ and $l$, the $k$-$l$ boundary might not be reachable—you'd pass through class $m$ first.
The adjacency graph has one node per class and an edge between two classes whenever their regions share a boundary face.
This graph depends on the class configuration and may not be complete.
Probability transition along paths:
Moving from the center of class $k$'s region toward class $l$:
$P(Y = k | X)$ decreases while $P(Y = l | X)$ increases.
At the boundary: $P(Y = k | X) = P(Y = l | X)$.
If another class $m$ intervenes, you cross the $k$-$m$ boundary first, then the $m$-$l$ boundary.
Implications for classification: ambiguity is highest near vertices where three or more regions meet, since several posteriors are nearly equal there.
What's next:
We've now deeply explored LDA with shared covariance, QDA with class-specific covariances, and the geometry of their decision boundaries. The final topic in this module addresses a critical practical issue: Regularized Discriminant Analysis—methods that interpolate between LDA and QDA to handle high-dimensional settings where both pure methods struggle.
You now have a comprehensive understanding of decision boundaries in LDA and QDA: their mathematical forms, geometric properties, relationship to uncertainty, visualization techniques, and diagnostic interpretation. Next, we'll explore regularized discriminant analysis for robust classification in challenging settings.