Every classifier implicitly or explicitly divides the feature space into regions—territories where different classes reign. The boundaries between these territories are decision boundaries, the surfaces where the classifier's confidence shifts from one class to another.
In Gaussian Naive Bayes, these boundaries are not arbitrary. They emerge mathematically from the Gaussian assumptions: the means, variances, and priors we estimated from training data determine exactly where one class's region ends and another's begins.
Understanding decision boundaries provides deep insight into classifier behavior: it reveals exactly what the model has learned about where each class lives in feature space.
This page derives the decision boundary equations for Gaussian Naive Bayes, reveals when boundaries are linear versus quadratic, and develops intuition through visualization.
By the end of this page, you will understand: (1) the mathematical definition of decision boundaries, (2) derivation of the boundary equation for Gaussian NB, (3) conditions for linear versus quadratic boundaries, (4) the role of equal variances in producing linear boundaries, (5) visualization and interpretation of boundaries in 2D, and (6) how class priors shift boundaries.
Before deriving the specific form for Gaussian Naive Bayes, let us establish the general concepts.
A classifier assigns each point $\mathbf{x}$ in feature space to one of $K$ classes. This partitions the space into decision regions:
$$\mathcal{R}_k = \{\mathbf{x} : \hat{y}(\mathbf{x}) = k\}$$
The decision region for class $k$ is the set of all points that get classified as class $k$.
The decision boundary is the set of points where the classifier is indifferent between two or more classes—where the posterior probabilities (or scores) are equal:
$$\mathcal{B}_{jk} = \{\mathbf{x} : P(y = j | \mathbf{x}) = P(y = k | \mathbf{x})\}$$
This defines the boundary between classes $j$ and $k$.
In binary classification ($K = 2$), there's a single boundary separating classes 0 and 1:
$$\mathcal{B} = \{\mathbf{x} : P(y = 1 | \mathbf{x}) = P(y = 0 | \mathbf{x}) = 0.5\}$$
Points where $P(y = 1 | \mathbf{x}) > 0.5$ belong to class 1's region; otherwise class 0.
With $K > 2$ classes, boundaries become more complex: each pair of classes contributes its own boundary surface, and a class's decision region is bounded by pieces of several such surfaces.
In binary classification, the boundary is often characterized by the log-odds (logit): $\log\frac{P(y=1|\mathbf{x})}{P(y=0|\mathbf{x})} = 0$. Points with positive log-odds go to class 1; negative to class 0. This formulation connects Gaussian NB to logistic regression.
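To make the log-odds rule concrete, here is a minimal sketch with illustrative parameters (not taken from the text: equal priors, means 0 and 2, shared unit variance) that computes the log-odds and classifies a few points:

```python
import numpy as np
from scipy import stats

# Illustrative 1D parameters: two classes, equal priors, shared variance.
mu_0, mu_1, sigma, pi_0, pi_1 = 0.0, 2.0, 1.0, 0.5, 0.5

def log_odds(x):
    """log P(y=1|x) - log P(y=0|x), computed via Bayes' theorem."""
    log_post_1 = np.log(pi_1) + stats.norm(mu_1, sigma).logpdf(x)
    log_post_0 = np.log(pi_0) + stats.norm(mu_0, sigma).logpdf(x)
    return log_post_1 - log_post_0

# The midpoint x = 1 gives log-odds exactly 0: the decision boundary.
for x in [0.0, 1.0, 2.0]:
    side = 1 if log_odds(x) > 0 else 0
    print(f"x = {x}: log-odds = {log_odds(x):+.3f} → class {side}")
```

With equal variances the log-odds here is linear in $x$ (it equals $2x - 2$), which previews the linearity result derived below.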
Let us derive the decision boundary equation for Gaussian Naive Bayes with two classes. The analysis extends naturally to multiple classes.
Consider classes 0 and 1 with priors $\pi_0$ and $\pi_1$, per-feature means $\mu_{j0}$ and $\mu_{j1}$, and per-feature variances $\sigma^2_{j0}$ and $\sigma^2_{j1}$.
The boundary is where: $$\log P(y = 1 | \mathbf{x}) = \log P(y = 0 | \mathbf{x})$$
Using Bayes' theorem: $$\log \pi_1 + \log f(\mathbf{x} | y = 1) = \log \pi_0 + \log f(\mathbf{x} | y = 0)$$
Substituting the Gaussian Naive Bayes factorization: $$\log \pi_1 + \sum_{j=1}^{d} \log f(x_j | y = 1) = \log \pi_0 + \sum_{j=1}^{d} \log f(x_j | y = 0)$$
Recall: $$\log f(x_j | y = k) = -\frac{(x_j - \mu_{jk})^2}{2\sigma^2_{jk}} - \frac{1}{2}\log(2\pi\sigma^2_{jk})$$
Substituting and rearranging: $$\log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \left[ -\frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}} + \frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{1}{2}\log\frac{\sigma^2_{j1}}{\sigma^2_{j0}} \right] = 0$$
Let's define: $$g(\mathbf{x}) = \log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \left[ \frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}} - \frac{1}{2}\log\frac{\sigma^2_{j1}}{\sigma^2_{j0}} \right]$$
Then the boundary is the set $\{\mathbf{x} : g(\mathbf{x}) = 0\}$: the classifier predicts class 1 where $g(\mathbf{x}) > 0$ and class 0 where $g(\mathbf{x}) < 0$.
Notice that $g(\mathbf{x})$ contains terms like $(x_j - \mu_{jk})^2$, which are quadratic in $x_j$. This means the boundary equation is generally quadratic in the features. However, under special conditions, the quadratic terms cancel, leaving a linear boundary.
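As a sanity check on this derivation, the following sketch evaluates $g$ for a single feature with made-up parameters and compares it against the log posterior ratio computed directly from Gaussian densities; the two quantities coincide for every $x$:

```python
import numpy as np
from scipy import stats

# Illustrative 1D parameters (one feature, unequal variances)
mu0, v0, pi0 = 0.0, 0.25, 0.6
mu1, v1, pi1 = 2.0, 4.0, 0.4

def g(x):
    """g(x) from the derivation: positive → class 1, negative → class 0."""
    return (np.log(pi1 / pi0)
            + (x - mu0)**2 / (2 * v0)
            - (x - mu1)**2 / (2 * v1)
            - 0.5 * np.log(v1 / v0))

def log_posterior_ratio(x):
    """log P(y=1|x) - log P(y=0|x) via Bayes' theorem."""
    return (np.log(pi1) + stats.norm(mu1, np.sqrt(v1)).logpdf(x)
            - np.log(pi0) - stats.norm(mu0, np.sqrt(v0)).logpdf(x))

for x in [-1.0, 0.5, 3.0]:
    print(f"x = {x:+.1f}: g(x) = {g(x):+.4f}, log ratio = {log_posterior_ratio(x):+.4f}")
```

The $\frac{1}{2}\log(2\pi\sigma^2)$ normalization terms do not cancel when variances differ; they survive as the $\frac{1}{2}\log\frac{\sigma^2_{1}}{\sigma^2_{0}}$ term in $g$, which is why both functions match exactly.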
The shape of the Gaussian NB decision boundary depends critically on whether variances differ across classes. This distinction has profound implications.
Expanding the boundary equation for feature $j$: $$\frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}}$$
$$= \frac{x_j^2 - 2x_j\mu_{j0} + \mu_{j0}^2}{2\sigma^2_{j0}} - \frac{x_j^2 - 2x_j\mu_{j1} + \mu_{j1}^2}{2\sigma^2_{j1}}$$
$$= x_j^2 \left( \frac{1}{2\sigma^2_{j0}} - \frac{1}{2\sigma^2_{j1}} \right) + x_j \left( \frac{\mu_{j1}}{\sigma^2_{j1}} - \frac{\mu_{j0}}{\sigma^2_{j0}} \right) + \text{constant}$$
When $\sigma^2_{j0} \neq \sigma^2_{j1}$, the coefficient of $x_j^2$ is nonzero, so the boundary is quadratic in $x_j$.
When $\sigma^2_{j0} = \sigma^2_{j1} = \sigma^2_j$ for all features, the $x_j^2$ terms cancel and only linear terms remain.
Under equal variances: $$g(\mathbf{x}) = \log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \frac{(\mu_{j1} - \mu_{j0})x_j - \frac{1}{2}(\mu_{j1}^2 - \mu_{j0}^2)}{\sigma^2_j}$$
$$= \mathbf{w}^T\mathbf{x} + b$$
Where: $$w_j = \frac{\mu_{j1} - \mu_{j0}}{\sigma^2_j}, \quad b = \log\frac{\pi_1}{\pi_0} - \frac{1}{2}\sum_{j=1}^{d} \frac{\mu_{j1}^2 - \mu_{j0}^2}{\sigma^2_j}$$
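To confirm the linear form, this sketch (with illustrative two-feature parameters) computes $\mathbf{w}$ and $b$ from the formulas above and checks that $\mathbf{w}^T\mathbf{x} + b$ reproduces the log posterior ratio at an arbitrary point:

```python
import numpy as np
from scipy import stats

# Illustrative parameters: 2 features, per-feature variances shared by both classes.
mu0 = np.array([0.0, 1.0])
mu1 = np.array([2.0, -1.0])
var = np.array([1.5, 0.8])        # σ²_j, same for class 0 and class 1
pi0, pi1 = 0.7, 0.3

# Weights and bias from the equal-variance derivation
w = (mu1 - mu0) / var
b = np.log(pi1 / pi0) - 0.5 * np.sum((mu1**2 - mu0**2) / var)

# Compare against the direct log posterior ratio at a random point
rng = np.random.default_rng(0)
x = rng.normal(size=2)
log_ratio = (np.log(pi1) + stats.norm(mu1, np.sqrt(var)).logpdf(x).sum()
             - np.log(pi0) - stats.norm(mu0, np.sqrt(var)).logpdf(x).sum())

print(f"w·x + b   = {w @ x + b:.6f}")
print(f"log ratio = {log_ratio:.6f}")
```

The two values agree to floating-point precision because, with shared variances, the Gaussian normalization constants cancel in the ratio and only the linear terms survive.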
| Condition | Boundary Shape | Equation Form | Complexity |
|---|---|---|---|
| $\sigma^2_{jk}$ varies by class | Quadratic | $x^T A x + b^T x + c = 0$ | Curved surfaces, ellipsoids |
| $\sigma^2_{jk}$ same for all classes | Linear | $w^T x + b = 0$ | Hyperplanes |
| Equal variance + equal priors | Linear (bisector) | $w^T x + b = 0$ | Perpendicular bisector |
Gaussian NB with equal variances produces linear boundaries similar to Linear Discriminant Analysis (LDA). The key difference is that NB assumes diagonal covariance (feature independence), while LDA allows full covariance. This connection is explored in detail in the next page.
Understanding decision boundaries geometrically provides crucial intuition for classifier behavior.
When both classes have equal variances and equal priors, the boundary passes through the midpoint of the two class means: it is the perpendicular bisector of the segment between them (in coordinates scaled by the per-feature variances).
With unequal priors ($\pi_1 \neq \pi_0$), the boundary shifts away from the midpoint, toward the mean of the lower-prior class, by an amount governed by $\log(\pi_1/\pi_0)$.
Intuitively: If class 1 is rare, you need stronger evidence (closer to class 1's mean) to predict it.
With different variances, the boundary curves: the low-variance class claims a compact region around its mean, while the high-variance class claims the surrounding space.
This creates interesting effects: a point far from both means may be classified as the high-variance class, because it "explains" outliers better.
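A quick sketch of this effect, using illustrative parameters (a tight class at 0 with $\sigma = 0.5$, a wide class at 2 with $\sigma = 2$, equal priors): a point far from both means is assigned to the wide class, because its density decays more slowly.

```python
import numpy as np
from scipy import stats

mu0, s0 = 0.0, 0.5   # tight class
mu1, s1 = 2.0, 2.0   # wide class

def log_post(x, mu, s):
    """Log-posterior up to a constant, with equal priors of 0.5."""
    return stats.norm(mu, s).logpdf(x) + np.log(0.5)

# x = 0 sits on class 0's mean; x = -5 is far from BOTH means,
# yet the high-variance class 1 wins there.
for x in [0.0, -5.0]:
    lp0, lp1 = log_post(x, mu0, s0), log_post(x, mu1, s1)
    winner = 0 if lp0 > lp1 else 1
    print(f"x = {x:+.1f}: log P(0|x) ∝ {lp0:.2f}, log P(1|x) ∝ {lp1:.2f} → class {winner}")
```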
```python
import numpy as np
from scipy import stats


def analyze_boundary_geometry():
    """
    Analyze decision boundary geometry under different conditions.
    """
    print("=" * 60)
    print("DECISION BOUNDARY GEOMETRY ANALYSIS")
    print("=" * 60)

    # Case 1: Equal variances, equal priors
    print("--- Case 1: Equal variances, equal priors ---")
    mu_0, sigma_0, pi_0 = 0, 1, 0.5
    mu_1, sigma_1, pi_1 = 2, 1, 0.5

    # Find the boundary (where posteriors are equal).
    # For equal variance and prior, the boundary is at the midpoint.
    boundary_case1 = (mu_0 + mu_1) / 2
    print(f"Class 0: μ={mu_0}, σ={sigma_0}, π={pi_0}")
    print(f"Class 1: μ={mu_1}, σ={sigma_1}, π={pi_1}")
    print(f"Decision boundary: x = {boundary_case1:.4f}")
    print(f"This is the midpoint between means: ({mu_0} + {mu_1})/2 = {boundary_case1}")

    # Case 2: Equal variances, unequal priors
    print("--- Case 2: Equal variances, unequal priors ---")
    pi_0, pi_1 = 0.8, 0.2
    # Boundary condition: log(π_1/π_0) + (x-μ_0)²/(2σ²) - (x-μ_1)²/(2σ²) = 0
    sigma_sq = sigma_0 ** 2
    log_prior_ratio = np.log(pi_1 / pi_0)
    # Rearranged: log(π_1/π_0) = [(x-μ_1)² - (x-μ_0)²] / (2σ²)
    #                          = [2x(μ_0 - μ_1) + μ_1² - μ_0²] / (2σ²)
    # Solve for x:
    boundary_case2 = (sigma_sq * log_prior_ratio - (mu_1**2 - mu_0**2) / 2) / (mu_0 - mu_1)
    print(f"Class 0: μ={mu_0}, σ={sigma_0}, π={pi_0}")
    print(f"Class 1: μ={mu_1}, σ={sigma_1}, π={pi_1}")
    print(f"Decision boundary: x = {boundary_case2:.4f}")
    print("Shifted toward class 1 (lower prior)")

    # Case 3: Unequal variances
    print("--- Case 3: Unequal variances ---")
    sigma_0, sigma_1 = 0.5, 2.0
    pi_0, pi_1 = 0.5, 0.5
    # Quadratic boundary: solve ax² + bx + c = 0.
    # Coefficient of x² is 1/(2σ_0²) - 1/(2σ_1²), positive since σ_0 < σ_1,
    # so the parabola opens upward.
    a = 1 / (2 * sigma_0**2) - 1 / (2 * sigma_1**2)
    b = mu_1 / sigma_1**2 - mu_0 / sigma_0**2
    c = (mu_0**2 / (2 * sigma_0**2) - mu_1**2 / (2 * sigma_1**2)
         + np.log(pi_1 / pi_0) - 0.5 * np.log(sigma_1**2 / sigma_0**2))

    discriminant = b**2 - 4 * a * c
    if discriminant >= 0:
        x1 = (-b - np.sqrt(discriminant)) / (2 * a)
        x2 = (-b + np.sqrt(discriminant)) / (2 * a)
        print(f"Class 0: μ={mu_0}, σ={sigma_0} (tighter)")
        print(f"Class 1: μ={mu_1}, σ={sigma_1} (wider)")
        print(f"Decision boundaries: x = {x1:.4f} and x = {x2:.4f}")
        print("Two boundaries because the tighter class wins near its mean,")
        print("while the wider class wins in the tails.")

        # Verify by computing posteriors at the boundaries
        print("--- Verification at boundaries ---")
        for x in [x1, x2]:
            log_lik_0 = stats.norm(mu_0, sigma_0).logpdf(x)
            log_lik_1 = stats.norm(mu_1, sigma_1).logpdf(x)
            log_post_0 = log_lik_0 + np.log(pi_0)
            log_post_1 = log_lik_1 + np.log(pi_1)
            print(f"At x = {x:.4f}:")
            print(f"  log P(0|x) ∝ {log_post_0:.4f}")
            print(f"  log P(1|x) ∝ {log_post_1:.4f}")
            print(f"  Difference: {abs(log_post_0 - log_post_1):.6f} (≈0 confirms boundary)")


analyze_boundary_geometry()
```

With more than two classes, decision boundaries form a more complex partition of feature space.
Between each pair of classes $(j, k)$, there is a boundary surface: $$\mathcal{B}_{jk} = \{\mathbf{x} : g_j(\mathbf{x}) = g_k(\mathbf{x})\}$$
Where $g_k(\mathbf{x}) = \log \pi_k + \log f(\mathbf{x} | y = k)$ is the log-posterior for class $k$.
The decision region for class $k$ is where $k$ beats all other classes: $$\mathcal{R}_k = \{\mathbf{x} : g_k(\mathbf{x}) \geq g_j(\mathbf{x}) \text{ for all } j\}$$
With $K$ classes, there are up to $\binom{K}{2}$ pairwise boundary surfaces, and each decision region is the intersection of the half-regions where that class beats every other class.
Consider three classes in 2D with equal variances and equal priors, with means at $(-2, 0)$, $(2, 0)$, and $(0, 2)$.
The boundaries form three lines (with equal priors), each the perpendicular bisector of the segment between one pair of means.
These three lines meet at a triple point where all three posteriors are equal.
```python
import numpy as np
from itertools import combinations


def analyze_multiclass_boundaries():
    """
    Analyze decision boundaries for multi-class Gaussian NB.
    """
    print("=" * 60)
    print("MULTI-CLASS DECISION BOUNDARIES")
    print("=" * 60)

    # Three classes in 2D
    class_params = {
        0: {'mu': np.array([-2, 0]), 'sigma': 1.0, 'prior': 1/3},
        1: {'mu': np.array([2, 0]), 'sigma': 1.0, 'prior': 1/3},
        2: {'mu': np.array([0, 2]), 'sigma': 1.0, 'prior': 1/3},
    }

    print("Class parameters:")
    for k, params in class_params.items():
        print(f"  Class {k}: μ = {params['mu']}, σ = {params['sigma']}, π = {params['prior']:.3f}")

    def log_posterior(x, k):
        """Compute log-posterior (up to a constant) for class k at point x."""
        params = class_params[k]
        mu, sigma, prior = params['mu'], params['sigma'], params['prior']
        # Log-likelihood (summed over features)
        log_lik = -np.sum((x - mu)**2) / (2 * sigma**2)
        log_lik -= len(mu) * np.log(2 * np.pi * sigma**2) / 2
        return log_lik + np.log(prior)

    def classify(x):
        """Return predicted class for point x."""
        posteriors = [log_posterior(x, k) for k in range(3)]
        return np.argmax(posteriors)

    # Pairwise boundary equations
    print("--- Pairwise Boundaries (with equal σ and π) ---")
    print("For equal variances and priors, boundaries are perpendicular bisectors.")
    for (i, j) in combinations(range(3), 2):
        mu_i = class_params[i]['mu']
        mu_j = class_params[j]['mu']
        midpoint = (mu_i + mu_j) / 2
        direction = mu_j - mu_i
        # The normal to the boundary is the direction between the means;
        # the boundary passes through the midpoint.
        print(f"Boundary {i}-{j}:")
        print(f"  Midpoint: {midpoint}")
        print(f"  Normal direction: {direction}")
        print(f"  Equation: ({direction[0]:.1f})·x₁ + ({direction[1]:.1f})·x₂ = {np.dot(direction, midpoint):.1f}")

    # Find the triple point (where all three posteriors are equal)
    print("--- Triple Point ---")
    # For equal σ and π, the triple point is equidistant from all three
    # means: the CIRCUMCENTER of the triangle of means (not the centroid).
    # Solve the two perpendicular-bisector equations as a linear system.
    mus = [class_params[k]['mu'] for k in range(3)]
    A = 2 * np.array([mus[1] - mus[0], mus[2] - mus[0]], dtype=float)
    b_vec = np.array([mus[1] @ mus[1] - mus[0] @ mus[0],
                      mus[2] @ mus[2] - mus[0] @ mus[0]], dtype=float)
    triple_point = np.linalg.solve(A, b_vec)
    print(f"Triple point (circumcenter of means): {triple_point}")

    # Verify all posteriors are equal at the triple point
    posteriors = [log_posterior(triple_point, k) for k in range(3)]
    print(f"Log-posteriors at triple point: {[f'{p:.4f}' for p in posteriors]}")
    print(f"All equal: {np.allclose(posteriors, posteriors[0])}")

    # Sample points and their classifications
    print("--- Sample Classifications ---")
    test_points = [
        np.array([-3, 0]),
        np.array([3, 0]),
        np.array([0, 3]),
        np.array([0, 0]),
        triple_point,
    ]
    for x in test_points:
        pred = classify(x)
        posteriors = [log_posterior(x, k) for k in range(3)]
        print(f"x = {x} → Class {pred} (log-posteriors: {[f'{p:.2f}' for p in posteriors]})")


analyze_multiclass_boundaries()
```

Two-dimensional visualizations provide invaluable insight into decision boundary behavior. Let's analyze how to create and interpret these visualizations.
When interpreting these plots, look for three things: the boundary's location and shape (is it linear or curved, and where does it sit relative to the class means?), misclassified points (training samples that fall on the wrong side of the boundary), and the overall region structure (which class claims which territory, including the far tails).
Beyond the hard decision boundary, we can visualize probability contours:
```python
import numpy as np
from scipy import stats
from sklearn.naive_bayes import GaussianNB


def create_boundary_visualization_data():
    """
    Create data and compute decision boundaries for visualization.
    Returns data that could be used with matplotlib or other plotting libraries.
    """
    print("=" * 60)
    print("DECISION BOUNDARY VISUALIZATION DATA")
    print("=" * 60)

    # Generate a 2D dataset with two classes
    np.random.seed(42)
    n_per_class = 100

    # Class 0: centered at (2, 2) with standard deviation 1
    X_0 = np.random.randn(n_per_class, 2) * 1.0 + np.array([2, 2])
    # Class 1: centered at (5, 5) with standard deviation 1.5
    X_1 = np.random.randn(n_per_class, 2) * 1.5 + np.array([5, 5])

    X = np.vstack([X_0, X_1])
    y = np.array([0] * n_per_class + [1] * n_per_class)

    # Fit Gaussian NB
    gnb = GaussianNB()
    gnb.fit(X, y)

    print("Fitted Gaussian NB:")
    print(f"Class 0: μ = {gnb.theta_[0].round(3)}, σ² = {gnb.var_[0].round(3)}")
    print(f"Class 1: μ = {gnb.theta_[1].round(3)}, σ² = {gnb.var_[1].round(3)}")

    # Create a mesh for the decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 200),
        np.linspace(y_min, y_max, 200)
    )

    # Predict on the mesh
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    Z = gnb.predict(mesh_points).reshape(xx.shape)
    Z_proba = gnb.predict_proba(mesh_points)[:, 1].reshape(xx.shape)

    # Analyze the boundary
    print("--- Boundary Analysis ---")
    # Find approximate boundary points (where probability ≈ 0.5)
    boundary_mask = np.abs(Z_proba - 0.5) < 0.02
    boundary_points = mesh_points[boundary_mask.ravel()]

    if len(boundary_points) > 0:
        boundary_x1 = boundary_points[:, 0]
        boundary_x2 = boundary_points[:, 1]
        print(f"Approximate boundary x₁ range: [{boundary_x1.min():.2f}, {boundary_x1.max():.2f}]")
        print(f"Approximate boundary x₂ range: [{boundary_x2.min():.2f}, {boundary_x2.max():.2f}]")

        # Check whether the boundary is approximately linear
        if len(boundary_points) > 10:
            # Fit a line to the boundary points
            A = np.column_stack([boundary_x1, np.ones_like(boundary_x1)])
            slope, intercept = np.linalg.lstsq(A, boundary_x2, rcond=None)[0]
            residuals = boundary_x2 - (slope * boundary_x1 + intercept)
            r_squared = 1 - np.var(residuals) / np.var(boundary_x2)
            print(f"Linear fit to boundary: x₂ = {slope:.3f}·x₁ + {intercept:.3f}")
            print(f"R² of linear fit: {r_squared:.4f}")
            if r_squared > 0.99:
                print("→ Boundary is approximately LINEAR")
            else:
                print("→ Boundary is CURVED (quadratic form)")

    # Sample predictions
    print("--- Sample Predictions ---")
    sample_points = [
        np.array([2, 2]),      # Center of class 0
        np.array([5, 5]),      # Center of class 1
        np.array([3.5, 3.5]),  # Between classes
        np.array([1, 1]),      # Deep in class 0 territory
        np.array([7, 7]),      # Deep in class 1 territory
    ]
    for point in sample_points:
        pred = gnb.predict([point])[0]
        proba = gnb.predict_proba([point])[0]
        print(f"({point[0]:.1f}, {point[1]:.1f}) → Class {pred}, P(0)={proba[0]:.3f}, P(1)={proba[1]:.3f}")

    return {
        'X': X, 'y': y, 'xx': xx, 'yy': yy,
        'Z_class': Z, 'Z_proba': Z_proba, 'model': gnb
    }


def analyze_variance_effect_on_boundary():
    """
    Show how different variances create curved boundaries.
    """
    print("\n" + "=" * 60)
    print("EFFECT OF VARIANCE ON BOUNDARY SHAPE")
    print("=" * 60)

    np.random.seed(42)
    n = 100

    # Case 1: Equal variances → linear boundary
    print("--- Case 1: Equal Variances ---")
    X_eq = np.vstack([
        np.random.randn(n, 2) * 1.0 + [0, 0],
        np.random.randn(n, 2) * 1.0 + [3, 3]
    ])
    y_eq = np.array([0] * n + [1] * n)
    gnb_eq = GaussianNB()
    gnb_eq.fit(X_eq, y_eq)
    print(f"Class 0 variance: {gnb_eq.var_[0].round(4)}")
    print(f"Class 1 variance: {gnb_eq.var_[1].round(4)}")
    print(f"Variance ratio: {(gnb_eq.var_[1] / gnb_eq.var_[0]).round(3)}")
    print("Expected boundary: LINEAR (equal variances)")

    # Case 2: Unequal variances → curved boundary
    print("--- Case 2: Unequal Variances ---")
    X_uneq = np.vstack([
        np.random.randn(n, 2) * 0.5 + [0, 0],  # Tight cluster
        np.random.randn(n, 2) * 2.0 + [3, 3]   # Spread cluster
    ])
    y_uneq = np.array([0] * n + [1] * n)
    gnb_uneq = GaussianNB()
    gnb_uneq.fit(X_uneq, y_uneq)
    print(f"Class 0 variance: {gnb_uneq.var_[0].round(4)}")
    print(f"Class 1 variance: {gnb_uneq.var_[1].round(4)}")
    print(f"Variance ratio: {(gnb_uneq.var_[1] / gnb_uneq.var_[0]).round(3)}")
    print("Expected boundary: QUADRATIC (unequal variances)")


# Run demonstrations
create_boundary_visualization_data()
analyze_variance_effect_on_boundary()
```

We have developed a comprehensive understanding of decision boundaries in Gaussian Naive Bayes. Let us consolidate the key concepts.
What's next:
The observation that equal variances produce linear boundaries hints at a deep connection between Gaussian Naive Bayes and another classical method: Linear Discriminant Analysis (LDA). The next page explores this connection in detail, revealing how both methods relate to each other and to Quadratic Discriminant Analysis (QDA).
You now understand the geometry of Gaussian Naive Bayes decision boundaries. The boundary shape—linear or quadratic—is determined by whether variances are equal across classes. This geometric insight helps predict classifier behavior and explains why Gaussian NB with equal variances behaves like linear classifiers. Next, we explore the formal connection to Discriminant Analysis.