Every classifier implicitly or explicitly divides the feature space into regions—territories where different classes reign. The boundaries between these territories are decision boundaries, the surfaces where the classifier's confidence shifts from one class to another.
In Gaussian Naive Bayes, these boundaries are not arbitrary. They emerge mathematically from the Gaussian assumptions: the means, variances, and priors we estimated from training data determine exactly where one class's region ends and another's begins.
Understanding decision boundaries provides deep insight into classifier behavior: it reveals exactly what the model has learned about where each class lives in feature space.
This page derives the decision boundary equations for Gaussian Naive Bayes, reveals when boundaries are linear versus quadratic, and develops intuition through visualization.
By the end of this page, you will understand: (1) the mathematical definition of decision boundaries, (2) derivation of the boundary equation for Gaussian NB, (3) conditions for linear versus quadratic boundaries, (4) the role of equal variances in producing linear boundaries, (5) visualization and interpretation of boundaries in 2D, and (6) how class priors shift boundaries.
Before deriving the specific form for Gaussian Naive Bayes, let us establish the general concepts.
A classifier assigns each point $\mathbf{x}$ in feature space to one of $K$ classes. This partitions the space into decision regions:
$$\mathcal{R}_k = \{\mathbf{x} : \hat{y}(\mathbf{x}) = k\}$$
The decision region for class $k$ is the set of all points that get classified as class $k$.
The decision boundary is the set of points where the classifier is indifferent between two or more classes—where the posterior probabilities (or scores) are equal:
$$\mathcal{B}_{jk} = \{\mathbf{x} : P(y = j | \mathbf{x}) = P(y = k | \mathbf{x})\}$$
This defines the boundary between classes $j$ and $k$.
In binary classification ($K = 2$), there's a single boundary separating classes 0 and 1:
$$\mathcal{B} = \{\mathbf{x} : P(y = 1 | \mathbf{x}) = P(y = 0 | \mathbf{x}) = 0.5\}$$
Points where $P(y = 1 | \mathbf{x}) > 0.5$ belong to class 1's region; otherwise class 0.
With $K > 2$ classes, boundaries become more complex: each pair of classes contributes its own boundary surface, and a class's decision region is bounded by pieces of several such surfaces.
In binary classification, the boundary is often characterized by the log-odds (logit): $\log\frac{P(y=1|\mathbf{x})}{P(y=0|\mathbf{x})} = 0$. Points with positive log-odds go to class 1; negative to class 0. This formulation connects Gaussian NB to logistic regression.
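To make the log-odds rule concrete, here is a minimal sketch with illustrative parameters (not taken from the text: equal priors, means 0 and 2, shared unit variance) that computes the log-odds and classifies a few points:

```python
import numpy as np
from scipy import stats

# Illustrative 1D parameters: two classes, equal priors, shared variance.
mu_0, mu_1, sigma, pi_0, pi_1 = 0.0, 2.0, 1.0, 0.5, 0.5

def log_odds(x):
    """log P(y=1|x) - log P(y=0|x), computed via Bayes' theorem."""
    log_post_1 = np.log(pi_1) + stats.norm(mu_1, sigma).logpdf(x)
    log_post_0 = np.log(pi_0) + stats.norm(mu_0, sigma).logpdf(x)
    return log_post_1 - log_post_0

# The midpoint x = 1 gives log-odds exactly 0: the decision boundary.
for x in [0.0, 1.0, 2.0]:
    side = 1 if log_odds(x) > 0 else 0
    print(f"x = {x}: log-odds = {log_odds(x):+.3f} → class {side}")
```

With equal variances the log-odds here is linear in $x$ (it equals $2x - 2$), which previews the linearity result derived below.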
Let us derive the decision boundary equation for Gaussian Naive Bayes with two classes. The analysis extends naturally to multiple classes.
Consider classes 0 and 1 with priors $\pi_0$ and $\pi_1$, per-feature means $\mu_{j0}$ and $\mu_{j1}$, and per-feature variances $\sigma^2_{j0}$ and $\sigma^2_{j1}$.
The boundary is where: $$\log P(y = 1 | \mathbf{x}) = \log P(y = 0 | \mathbf{x})$$
Using Bayes' theorem: $$\log \pi_1 + \log f(\mathbf{x} | y = 1) = \log \pi_0 + \log f(\mathbf{x} | y = 0)$$
Substituting the Gaussian Naive Bayes factorization: $$\log \pi_1 + \sum_{j=1}^{d} \log f(x_j | y = 1) = \log \pi_0 + \sum_{j=1}^{d} \log f(x_j | y = 0)$$
Recall: $$\log f(x_j | y = k) = -\frac{(x_j - \mu_{jk})^2}{2\sigma^2_{jk}} - \frac{1}{2}\log(2\pi\sigma^2_{jk})$$
Substituting and rearranging: $$\log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \left[ -\frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}} + \frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{1}{2}\log\frac{\sigma^2_{j1}}{\sigma^2_{j0}} \right] = 0$$
Let's define: $$g(\mathbf{x}) = \log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \left[ \frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}} - \frac{1}{2}\log\frac{\sigma^2_{j1}}{\sigma^2_{j0}} \right]$$
Then the boundary is the set $\{\mathbf{x} : g(\mathbf{x}) = 0\}$: the classifier predicts class 1 where $g(\mathbf{x}) > 0$ and class 0 where $g(\mathbf{x}) < 0$.
Notice that $g(\mathbf{x})$ contains terms like $(x_j - \mu_{jk})^2$, which are quadratic in $x_j$. This means the boundary equation is generally quadratic in the features. However, under special conditions, the quadratic terms cancel, leaving a linear boundary.
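As a sanity check on this derivation, the following sketch evaluates $g$ for a single feature with made-up parameters and compares it against the log posterior ratio computed directly from Gaussian densities; the two quantities coincide for every $x$:

```python
import numpy as np
from scipy import stats

# Illustrative 1D parameters (one feature, unequal variances)
mu0, v0, pi0 = 0.0, 0.25, 0.6
mu1, v1, pi1 = 2.0, 4.0, 0.4

def g(x):
    """g(x) from the derivation: positive → class 1, negative → class 0."""
    return (np.log(pi1 / pi0)
            + (x - mu0)**2 / (2 * v0)
            - (x - mu1)**2 / (2 * v1)
            - 0.5 * np.log(v1 / v0))

def log_posterior_ratio(x):
    """log P(y=1|x) - log P(y=0|x) via Bayes' theorem."""
    return (np.log(pi1) + stats.norm(mu1, np.sqrt(v1)).logpdf(x)
            - np.log(pi0) - stats.norm(mu0, np.sqrt(v0)).logpdf(x))

for x in [-1.0, 0.5, 3.0]:
    print(f"x = {x:+.1f}: g(x) = {g(x):+.4f}, log ratio = {log_posterior_ratio(x):+.4f}")
```

The $\frac{1}{2}\log(2\pi\sigma^2)$ normalization terms do not cancel when variances differ; they survive as the $\frac{1}{2}\log\frac{\sigma^2_{1}}{\sigma^2_{0}}$ term in $g$, which is why both functions match exactly.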
The shape of the Gaussian NB decision boundary depends critically on whether variances differ across classes. This distinction has profound implications.
Expanding the boundary equation for feature $j$: $$\frac{(x_j - \mu_{j0})^2}{2\sigma^2_{j0}} - \frac{(x_j - \mu_{j1})^2}{2\sigma^2_{j1}}$$
$$= \frac{x_j^2 - 2x_j\mu_{j0} + \mu_{j0}^2}{2\sigma^2_{j0}} - \frac{x_j^2 - 2x_j\mu_{j1} + \mu_{j1}^2}{2\sigma^2_{j1}}$$
$$= x_j^2 \left( \frac{1}{2\sigma^2_{j0}} - \frac{1}{2\sigma^2_{j1}} \right) + x_j \left( \frac{\mu_{j1}}{\sigma^2_{j1}} - \frac{\mu_{j0}}{\sigma^2_{j0}} \right) + \text{constant}$$
When $\sigma^2_{j0} \neq \sigma^2_{j1}$, the coefficient of $x_j^2$ is nonzero, so the boundary is quadratic in $x_j$.
When $\sigma^2_{j0} = \sigma^2_{j1} = \sigma^2_j$ for all features, the $x_j^2$ terms cancel and only linear terms remain.
Under equal variances: $$g(\mathbf{x}) = \log\frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \frac{(\mu_{j1} - \mu_{j0})x_j - \frac{1}{2}(\mu_{j1}^2 - \mu_{j0}^2)}{\sigma^2_j}$$
$$= \mathbf{w}^T\mathbf{x} + b$$
Where: $$w_j = \frac{\mu_{j1} - \mu_{j0}}{\sigma^2_j}, \quad b = \log\frac{\pi_1}{\pi_0} - \frac{1}{2}\sum_{j=1}^{d} \frac{\mu_{j1}^2 - \mu_{j0}^2}{\sigma^2_j}$$
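To confirm the linear form, this sketch (with illustrative two-feature parameters) computes $\mathbf{w}$ and $b$ from the formulas above and checks that $\mathbf{w}^T\mathbf{x} + b$ reproduces the log posterior ratio at an arbitrary point:

```python
import numpy as np
from scipy import stats

# Illustrative parameters: 2 features, per-feature variances shared by both classes.
mu0 = np.array([0.0, 1.0])
mu1 = np.array([2.0, -1.0])
var = np.array([1.5, 0.8])        # σ²_j, same for class 0 and class 1
pi0, pi1 = 0.7, 0.3

# Weights and bias from the equal-variance derivation
w = (mu1 - mu0) / var
b = np.log(pi1 / pi0) - 0.5 * np.sum((mu1**2 - mu0**2) / var)

# Compare against the direct log posterior ratio at a random point
rng = np.random.default_rng(0)
x = rng.normal(size=2)
log_ratio = (np.log(pi1) + stats.norm(mu1, np.sqrt(var)).logpdf(x).sum()
             - np.log(pi0) - stats.norm(mu0, np.sqrt(var)).logpdf(x).sum())

print(f"w·x + b   = {w @ x + b:.6f}")
print(f"log ratio = {log_ratio:.6f}")
```

The two values agree to floating-point precision because, with shared variances, the Gaussian normalization constants cancel in the ratio and only the linear terms survive.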
| Condition | Boundary Shape | Equation Form | Complexity |
|---|---|---|---|
| $\sigma^2_{jk}$ varies by class | Quadratic | $x^T A x + b^T x + c = 0$ | Curved surfaces, ellipsoids |
| $\sigma^2_{jk}$ same for all classes | Linear | $w^T x + b = 0$ | Hyperplanes |
| Equal variance + equal priors | Linear (bisector) | $w^T x + b = 0$ | Perpendicular bisector |
Gaussian NB with equal variances produces linear boundaries similar to Linear Discriminant Analysis (LDA). The key difference is that NB assumes diagonal covariance (feature independence), while LDA allows full covariance. This connection is explored in detail in the next page.
Understanding decision boundaries geometrically provides crucial intuition for classifier behavior.
When both classes have equal variances and equal priors, the boundary passes through the midpoint of the two class means: it is the perpendicular bisector of the segment between them (in coordinates scaled by the per-feature variances).
With unequal priors ($\pi_1 \neq \pi_0$), the boundary shifts away from the midpoint, toward the mean of the lower-prior class, by an amount governed by $\log(\pi_1/\pi_0)$.
Intuitively: If class 1 is rare, you need stronger evidence (closer to class 1's mean) to predict it.
With different variances, the boundary curves: the low-variance class claims a compact region around its mean, while the high-variance class claims the surrounding space.
This creates interesting effects: a point far from both means may be classified as the high-variance class, because it "explains" outliers better.
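A quick sketch of this effect, using illustrative parameters (a tight class at 0 with $\sigma = 0.5$, a wide class at 2 with $\sigma = 2$, equal priors): a point far from both means is assigned to the wide class, because its density decays more slowly.

```python
import numpy as np
from scipy import stats

mu0, s0 = 0.0, 0.5   # tight class
mu1, s1 = 2.0, 2.0   # wide class

def log_post(x, mu, s):
    """Log-posterior up to a constant, with equal priors of 0.5."""
    return stats.norm(mu, s).logpdf(x) + np.log(0.5)

# x = 0 sits on class 0's mean; x = -5 is far from BOTH means,
# yet the high-variance class 1 wins there.
for x in [0.0, -5.0]:
    lp0, lp1 = log_post(x, mu0, s0), log_post(x, mu1, s1)
    winner = 0 if lp0 > lp1 else 1
    print(f"x = {x:+.1f}: log P(0|x) ∝ {lp0:.2f}, log P(1|x) ∝ {lp1:.2f} → class {winner}")
```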
```python
import numpy as np
from scipy import stats


def analyze_boundary_geometry():
    """
    Analyze decision boundary geometry under different conditions.
    """
    print("=" * 60)
    print("DECISION BOUNDARY GEOMETRY ANALYSIS")
    print("=" * 60)

    # Case 1: Equal variances, equal priors
    print("--- Case 1: Equal variances, equal priors ---")
    mu_0, sigma_0, pi_0 = 0, 1, 0.5
    mu_1, sigma_1, pi_1 = 2, 1, 0.5

    # Find the boundary (where posteriors are equal).
    # For equal variance and prior, the boundary is at the midpoint.
    boundary_case1 = (mu_0 + mu_1) / 2
    print(f"Class 0: μ={mu_0}, σ={sigma_0}, π={pi_0}")
    print(f"Class 1: μ={mu_1}, σ={sigma_1}, π={pi_1}")
    print(f"Decision boundary: x = {boundary_case1:.4f}")
    print(f"This is the midpoint between means: ({mu_0} + {mu_1})/2 = {boundary_case1}")

    # Case 2: Equal variances, unequal priors
    print("--- Case 2: Equal variances, unequal priors ---")
    pi_0, pi_1 = 0.8, 0.2
    # Boundary condition: log(π_1/π_0) + (x-μ_0)²/(2σ²) - (x-μ_1)²/(2σ²) = 0
    sigma_sq = sigma_0 ** 2
    log_prior_ratio = np.log(pi_1 / pi_0)
    # Rearranged: log(π_1/π_0) = [(x-μ_1)² - (x-μ_0)²] / (2σ²)
    #                          = [2x(μ_0 - μ_1) + μ_1² - μ_0²] / (2σ²)
    # Solve for x:
    boundary_case2 = (sigma_sq * log_prior_ratio - (mu_1**2 - mu_0**2) / 2) / (mu_0 - mu_1)
    print(f"Class 0: μ={mu_0}, σ={sigma_0}, π={pi_0}")
    print(f"Class 1: μ={mu_1}, σ={sigma_1}, π={pi_1}")
    print(f"Decision boundary: x = {boundary_case2:.4f}")
    print("Shifted toward class 1 (lower prior)")

    # Case 3: Unequal variances
    print("--- Case 3: Unequal variances ---")
    sigma_0, sigma_1 = 0.5, 2.0
    pi_0, pi_1 = 0.5, 0.5
    # Quadratic boundary: solve ax² + bx + c = 0.
    # Coefficient of x² is 1/(2σ_0²) - 1/(2σ_1²), positive since σ_0 < σ_1,
    # so the parabola opens upward.
    a = 1 / (2 * sigma_0**2) - 1 / (2 * sigma_1**2)
    b = mu_1 / sigma_1**2 - mu_0 / sigma_0**2
    c = (mu_0**2 / (2 * sigma_0**2) - mu_1**2 / (2 * sigma_1**2)
         + np.log(pi_1 / pi_0) - 0.5 * np.log(sigma_1**2 / sigma_0**2))

    discriminant = b**2 - 4 * a * c
    if discriminant >= 0:
        x1 = (-b - np.sqrt(discriminant)) / (2 * a)
        x2 = (-b + np.sqrt(discriminant)) / (2 * a)
        print(f"Class 0: μ={mu_0}, σ={sigma_0} (tighter)")
        print(f"Class 1: μ={mu_1}, σ={sigma_1} (wider)")
        print(f"Decision boundaries: x = {x1:.4f} and x = {x2:.4f}")
        print("Two boundaries because the tighter class wins near its mean,")
        print("while the wider class wins in the tails.")

        # Verify by computing posteriors at the boundaries
        print("--- Verification at boundaries ---")
        for x in [x1, x2]:
            log_lik_0 = stats.norm(mu_0, sigma_0).logpdf(x)
            log_lik_1 = stats.norm(mu_1, sigma_1).logpdf(x)
            log_post_0 = log_lik_0 + np.log(pi_0)
            log_post_1 = log_lik_1 + np.log(pi_1)
            print(f"At x = {x:.4f}:")
            print(f"  log P(0|x) ∝ {log_post_0:.4f}")
            print(f"  log P(1|x) ∝ {log_post_1:.4f}")
            print(f"  Difference: {abs(log_post_0 - log_post_1):.6f} (≈0 confirms boundary)")


analyze_boundary_geometry()
```

With more than two classes, decision boundaries form a more complex partition of feature space.
Between each pair of classes $(j, k)$, there is a boundary surface: $$\mathcal{B}_{jk} = \{\mathbf{x} : g_j(\mathbf{x}) = g_k(\mathbf{x})\}$$
Where $g_k(\mathbf{x}) = \log \pi_k + \log f(\mathbf{x} | y = k)$ is the log-posterior for class $k$.
The decision region for class $k$ is where $k$ beats all other classes: $$\mathcal{R}_k = \{\mathbf{x} : g_k(\mathbf{x}) \geq g_j(\mathbf{x}) \text{ for all } j\}$$
With $K$ classes, there are up to $\binom{K}{2}$ pairwise boundary surfaces, and each decision region is the intersection of the half-regions where that class beats every other class.
Consider three classes in 2D with equal variances and equal priors, with means at $(-2, 0)$, $(2, 0)$, and $(0, 2)$.
The boundaries form three lines (with equal priors), each the perpendicular bisector of the segment between one pair of means.
These three lines meet at a triple point where all three posteriors are equal.
```python
import numpy as np
from itertools import combinations


def analyze_multiclass_boundaries():
    """
    Analyze decision boundaries for multi-class Gaussian NB.
    """
    print("=" * 60)
    print("MULTI-CLASS DECISION BOUNDARIES")
    print("=" * 60)

    # Three classes in 2D
    class_params = {
        0: {'mu': np.array([-2, 0]), 'sigma': 1.0, 'prior': 1/3},
        1: {'mu': np.array([2, 0]), 'sigma': 1.0, 'prior': 1/3},
        2: {'mu': np.array([0, 2]), 'sigma': 1.0, 'prior': 1/3},
    }

    print("Class parameters:")
    for k, params in class_params.items():
        print(f"  Class {k}: μ = {params['mu']}, σ = {params['sigma']}, π = {params['prior']:.3f}")

    def log_posterior(x, k):
        """Compute log-posterior (up to a constant) for class k at point x."""
        params = class_params[k]
        mu, sigma, prior = params['mu'], params['sigma'], params['prior']
        # Log-likelihood (summed over features)
        log_lik = -np.sum((x - mu)**2) / (2 * sigma**2)
        log_lik -= len(mu) * np.log(2 * np.pi * sigma**2) / 2
        return log_lik + np.log(prior)

    def classify(x):
        """Return predicted class for point x."""
        posteriors = [log_posterior(x, k) for k in range(3)]
        return np.argmax(posteriors)

    # Pairwise boundary equations
    print("--- Pairwise Boundaries (with equal σ and π) ---")
    print("For equal variances and priors, boundaries are perpendicular bisectors.")
    for (i, j) in combinations(range(3), 2):
        mu_i = class_params[i]['mu']
        mu_j = class_params[j]['mu']
        midpoint = (mu_i + mu_j) / 2
        direction = mu_j - mu_i
        # The normal to the boundary is the direction between the means;
        # the boundary passes through the midpoint.
        print(f"Boundary {i}-{j}:")
        print(f"  Midpoint: {midpoint}")
        print(f"  Normal direction: {direction}")
        print(f"  Equation: ({direction[0]:.1f})·x₁ + ({direction[1]:.1f})·x₂ = {np.dot(direction, midpoint):.1f}")

    # Find the triple point (where all three posteriors are equal)
    print("--- Triple Point ---")
    # For equal σ and π, the triple point is equidistant from all three
    # means: the CIRCUMCENTER of the triangle of means (not the centroid).
    # Solve the two perpendicular-bisector equations as a linear system.
    mus = [class_params[k]['mu'] for k in range(3)]
    A = 2 * np.array([mus[1] - mus[0], mus[2] - mus[0]], dtype=float)
    b_vec = np.array([mus[1] @ mus[1] - mus[0] @ mus[0],
                      mus[2] @ mus[2] - mus[0] @ mus[0]], dtype=float)
    triple_point = np.linalg.solve(A, b_vec)
    print(f"Triple point (circumcenter of means): {triple_point}")

    # Verify all posteriors are equal at the triple point
    posteriors = [log_posterior(triple_point, k) for k in range(3)]
    print(f"Log-posteriors at triple point: {[f'{p:.4f}' for p in posteriors]}")
    print(f"All equal: {np.allclose(posteriors, posteriors[0])}")

    # Sample points and their classifications
    print("--- Sample Classifications ---")
    test_points = [
        np.array([-3, 0]),
        np.array([3, 0]),
        np.array([0, 3]),
        np.array([0, 0]),
        triple_point,
    ]
    for x in test_points:
        pred = classify(x)
        posteriors = [log_posterior(x, k) for k in range(3)]
        print(f"x = {x} → Class {pred} (log-posteriors: {[f'{p:.2f}' for p in posteriors]})")


analyze_multiclass_boundaries()
```

Two-dimensional visualizations provide invaluable insight into decision boundary behavior. Let's analyze how to create and interpret these visualizations.
When interpreting these plots, look for three things: the boundary's location and shape (is it linear or curved, and where does it sit relative to the class means?), misclassified points (training samples that fall on the wrong side of the boundary), and the overall region structure (which class claims which territory, including the far tails).
Beyond the hard decision boundary, we can visualize probability contours:
```python
import numpy as np
from scipy import stats
from sklearn.naive_bayes import GaussianNB


def create_boundary_visualization_data():
    """
    Create data and compute decision boundaries for visualization.
    Returns data that could be used with matplotlib or other plotting libraries.
    """
    print("=" * 60)
    print("DECISION BOUNDARY VISUALIZATION DATA")
    print("=" * 60)

    # Generate a 2D dataset with two classes
    np.random.seed(42)
    n_per_class = 100

    # Class 0: centered at (2, 2) with standard deviation 1
    X_0 = np.random.randn(n_per_class, 2) * 1.0 + np.array([2, 2])
    # Class 1: centered at (5, 5) with standard deviation 1.5
    X_1 = np.random.randn(n_per_class, 2) * 1.5 + np.array([5, 5])

    X = np.vstack([X_0, X_1])
    y = np.array([0] * n_per_class + [1] * n_per_class)

    # Fit Gaussian NB
    gnb = GaussianNB()
    gnb.fit(X, y)

    print("Fitted Gaussian NB:")
    print(f"Class 0: μ = {gnb.theta_[0].round(3)}, σ² = {gnb.var_[0].round(3)}")
    print(f"Class 1: μ = {gnb.theta_[1].round(3)}, σ² = {gnb.var_[1].round(3)}")

    # Create a mesh for the decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 200),
        np.linspace(y_min, y_max, 200)
    )

    # Predict on the mesh
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    Z = gnb.predict(mesh_points).reshape(xx.shape)
    Z_proba = gnb.predict_proba(mesh_points)[:, 1].reshape(xx.shape)

    # Analyze the boundary
    print("--- Boundary Analysis ---")
    # Find approximate boundary points (where probability ≈ 0.5)
    boundary_mask = np.abs(Z_proba - 0.5) < 0.02
    boundary_points = mesh_points[boundary_mask.ravel()]

    if len(boundary_points) > 0:
        boundary_x1 = boundary_points[:, 0]
        boundary_x2 = boundary_points[:, 1]
        print(f"Approximate boundary x₁ range: [{boundary_x1.min():.2f}, {boundary_x1.max():.2f}]")
        print(f"Approximate boundary x₂ range: [{boundary_x2.min():.2f}, {boundary_x2.max():.2f}]")

        # Check whether the boundary is approximately linear
        if len(boundary_points) > 10:
            # Fit a line to the boundary points
            A = np.column_stack([boundary_x1, np.ones_like(boundary_x1)])
            slope, intercept = np.linalg.lstsq(A, boundary_x2, rcond=None)[0]
            residuals = boundary_x2 - (slope * boundary_x1 + intercept)
            r_squared = 1 - np.var(residuals) / np.var(boundary_x2)
            print(f"Linear fit to boundary: x₂ = {slope:.3f}·x₁ + {intercept:.3f}")
            print(f"R² of linear fit: {r_squared:.4f}")
            if r_squared > 0.99:
                print("→ Boundary is approximately LINEAR")
            else:
                print("→ Boundary is CURVED (quadratic form)")

    # Sample predictions
    print("--- Sample Predictions ---")
    sample_points = [
        np.array([2, 2]),      # Center of class 0
        np.array([5, 5]),      # Center of class 1
        np.array([3.5, 3.5]),  # Between classes
        np.array([1, 1]),      # Deep in class 0 territory
        np.array([7, 7]),      # Deep in class 1 territory
    ]
    for point in sample_points:
        pred = gnb.predict([point])[0]
        proba = gnb.predict_proba([point])[0]
        print(f"({point[0]:.1f}, {point[1]:.1f}) → Class {pred}, P(0)={proba[0]:.3f}, P(1)={proba[1]:.3f}")

    return {
        'X': X, 'y': y, 'xx': xx, 'yy': yy,
        'Z_class': Z, 'Z_proba': Z_proba, 'model': gnb
    }


def analyze_variance_effect_on_boundary():
    """
    Show how different variances create curved boundaries.
    """
    print("\n" + "=" * 60)
    print("EFFECT OF VARIANCE ON BOUNDARY SHAPE")
    print("=" * 60)

    np.random.seed(42)
    n = 100

    # Case 1: Equal variances → linear boundary
    print("--- Case 1: Equal Variances ---")
    X_eq = np.vstack([
        np.random.randn(n, 2) * 1.0 + [0, 0],
        np.random.randn(n, 2) * 1.0 + [3, 3]
    ])
    y_eq = np.array([0] * n + [1] * n)
    gnb_eq = GaussianNB()
    gnb_eq.fit(X_eq, y_eq)
    print(f"Class 0 variance: {gnb_eq.var_[0].round(4)}")
    print(f"Class 1 variance: {gnb_eq.var_[1].round(4)}")
    print(f"Variance ratio: {(gnb_eq.var_[1] / gnb_eq.var_[0]).round(3)}")
    print("Expected boundary: LINEAR (equal variances)")

    # Case 2: Unequal variances → curved boundary
    print("--- Case 2: Unequal Variances ---")
    X_uneq = np.vstack([
        np.random.randn(n, 2) * 0.5 + [0, 0],  # Tight cluster
        np.random.randn(n, 2) * 2.0 + [3, 3]   # Spread cluster
    ])
    y_uneq = np.array([0] * n + [1] * n)
    gnb_uneq = GaussianNB()
    gnb_uneq.fit(X_uneq, y_uneq)
    print(f"Class 0 variance: {gnb_uneq.var_[0].round(4)}")
    print(f"Class 1 variance: {gnb_uneq.var_[1].round(4)}")
    print(f"Variance ratio: {(gnb_uneq.var_[1] / gnb_uneq.var_[0]).round(3)}")
    print("Expected boundary: QUADRATIC (unequal variances)")


# Run demonstrations
create_boundary_visualization_data()
analyze_variance_effect_on_boundary()
```

We have developed a comprehensive understanding of decision boundaries in Gaussian Naive Bayes. Let us consolidate the key concepts.
What's next:
The observation that equal variances produce linear boundaries hints at a deep connection between Gaussian Naive Bayes and another classical method: Linear Discriminant Analysis (LDA). The next page explores this connection in detail, revealing how both methods relate to each other and to Quadratic Discriminant Analysis (QDA).
You now understand the geometry of Gaussian Naive Bayes decision boundaries. The boundary shape—linear or quadratic—is determined by whether variances are equal across classes. This geometric insight helps predict classifier behavior and explains why Gaussian NB with equal variances behaves like linear classifiers. Next, we explore the formal connection to Discriminant Analysis.