In 1-NN classification, every training point wields absolute authority over its Voronoi cell—a single point determines the fate of all queries within its region. This creates decision boundaries with maximum detail but also maximum sensitivity to noise. A single mislabeled training point carves out an "island" of incorrect predictions.
The parameter K transforms this picture fundamentally. When K = 3, each prediction requires agreement from a local committee. When K = 50, decisions reflect the consensus of a substantial neighborhood. As K increases, the jagged Voronoi-based boundaries smooth out, irregularities average away, and the decision surface approaches a simpler, more stable structure.
This page provides a rigorous geometric analysis of how decision boundaries evolve with K, revealing the deep connection between neighborhood size and the bias-variance tradeoff—not as an abstract statistical concept, but as a visible, tangible transformation of decision geometry.
By the end of this page, you will understand how decision boundaries transition from Voronoi tessellations (K=1) to progressively smoother surfaces as K increases, the geometric interpretation of bias-variance tradeoff in KNN, edge effects and boundary behavior, and practical implications for K selection.
The classical Voronoi diagram captures 1-NN regions. For general K, we need a richer mathematical structure: the order-k Voronoi diagram.
Definition (Order-k Voronoi Diagram): Given a point set P with n sites, the order-k Voronoi region for a subset S ⊆ P with |S| = k is:
$$V_k(S) = \{x \in \mathbb{R}^d : \max_{p \in S} \|x - p\| \leq \min_{q \in P \setminus S} \|x - q\|\}$$
In words, V_k(S) contains all points whose k nearest neighbors are exactly the sites in S. The collection of all such regions for all possible k-subsets forms the order-k Voronoi diagram.
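To ground the set-builder definition, here is a tiny self-contained check (NumPy only; the sites and the query point are invented purely for illustration): for a query x, the subset S of its k nearest sites always satisfies the inequality above, which is exactly the statement that x lies in V_k(S).

```python
# A minimal sketch of the order-k membership condition: the farthest of the
# k nearest sites must be no farther than the nearest excluded site.
import numpy as np

rng = np.random.default_rng(0)
P = rng.uniform(0, 1, size=(20, 2))   # n = 20 sites in the plane (illustrative)
x = np.array([0.4, 0.6])              # query point (illustrative)
k = 3

dists = np.linalg.norm(P - x, axis=1)
order = np.argsort(dists)
S = order[:k]                          # indices of the k nearest sites
rest = order[k:]

# By construction this prints True: x belongs to V_k(S) for S = its k nearest sites.
print(sorted(S.tolist()), dists[S].max() <= dists[rest].min())
```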
Key Properties:
Incremental construction: The order-k diagram can be built from the order-(k-1) diagram by subdividing each order-(k-1) cell according to which remaining site is the k-th nearest. As k increases, regions become progressively more fragmented before eventually simplifying.
Combinatorial explosion: There are C(n, k) possible k-subsets, so potentially C(n, k) regions, though many are empty. In the plane, the actual complexity is O(k(n-k)) faces.
1-NN recovery: The order-1 Voronoi diagram is exactly the standard Voronoi diagram.
Boundary structure: Boundaries now separate regions with different k-nearest neighbor sets, not just different single nearest neighbors.
For K-NN classification, we don't need the full order-k Voronoi structure. We only care about the majority class among k neighbors. This creates a coarser partition: all regions where class A has the majority merge together. The effective decision boundary is much simpler than the full order-k diagram.
The Classification Perspective:
Let's denote the k-NN decision region for class c as:
$$R_k(c) = \{x \in \mathbb{R}^d : \text{majority of the } k \text{ nearest neighbors of } x \text{ have label } c\}$$
Unlike 1-NN where R₁(c) is a union of complete Voronoi cells, R_k(c) is a union of order-k regions where class c wins the vote. The boundaries of R_k(c) are subsets of the order-k Voronoi boundaries—but simplified by ignoring within-class boundaries.
This gives us a critical insight: as K increases, the number of distinct decision regions tends to decrease. Small isolated pockets of one class get "outvoted" by surrounding neighbors of another class, causing those regions to flip.
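A quick way to see this coarsening numerically is sketched below. The dataset (scikit-learn's make_moons) and the use of SciPy's connected-component labelling are illustrative choices, not part of this page's own experiments; the region counts are expected to drop as K grows and isolated pockets get outvoted.

```python
# Count connected prediction regions per class on a grid as K increases.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from scipy.ndimage import label

X, y = make_moons(n_samples=300, noise=0.35, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

for k in [1, 5, 15, 45]:
    Z = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(grid).reshape(xx.shape)
    n0 = label(Z == 0)[1]   # connected components predicted as class 0
    n1 = label(Z == 1)[1]   # connected components predicted as class 1
    print(f"K={k:2d}: class-0 regions={n0}, class-1 regions={n1}")
```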
The most instructive way to understand K's effect is through direct visualization. Let's construct a systematic comparison showing how the same training data produces increasingly smooth decision boundaries as K grows.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap


def visualize_knn_boundary_evolution():
    """
    Visualize how K-NN decision boundaries evolve as K increases.

    Key observations:
    - K=1: Jagged Voronoi boundaries, highly local
    - K=3,5: Moderate smoothing, noise islands start disappearing
    - K=15,25: Significant smoothing, approaching linear-like boundaries
    - K=large: Eventual convergence to majority class everywhere
    """
    np.random.seed(42)

    # Generate two-class spiral-like data with some noise
    n_samples = 200

    # Class 0: Upper-left cluster with some scattered points
    n0 = n_samples // 2
    theta0 = np.random.uniform(0, 2 * np.pi, n0)
    r0 = np.random.uniform(0, 1.5, n0)
    X0 = np.column_stack([
        r0 * np.cos(theta0) + 1,
        r0 * np.sin(theta0) + 2
    ])
    # Add noise points (potential outliers)
    X0_noise = np.array([[3.5, 1.5], [2.8, 0.8], [0.5, 0.5]])
    X0 = np.vstack([X0, X0_noise])
    y0 = np.zeros(len(X0), dtype=int)

    # Class 1: Lower-right cluster
    n1 = n_samples // 2
    theta1 = np.random.uniform(0, 2 * np.pi, n1)
    r1 = np.random.uniform(0, 1.5, n1)
    X1 = np.column_stack([
        r1 * np.cos(theta1) + 3,
        r1 * np.sin(theta1) + 1
    ])
    # Add noise points
    X1_noise = np.array([[0.8, 2.5], [1.2, 0.3]])
    X1 = np.vstack([X1, X1_noise])
    y1 = np.ones(len(X1), dtype=int)

    X = np.vstack([X0, X1])
    y = np.hstack([y0, y1])

    # Create visualization grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    # K values to visualize
    k_values = [1, 3, 7, 15, 31, 61]

    # Color schemes
    colors = ['#3498db', '#e74c3c']
    cmap_light = ListedColormap(['#AED6F1', '#F5B7B1'])
    cmap_bold = ListedColormap(colors)

    # Create subplot grid
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()

    for idx, k in enumerate(k_values):
        ax = axes[idx]

        # Train K-NN classifier
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)

        # Predict on grid
        Z = knn.predict(grid_points).reshape(xx.shape)

        # Plot decision regions
        ax.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
        ax.contour(xx, yy, Z, levels=[0.5], colors='black',
                   linewidths=2.5, linestyles='-')

        # Plot training points
        for cls in [0, 1]:
            mask = y == cls
            ax.scatter(X[mask, 0], X[mask, 1], c=colors[cls],
                       edgecolor='white', linewidth=0.8, s=40, zorder=5)

        # Annotate noise points that are "islands"
        if k == 1:
            ax.annotate('Noise\nisland', xy=(3.5, 1.5), xytext=(4.2, 2.3),
                        fontsize=9,
                        arrowprops=dict(arrowstyle='->', color='gray'),
                        color='gray')

        ax.set_title(f'K = {k}', fontsize=14, fontweight='bold')
        ax.set_xlim(x_min, x_max)
        ax.set_ylim(y_min, y_max)
        ax.set_aspect('equal')
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')

    plt.suptitle('K-NN Decision Boundary Evolution', fontsize=16,
                 fontweight='bold', y=1.02)
    plt.tight_layout()

    return fig, X, y


def compute_boundary_complexity(X, y, k_values, grid_resolution=200):
    """
    Quantify decision boundary complexity as a function of K.

    Complexity metric: Total length of decision boundary (contour perimeter).
    As K increases, boundaries smooth and total length decreases.
    """
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, grid_resolution),
        np.linspace(y_min, y_max, grid_resolution)
    )
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    complexities = []
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)
        Z = knn.predict(grid_points).reshape(xx.shape)

        # Estimate boundary length via gradient
        # Points where prediction changes between adjacent grid cells
        boundary_h = np.abs(np.diff(Z, axis=1)).sum()
        boundary_v = np.abs(np.diff(Z, axis=0)).sum()
        total_boundary = boundary_h + boundary_v
        complexities.append(total_boundary)

    return complexities


if __name__ == "__main__":
    fig, X, y = visualize_knn_boundary_evolution()
    plt.savefig("knn_boundary_evolution.png", dpi=150, bbox_inches='tight')
    plt.show()

    # Compute complexity curve
    k_range = list(range(1, 81, 2))
    complexities = compute_boundary_complexity(X, y, k_range)

    plt.figure(figsize=(10, 5))
    plt.plot(k_range, complexities, 'o-', linewidth=2, markersize=6)
    plt.xlabel('K (Number of Neighbors)', fontsize=12)
    plt.ylabel('Boundary Complexity (Perimeter Units)', fontsize=12)
    plt.title('Decision Boundary Complexity vs K', fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.3)
    plt.savefig("boundary_complexity_vs_k.png", dpi=150)
    plt.show()
```

Observations from the Visualization:
K = 1 (Voronoi regime): Every training point controls its region. Noise points create isolated "islands" of incorrect predictions. The boundary is maximally complex—every edge of the class-boundary portion of the Voronoi diagram appears.
K = 3-7 (Local smoothing): Small isolated regions disappear. A noise point from class A surrounded by class B neighbors loses its island. The boundary starts to follow the "main" class structure.
K = 15-31 (Moderate smoothing): The boundary becomes visibly smoother. Small protrusions and indentations are averaged away. The overall shape reflects the class distribution's central tendency.
K → n (Over-smoothing): As K approaches n, larger regions that should belong to one class start flipping. Eventually, when K = n, every point is classified as the majority class—the boundary vanishes entirely.
This progression illustrates the bias-variance tradeoff geometrically: low K means low bias (boundary can capture fine detail) but high variance (sensitive to individual points); high K means high bias (boundary forced toward simplicity) but low variance (stable across different training samples).
Let's formalize the intuition that larger K produces smoother boundaries. We analyze the relationship between K, neighborhood radius, and decision surface properties.
Neighborhood Radius and K:
For a query point q, let r_k(q) denote the distance to the k-th nearest neighbor. This defines the k-neighborhood ball:
$$B_k(q) = \{x \in \mathbb{R}^d : \|x - q\| \leq r_k(q)\}$$
As K increases, r_k(q) increases (we must reach further to gather K neighbors). The prediction at q depends on all training points within B_k(q)—a larger spatial average.
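A small illustration of this (the Gaussian data and the two query points are assumptions for the demo, using scikit-learn's NearestNeighbors): r_k(q) is nondecreasing in k, and it grows faster for queries that sit in sparse parts of the space.

```python
# Sketch: the k-th neighbor distance r_k(q) grows with K, faster where data are sparse.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                  # densest near the origin
nn = NearestNeighbors(n_neighbors=100).fit(X)

for q in [np.array([[0.0, 0.0]]), np.array([[2.5, 2.5]])]:   # dense vs. sparse query
    dists, _ = nn.kneighbors(q)
    r_k = dists[0]                             # r_k(q) for k = 1..100, nondecreasing
    print(q[0], [round(r_k[k - 1], 2) for k in (1, 5, 25, 100)])
```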
Local Averaging Interpretation:
The k-NN prediction can be viewed as local averaging:
$$\hat{y}(q) = \text{mode}\{y_i : x_i \in \text{KNN}(q)\}$$
For regression, this becomes:
$$\hat{f}(q) = \frac{1}{k} \sum_{i \in \text{KNN}(q)} y_i$$
Larger K means averaging over a larger spatial region. This is equivalent to spatial low-pass filtering—high-frequency variations (sharp boundary changes) are attenuated, leaving only low-frequency structure (broad class regions).
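The regression form makes the filtering effect easy to measure. The sketch below (the noisy sine data and the roughness metric are illustrative choices, not part of this page's experiments) fits KNeighborsRegressor for several K and reports a crude roughness score that should shrink as K grows.

```python
# Larger K averages over a wider window, so the fitted curve loses high-frequency wiggles.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 6, 300)).reshape(-1, 1)
y = np.sin(2 * x.ravel()) + rng.normal(scale=0.4, size=300)   # noisy sine

grid = np.linspace(0, 6, 400).reshape(-1, 1)
for k in [1, 5, 25, 75]:
    f = KNeighborsRegressor(n_neighbors=k).fit(x, y).predict(grid)
    roughness = np.abs(np.diff(f, n=2)).mean()   # mean absolute second difference
    print(f"K={k:2d}: roughness={roughness:.4f}")
```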
K-NN is related to kernel density estimation with a "box kernel" that assigns equal weight to all K neighbors and zero weight to all others. Weighted K-NN with distance-based weights creates smoother kernel functions. The effective bandwidth is proportional to r_k(q)—varying spatially unlike fixed-bandwidth kernels.
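For reference, the distance-weighted variant mentioned above is one keyword away in scikit-learn; the toy one-dimensional data below is invented purely to show how the vote fractions shift when nearer neighbors count more.

```python
# Uniform vs. distance-weighted voting on a tiny illustrative dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1, 1])

uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
weighted = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

q = np.array([[1.4]])
print(uniform.predict_proba(q))    # plain vote fractions: [[0.667, 0.333]]
print(weighted.predict_proba(q))   # closer neighbors weigh more heavily
```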
Characterizing Boundary Curvature:
For a binary classification problem, the decision boundary is the set of points where the K-NN vote is tied or nearly tied. Around such a boundary point q:
With K = 1, the boundary is the perpendicular bisector between two training points of opposite classes—a straight line segment.
With larger K, the boundary position at q depends on the distribution of classes in B_k(q). The boundary bends toward regions where the minority class is dense.
Quantifying Smoothness:
We can characterize boundary smoothness through its curvature. For a 2D decision boundary parameterized by arc length s as (x(s), y(s)), curvature is:
$$\kappa(s) = \left|\frac{d\theta}{ds}\right| = \frac{|x'y'' - y'x''|}{(x'^2 + y'^2)^{3/2}}$$
The total curvature ∫|κ|ds measures boundary complexity. Empirically, this decreases monotonically with K (for reasonable K < n/2).
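As a calibration for this measure (a standard differential-geometry fact rather than anything KNN-specific), a circular arc of radius R has constant curvature

$$\kappa = \frac{1}{R}, \qquad \int_0^{L} |\kappa| \, ds = \frac{L}{R},$$

so a full circle contributes total curvature 2π regardless of its size, while gentle large-radius bends contribute little per unit length. Smoothing under larger K amounts to replacing tight, small-radius wiggles in the boundary with broad, large-radius arcs, which is exactly what drives the integral down.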
The Boundary Width Effect:
With probability-based voting (soft K-NN), we can define a "boundary width"—the region where neither class has overwhelming majority. This width scales approximately as:
$$w(K) \propto \sqrt{K} \cdot (\text{local density})^{-1/d}$$
Larger K creates wider, more diffuse boundaries rather than sharp transitions.
| Property | Small K | Large K | Scaling Behavior |
|---|---|---|---|
| Boundary length | Long (follows Voronoi) | Short (smooth curve) | Decreases then stabilizes |
| Total curvature | High (sharp corners) | Low (gentle bends) | Monotonically decreases |
| Boundary width | Sharp (single edge) | Diffuse (uncertainty zone) | Increases with √K |
| Number of regions | Many (isolated islands) | Few (major regions only) | Decreases with K |
| Sensitivity to noise | High (single points matter) | Low (averaged out) | Decreases as 1/K |
The bias-variance tradeoff is often presented abstractly, but K-NN provides a beautifully geometric interpretation. Here we make the tradeoff visible and quantifiable.
Variance: Sensitivity to Training Set
Consider training on different random samples from the same distribution. The variance of a classifier at point q is:
$$\text{Var}[\hat{y}(q)] = \mathbb{E}_{\text{train}}[(\hat{y}(q) - \mathbb{E}[\hat{y}(q)])^2]$$
Geometric interpretation: With K = 1, if q lies near the boundary between classes, which single training point is closest varies substantially across samples. The decision at q flip-flops. With large K, the vote aggregates many neighbors—individual point variation averages out.
Bias: Deviation from True Boundary
$$\text{Bias}[\hat{y}(q)] = \mathbb{E}[\hat{y}(q)] - y^*(q)$$
where y*(q) is the true (Bayes optimal) label.
Geometric interpretation: With K = 1, if training density is sufficient, the boundary closely tracks the true class interface. With large K, the boundary is pulled toward the majority class in the neighborhood—if class A is locally sparse, the boundary invades into A's territory, creating systematic bias.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap


def visualize_bias_variance_geometric(n_experiments=50):
    """
    Visualize the bias-variance tradeoff geometrically by:
    1. Generating multiple training samples from the same distribution
    2. Training K-NN with different K values
    3. Showing individual decision boundaries (variance) and average boundary (bias)

    Key insight:
    - Low K: Boundaries vary wildly across experiments (high variance)
    - High K: Boundaries are stable but may miss true boundary (high bias)
    """
    np.random.seed(0)

    # True boundary: diagonal line y = x + noise
    def true_class(x1, x2):
        return (x2 > x1).astype(int)

    def generate_training_data(n_samples=100):
        X = np.random.uniform(-2, 2, (n_samples, 2))
        y = true_class(X[:, 0], X[:, 1])
        # Add 10% label noise
        flip_mask = np.random.random(n_samples) < 0.1
        y[flip_mask] = 1 - y[flip_mask]
        return X, y

    # Grid for predictions
    xx, yy = np.meshgrid(np.linspace(-2, 2, 150),
                         np.linspace(-2, 2, 150))
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    k_values = [1, 5, 21, 51]

    fig, axes = plt.subplots(2, len(k_values), figsize=(16, 8))

    for col, k in enumerate(k_values):
        # Collect predictions across experiments
        all_predictions = []
        for exp in range(n_experiments):
            X_train, y_train = generate_training_data(100)
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X_train, y_train)
            Z = knn.predict(grid_points).reshape(xx.shape)
            all_predictions.append(Z)

        all_predictions = np.array(all_predictions)

        # Top row: Individual experiment boundaries (variance visualization)
        ax_var = axes[0, col]
        ax_var.set_title(f'K = {k}: Individual Boundaries', fontsize=11, fontweight='bold')

        # Plot a few individual boundaries
        for i in range(min(8, n_experiments)):
            ax_var.contour(xx, yy, all_predictions[i], levels=[0.5],
                           colors='gray', linewidths=0.5, alpha=0.4)

        # True boundary (dashed line y = x)
        ax_var.plot([-2, 2], [-2, 2], 'g--', linewidth=2.5,
                    label='True boundary' if col == 0 else '')

        ax_var.set_xlim(-2, 2)
        ax_var.set_ylim(-2, 2)
        ax_var.set_aspect('equal')
        if col == 0:
            ax_var.set_ylabel('Variance View', fontsize=10)

        # Bottom row: Average prediction (bias visualization)
        ax_bias = axes[1, col]
        ax_bias.set_title(f'K = {k}: Average Prediction', fontsize=11, fontweight='bold')

        # Average prediction probability
        avg_prediction = all_predictions.mean(axis=0)

        # Color by average prediction (continuous)
        contourf = ax_bias.contourf(xx, yy, avg_prediction,
                                    levels=np.linspace(0, 1, 21),
                                    cmap='RdBu_r', alpha=0.8)

        # Average decision boundary (where avg = 0.5)
        ax_bias.contour(xx, yy, avg_prediction, levels=[0.5],
                        colors='black', linewidths=2.5)

        # True boundary
        ax_bias.plot([-2, 2], [-2, 2], 'g--', linewidth=2.5)

        ax_bias.set_xlim(-2, 2)
        ax_bias.set_ylim(-2, 2)
        ax_bias.set_aspect('equal')
        if col == 0:
            ax_bias.set_ylabel('Bias View', fontsize=10)

        # Compute variance (standard deviation of predictions at each point)
        pred_variance = all_predictions.std(axis=0)
        avg_variance = pred_variance.mean()

        # Compute bias (deviation of average boundary from true boundary)
        # At points where true class is 1, bias = avg_prediction - 1
        true_labels = true_class(xx, yy)
        bias_magnitude = np.abs(avg_prediction - true_labels).mean()

        ax_bias.text(0.02, 0.98,
                     f'Avg Var: {avg_variance:.3f}\nAvg |Bias|: {bias_magnitude:.3f}',
                     transform=ax_bias.transAxes, fontsize=9,
                     verticalalignment='top',
                     bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    plt.tight_layout()
    fig.legend(['True boundary'], loc='upper center', ncol=1, fontsize=10,
               bbox_to_anchor=(0.5, 1.02))
    return fig


if __name__ == "__main__":
    fig = visualize_bias_variance_geometric()
    plt.savefig("bias_variance_knn_geometric.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The Optimal K:
The optimal K minimizes the total error: Bias² + Variance + irreducible noise. Where this optimum falls depends on the training set size, the amount of label noise, and the complexity of the true class boundary: more data and more noise push the optimum toward larger K, while intricate boundaries favor smaller K.
Rules of thumb like K = √n are starting points, but cross-validation remains essential for finding the true optimum.
Not all regions of feature space behave equally under K-NN. Boundaries, sparse regions, and edges exhibit distinctive behaviors that affect both accuracy and interpretation.
Near Class Boundaries:
Points near the true class boundary experience the most significant smoothing effect:
If the true boundary is complex (highly curved), larger K "straightens" it, pushing predictions toward whichever class is locally more numerous. This creates boundary bias: systematic misclassification of minority class points in boundary regions.
At Data Edges (Sparse Regions):
Near the edges of the training data distribution, K-NN faces distinctive challenges: neighborhoods become asymmetric, the k-th neighbor can lie far away, and predictions extend beyond the convex hull of the data with no change in apparent confidence. The code below makes these effects visible:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from scipy.spatial import ConvexHull


def visualize_edge_effects():
    """
    Demonstrate edge effects in K-NN classification:
    1. Asymmetric neighborhoods at data boundaries
    2. Extrapolation behavior outside convex hull
    3. How K affects predictions in sparse regions
    """
    np.random.seed(42)

    # Generate data with clear boundary and sparse edges
    n_samples = 100

    # Class 0: Dense cluster
    X0 = np.random.randn(n_samples, 2) * 0.5 + np.array([0, 0])
    y0 = np.zeros(n_samples, dtype=int)

    # Class 1: Scattered points forming an arc
    theta = np.linspace(np.pi / 4, 3 * np.pi / 4, n_samples)
    r = 3 + np.random.randn(n_samples) * 0.3
    X1 = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y1 = np.ones(n_samples, dtype=int)

    X = np.vstack([X0, X1])
    y = np.hstack([y0, y1])

    # Compute convex hull
    hull = ConvexHull(X)

    # Query points: Inside, on edge, outside
    query_inside = np.array([[1.5, 1.5]])
    query_edge = np.array([[-2.5, 2.0]])
    query_outside = np.array([[0, 4.5]])

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    k_values = [3, 11, 31]

    for ax, k in zip(axes, k_values):
        ax.set_title(f'K = {k}: Edge Effects', fontsize=12, fontweight='bold')

        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)

        # Plot decision boundary
        xx, yy = np.meshgrid(np.linspace(-4, 4, 200),
                             np.linspace(-2, 5, 200))
        Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdBu_r')
        ax.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)

        # Plot training points
        ax.scatter(X0[:, 0], X0[:, 1], c='blue', s=30, alpha=0.6, label='Class 0')
        ax.scatter(X1[:, 0], X1[:, 1], c='red', s=30, alpha=0.6, label='Class 1')

        # Plot convex hull
        for simplex in hull.simplices:
            ax.plot(X[simplex, 0], X[simplex, 1], 'k--', linewidth=1, alpha=0.5)

        # Query points with neighborhood visualization
        for q, color, name in [(query_inside, 'green', 'Inside'),
                               (query_edge, 'orange', 'Edge'),
                               (query_outside, 'purple', 'Outside')]:
            # Find k nearest neighbors
            distances, indices = knn.kneighbors(q)
            max_dist = distances[0, -1]

            # Draw neighborhood circle
            circle = plt.Circle(q[0], max_dist, fill=False, color=color,
                                linewidth=2, linestyle='-')
            ax.add_patch(circle)

            # Mark query point
            ax.scatter(q[0, 0], q[0, 1], c=color, s=200, marker='*',
                       edgecolor='black', linewidth=1.5, zorder=10,
                       label=f'{name}: r_k={max_dist:.2f}')

            # Lines to nearest neighbors
            for idx in indices[0]:
                ax.plot([q[0, 0], X[idx, 0]], [q[0, 1], X[idx, 1]],
                        color=color, linewidth=0.5, alpha=0.5)

        ax.set_xlim(-4, 4)
        ax.set_ylim(-2, 5)
        ax.set_aspect('equal')
        ax.legend(loc='lower right', fontsize=8)

    plt.tight_layout()
    return fig


if __name__ == "__main__":
    fig = visualize_edge_effects()
    plt.savefig("edge_effects_knn.png", dpi=150, bbox_inches='tight')
    plt.show()
```

Key Observations:
Neighborhood radius varies spatially: In dense regions, r_k is small; in sparse regions, r_k is large. This creates adaptive bandwidth across the feature space.
Outside the convex hull: K-NN extends its predictions infinitely. There's no "uncertainty" signal—predictions look as confident outside the data as inside, despite being extrapolations.
Asymmetric voting: At edges, neighbors are not symmetrically distributed around the query. The vote is dominated by whichever class occupies the populated direction.
Implications for Practice:
K-NN provides no natural measure of confidence that reflects distance from training data. A query point far outside the training distribution receives a prediction just as confidently as a point at the centroid. Always pair K-NN with density estimation or conformal prediction to detect out-of-distribution queries.
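One lightweight version of that pairing is sketched below, under assumed synthetic data: use the k-th neighbor distance r_k(q) itself as a rough outlier score, calibrated against the r_k values seen on the training set (the 99th-percentile cutoff is an arbitrary illustrative choice, not a recommendation from this page).

```python
# Report r_k alongside the prediction and flag queries whose r_k is far
# beyond anything the training set itself exhibits.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

k = 15
knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)

# Calibrate on the training points (each point's neighbor list includes itself,
# which is fine for a rough threshold).
train_rk = knn.kneighbors(X)[0][:, -1]
threshold = np.quantile(train_rk, 0.99)   # assumed 99th-percentile cutoff

for q in [np.array([[0.2, -0.1]]), np.array([[6.0, 6.0]])]:   # in- vs. out-of-distribution
    pred = knn.predict(q)[0]
    r_k = knn.kneighbors(q)[0][0, -1]
    flag = "possible OOD" if r_k > threshold else "ok"
    print(f"query={q[0]}, prediction={pred}, r_k={r_k:.2f} [{flag}]")
```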
Having understood the geometric implications of K, we now address the practical question: how do we select K for a specific problem?
Rule of Thumb Starting Points:
K = √n: A classic heuristic. For n = 1000, start with K ≈ 31.
K = odd number for binary classification: Avoids ties in voting.
K ∈ [3, 10] for small datasets: Conservative range that avoids extreme underfitting or overfitting.
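The √n and odd-K heuristics above combine into a one-line starting point; the helper below is just an illustration, and its output should be handed to cross-validation rather than used as a final answer.

```python
# Floor of sqrt(n), bumped to the next odd value so binary votes cannot tie.
import math

def heuristic_k(n_samples: int) -> int:
    k = int(math.sqrt(n_samples))
    return k if k % 2 == 1 else k + 1

print(heuristic_k(1000))   # 31, matching the example above
```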
Cross-Validation for K Selection:
The principled approach: evaluate multiple K values using cross-validation and select the one with best validation performance.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.datasets import make_classification


def select_k_with_cross_validation():
    """
    Demonstrate cross-validation based K selection.

    Shows:
    1. CV accuracy curve across K values
    2. Standard error bands for uncertainty
    3. The "elbow" pattern and optimal K identification
    """
    # Generate synthetic dataset
    X, y = make_classification(
        n_samples=500, n_features=10, n_informative=6,
        n_redundant=2, n_clusters_per_class=2,
        flip_y=0.1, random_state=42
    )

    # K values to evaluate
    k_values = list(range(1, 52, 2))  # Odd values only

    # Cross-validation setup
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    # Store results
    mean_scores = []
    std_scores = []

    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X, y, cv=cv, scoring='accuracy')
        mean_scores.append(scores.mean())
        std_scores.append(scores.std())

    mean_scores = np.array(mean_scores)
    std_scores = np.array(std_scores)

    # Find optimal K (within 1 std of best)
    best_idx = np.argmax(mean_scores)
    best_k = k_values[best_idx]
    best_score = mean_scores[best_idx]

    # "1 standard error rule": Simplest model within 1 SE of best
    threshold = best_score - std_scores[best_idx]
    one_se_candidates = [k for k, (m, s) in zip(k_values, zip(mean_scores, std_scores))
                         if m >= threshold]
    one_se_k = max(one_se_candidates)  # Largest K meeting threshold (simplest)

    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Left: CV curve with confidence bands
    ax1 = axes[0]
    ax1.plot(k_values, mean_scores, 'b-', linewidth=2, marker='o',
             markersize=5, label='Mean CV Accuracy')
    ax1.fill_between(k_values, mean_scores - std_scores, mean_scores + std_scores,
                     alpha=0.2, color='blue', label='±1 Std Dev')
    ax1.axvline(x=best_k, color='green', linestyle='--', linewidth=2,
                label=f'Best K = {best_k}')
    ax1.axvline(x=one_se_k, color='orange', linestyle=':', linewidth=2,
                label=f'1-SE K = {one_se_k}')
    ax1.axhline(y=threshold, color='gray', linestyle=':', alpha=0.5)
    ax1.set_xlabel('K (Number of Neighbors)', fontsize=12)
    ax1.set_ylabel('Cross-Validation Accuracy', fontsize=12)
    ax1.set_title('K Selection via Cross-Validation', fontsize=14, fontweight='bold')
    ax1.legend(loc='lower right')
    ax1.grid(True, alpha=0.3)

    # Right: Variance of predictions
    ax2 = axes[1]
    ax2.plot(k_values, std_scores, 'r-', linewidth=2, marker='s', markersize=5)
    ax2.set_xlabel('K (Number of Neighbors)', fontsize=12)
    ax2.set_ylabel('Standard Deviation of CV Scores', fontsize=12)
    ax2.set_title('Prediction Variance vs K', fontsize=14, fontweight='bold')
    ax2.grid(True, alpha=0.3)

    # Annotate
    ax2.annotate('High variance\n(sensitive to folds)',
                 xy=(k_values[0], std_scores[0]),
                 xytext=(10, std_scores[0] + 0.01),
                 fontsize=10, ha='left',
                 arrowprops=dict(arrowstyle='->', color='gray'))

    plt.tight_layout()

    print(f"Best K (highest mean accuracy): {best_k} with accuracy {best_score:.4f}")
    print(f"1-SE K (simplest within 1 SE): {one_se_k}")

    return fig, best_k, one_se_k


if __name__ == "__main__":
    fig, best_k, one_se_k = select_k_with_cross_validation()
    plt.savefig("k_selection_cv.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The neighborhood size K is not just a hyperparameter—it's a geometric transformation knob that reshapes the entire decision surface.
What's Next:
We have seen that K = 1 produces Voronoi cells. The next page explores this connection in depth: how exactly does the Voronoi structure manifest as 1-NN regions, what are the implications for classification error, and how can we leverage this understanding for algorithm design?
You now understand how K transforms KNN decision boundaries from jagged Voronoi tessellations to smooth averaged surfaces. This geometric perspective makes the bias-variance tradeoff tangible and guides practical K selection.