In 1-NN classification, every training point wields absolute authority over its Voronoi cell—a single point determines the fate of all queries within its region. This creates decision boundaries with maximum detail but also maximum sensitivity to noise. A single mislabeled training point carves out an "island" of incorrect predictions.
The parameter K transforms this picture fundamentally. When K = 3, each prediction requires agreement from a local committee. When K = 50, decisions reflect the consensus of a substantial neighborhood. As K increases, the jagged Voronoi-based boundaries smooth out, irregularities average away, and the decision surface approaches a simpler, more stable structure.
This page provides a rigorous geometric analysis of how decision boundaries evolve with K, revealing the deep connection between neighborhood size and the bias-variance tradeoff—not as an abstract statistical concept, but as a visible, tangible transformation of decision geometry.
By the end of this page, you will understand how decision boundaries transition from Voronoi tessellations (K=1) to progressively smoother surfaces as K increases, the geometric interpretation of bias-variance tradeoff in KNN, edge effects and boundary behavior, and practical implications for K selection.
The classical Voronoi diagram captures 1-NN regions. For general K, we need a richer mathematical structure: the order-k Voronoi diagram.
Definition (Order-k Voronoi Diagram): Given a point set P with n sites, the order-k Voronoi region for a subset S ⊆ P with |S| = k is:
$$V_k(S) = \{x \in \mathbb{R}^d : \max_{p \in S} \|x - p\| \leq \min_{q \in P \setminus S} \|x - q\|\}$$
In words, V_k(S) contains all points whose k nearest neighbors are exactly the sites in S. The collection of all such regions for all possible k-subsets forms the order-k Voronoi diagram.
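To ground the set-builder definition, here is a tiny self-contained check (NumPy only; the sites and the query point are invented purely for illustration): for a query x, the subset S of its k nearest sites always satisfies the inequality above, which is exactly the statement that x lies in V_k(S).

```python
# A minimal sketch of the order-k membership condition: the farthest of the
# k nearest sites must be no farther than the nearest excluded site.
import numpy as np

rng = np.random.default_rng(0)
P = rng.uniform(0, 1, size=(20, 2))   # n = 20 sites in the plane (illustrative)
x = np.array([0.4, 0.6])              # query point (illustrative)
k = 3

dists = np.linalg.norm(P - x, axis=1)
order = np.argsort(dists)
S = order[:k]                          # indices of the k nearest sites
rest = order[k:]

# By construction this prints True: x belongs to V_k(S) for S = its k nearest sites.
print(sorted(S.tolist()), dists[S].max() <= dists[rest].min())
```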
Key Properties:
Incremental construction: The order-k diagram can be built from the order-(k-1) diagram by subdividing each order-(k-1) cell according to which remaining site is the k-th nearest. As k increases, regions become progressively more fragmented before eventually simplifying.
Combinatorial explosion: There are C(n, k) possible k-subsets, so potentially C(n, k) regions, though many are empty. In the plane, the actual complexity is O(k(n-k)) faces.
1-NN recovery: The order-1 Voronoi diagram is exactly the standard Voronoi diagram.
Boundary structure: Boundaries now separate regions with different k-nearest neighbor sets, not just different single nearest neighbors.
For K-NN classification, we don't need the full order-k Voronoi structure. We only care about the majority class among k neighbors. This creates a coarser partition: all regions where class A has the majority merge together. The effective decision boundary is much simpler than the full order-k diagram.
The Classification Perspective:
Let's denote the k-NN decision region for class c as:
$$R_k(c) = \{x \in \mathbb{R}^d : \text{majority of the } k \text{ nearest neighbors of } x \text{ have label } c\}$$
Unlike 1-NN where R₁(c) is a union of complete Voronoi cells, R_k(c) is a union of order-k regions where class c wins the vote. The boundaries of R_k(c) are subsets of the order-k Voronoi boundaries—but simplified by ignoring within-class boundaries.
This gives us a critical insight: as K increases, the number of distinct decision regions tends to decrease. Small isolated pockets of one class get "outvoted" by surrounding neighbors of another class, causing those regions to flip.
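A quick way to see this coarsening numerically is sketched below. The dataset (scikit-learn's make_moons) and the use of SciPy's connected-component labelling are illustrative choices, not part of this page's own experiments; the region counts are expected to drop as K grows and isolated pockets get outvoted.

```python
# Count connected prediction regions per class on a grid as K increases.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from scipy.ndimage import label

X, y = make_moons(n_samples=300, noise=0.35, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

for k in [1, 5, 15, 45]:
    Z = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(grid).reshape(xx.shape)
    n0 = label(Z == 0)[1]   # connected components predicted as class 0
    n1 = label(Z == 1)[1]   # connected components predicted as class 1
    print(f"K={k:2d}: class-0 regions={n0}, class-1 regions={n1}")
```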
The most instructive way to understand K's effect is through direct visualization. Let's construct a systematic comparison showing how the same training data produces increasingly smooth decision boundaries as K grows.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap


def visualize_knn_boundary_evolution():
    """
    Visualize how K-NN decision boundaries evolve as K increases.

    Key observations:
    - K=1: Jagged Voronoi boundaries, highly local
    - K=3,5: Moderate smoothing, noise islands start disappearing
    - K=15,25: Significant smoothing, approaching linear-like boundaries
    - K=large: Eventual convergence to majority class everywhere
    """
    np.random.seed(42)

    # Generate two-class spiral-like data with some noise
    n_samples = 200

    # Class 0: Upper-left cluster with some scattered points
    n0 = n_samples // 2
    theta0 = np.random.uniform(0, 2 * np.pi, n0)
    r0 = np.random.uniform(0, 1.5, n0)
    X0 = np.column_stack([
        r0 * np.cos(theta0) + 1,
        r0 * np.sin(theta0) + 2
    ])
    # Add noise points (potential outliers)
    X0_noise = np.array([[3.5, 1.5], [2.8, 0.8], [0.5, 0.5]])
    X0 = np.vstack([X0, X0_noise])
    y0 = np.zeros(len(X0), dtype=int)

    # Class 1: Lower-right cluster
    n1 = n_samples // 2
    theta1 = np.random.uniform(0, 2 * np.pi, n1)
    r1 = np.random.uniform(0, 1.5, n1)
    X1 = np.column_stack([
        r1 * np.cos(theta1) + 3,
        r1 * np.sin(theta1) + 1
    ])
    # Add noise points
    X1_noise = np.array([[0.8, 2.5], [1.2, 0.3]])
    X1 = np.vstack([X1, X1_noise])
    y1 = np.ones(len(X1), dtype=int)

    X = np.vstack([X0, X1])
    y = np.hstack([y0, y1])

    # Create visualization grid
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    # K values to visualize
    k_values = [1, 3, 7, 15, 31, 61]

    # Color schemes
    colors = ['#3498db', '#e74c3c']
    cmap_light = ListedColormap(['#AED6F1', '#F5B7B1'])
    cmap_bold = ListedColormap(colors)

    # Create subplot grid
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()

    for idx, k in enumerate(k_values):
        ax = axes[idx]

        # Train K-NN classifier
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)

        # Predict on grid
        Z = knn.predict(grid_points).reshape(xx.shape)

        # Plot decision regions
        ax.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.8)
        ax.contour(xx, yy, Z, levels=[0.5], colors='black',
                   linewidths=2.5, linestyles='-')

        # Plot training points
        for cls in [0, 1]:
            mask = y == cls
            ax.scatter(X[mask, 0], X[mask, 1], c=colors[cls],
                       edgecolor='white', linewidth=0.8, s=40, zorder=5)

        # Annotate noise points that are "islands"
        if k == 1:
            ax.annotate('Noise\nisland', xy=(3.5, 1.5), xytext=(4.2, 2.3),
                        fontsize=9,
                        arrowprops=dict(arrowstyle='->', color='gray'),
                        color='gray')

        ax.set_title(f'K = {k}', fontsize=14, fontweight='bold')
        ax.set_xlim(x_min, x_max)
        ax.set_ylim(y_min, y_max)
        ax.set_aspect('equal')
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')

    plt.suptitle('K-NN Decision Boundary Evolution', fontsize=16,
                 fontweight='bold', y=1.02)
    plt.tight_layout()

    return fig, X, y


def compute_boundary_complexity(X, y, k_values, grid_resolution=200):
    """
    Quantify decision boundary complexity as a function of K.

    Complexity metric: Total length of decision boundary (contour perimeter).
    As K increases, boundaries smooth and total length decreases.
    """
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, grid_resolution),
        np.linspace(y_min, y_max, grid_resolution)
    )
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    complexities = []
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)
        Z = knn.predict(grid_points).reshape(xx.shape)

        # Estimate boundary length via gradient
        # Points where prediction changes between adjacent grid cells
        boundary_h = np.abs(np.diff(Z, axis=1)).sum()
        boundary_v = np.abs(np.diff(Z, axis=0)).sum()
        total_boundary = boundary_h + boundary_v
        complexities.append(total_boundary)

    return complexities


if __name__ == "__main__":
    fig, X, y = visualize_knn_boundary_evolution()
    plt.savefig("knn_boundary_evolution.png", dpi=150, bbox_inches='tight')
    plt.show()

    # Compute complexity curve
    k_range = list(range(1, 81, 2))
    complexities = compute_boundary_complexity(X, y, k_range)

    plt.figure(figsize=(10, 5))
    plt.plot(k_range, complexities, 'o-', linewidth=2, markersize=6)
    plt.xlabel('K (Number of Neighbors)', fontsize=12)
    plt.ylabel('Boundary Complexity (Perimeter Units)', fontsize=12)
    plt.title('Decision Boundary Complexity vs K', fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.3)
    plt.savefig("boundary_complexity_vs_k.png", dpi=150)
    plt.show()
```

Observations from the Visualization:
K = 1 (Voronoi regime): Every training point controls its region. Noise points create isolated "islands" of incorrect predictions. The boundary is maximally complex—every edge of the class-boundary portion of the Voronoi diagram appears.
K = 3-7 (Local smoothing): Small isolated regions disappear. A noise point from class A surrounded by class B neighbors loses its island. The boundary starts to follow the "main" class structure.
K = 15-31 (Moderate smoothing): The boundary becomes visibly smoother. Small protrusions and indentations are averaged away. The overall shape reflects the class distribution's central tendency.
K → n (Over-smoothing): As K approaches n, larger regions that should belong to one class start flipping. Eventually, when K = n, every point is classified as the majority class—the boundary vanishes entirely.
This progression illustrates the bias-variance tradeoff geometrically: low K means low bias (boundary can capture fine detail) but high variance (sensitive to individual points); high K means high bias (boundary forced toward simplicity) but low variance (stable across different training samples).
Let's formalize the intuition that larger K produces smoother boundaries. We analyze the relationship between K, neighborhood radius, and decision surface properties.
Neighborhood Radius and K:
For a query point q, let r_k(q) denote the distance to the k-th nearest neighbor. This defines the k-neighborhood ball:
$$B_k(q) = \{x \in \mathbb{R}^d : \|x - q\| \leq r_k(q)\}$$
As K increases, r_k(q) increases (we must reach further to gather K neighbors). The prediction at q depends on all training points within B_k(q)—a larger spatial average.
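A small illustration of this (the Gaussian data and the two query points are assumptions for the demo, using scikit-learn's NearestNeighbors): r_k(q) is nondecreasing in k, and it grows faster for queries that sit in sparse parts of the space.

```python
# Sketch: the k-th neighbor distance r_k(q) grows with K, faster where data are sparse.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                  # densest near the origin
nn = NearestNeighbors(n_neighbors=100).fit(X)

for q in [np.array([[0.0, 0.0]]), np.array([[2.5, 2.5]])]:   # dense vs. sparse query
    dists, _ = nn.kneighbors(q)
    r_k = dists[0]                             # r_k(q) for k = 1..100, nondecreasing
    print(q[0], [round(r_k[k - 1], 2) for k in (1, 5, 25, 100)])
```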
Local Averaging Interpretation:
The k-NN prediction can be viewed as local averaging:
$$\hat{y}(q) = \text{mode}\{y_i : x_i \in \text{KNN}(q)\}$$
For regression, this becomes:
$$\hat{f}(q) = \frac{1}{k} \sum_{i \in \text{KNN}(q)} y_i$$
Larger K means averaging over a larger spatial region. This is equivalent to spatial low-pass filtering—high-frequency variations (sharp boundary changes) are attenuated, leaving only low-frequency structure (broad class regions).
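The regression form makes the filtering effect easy to measure. The sketch below (the noisy sine data and the roughness metric are illustrative choices, not part of this page's experiments) fits KNeighborsRegressor for several K and reports a crude roughness score that should shrink as K grows.

```python
# Larger K averages over a wider window, so the fitted curve loses high-frequency wiggles.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 6, 300)).reshape(-1, 1)
y = np.sin(2 * x.ravel()) + rng.normal(scale=0.4, size=300)   # noisy sine

grid = np.linspace(0, 6, 400).reshape(-1, 1)
for k in [1, 5, 25, 75]:
    f = KNeighborsRegressor(n_neighbors=k).fit(x, y).predict(grid)
    roughness = np.abs(np.diff(f, n=2)).mean()   # mean absolute second difference
    print(f"K={k:2d}: roughness={roughness:.4f}")
```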
K-NN is related to kernel density estimation with a "box kernel" that assigns equal weight to all K neighbors and zero weight to all others. Weighted K-NN with distance-based weights creates smoother kernel functions. The effective bandwidth is proportional to r_k(q)—varying spatially unlike fixed-bandwidth kernels.
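For reference, the distance-weighted variant mentioned above is one keyword away in scikit-learn; the toy one-dimensional data below is invented purely to show how the vote fractions shift when nearer neighbors count more.

```python
# Uniform vs. distance-weighted voting on a tiny illustrative dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1, 1])

uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
weighted = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

q = np.array([[1.4]])
print(uniform.predict_proba(q))    # plain vote fractions: [[0.667, 0.333]]
print(weighted.predict_proba(q))   # closer neighbors weigh more heavily
```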
Characterizing Boundary Curvature:
For a binary classification problem, the decision boundary is the set of points where the K-NN vote is tied or nearly tied. Around such a boundary point q:
With K = 1, the boundary is the perpendicular bisector between two training points of opposite classes—a straight line segment.
With larger K, the boundary position at q depends on the distribution of classes in B_k(q). The boundary bends toward regions where the minority class is dense.
Quantifying Smoothness:
We can characterize boundary smoothness through its curvature. For a 2D decision boundary parameterized by arc length s as (x(s), y(s)), curvature is:
$$\kappa(s) = \left|\frac{d\theta}{ds}\right| = \frac{|x'y'' - y'x''|}{(x'^2 + y'^2)^{3/2}}$$
The total curvature ∫|κ|ds measures boundary complexity. Empirically, this decreases monotonically with K (for reasonable K < n/2).
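As a calibration for this measure (a standard differential-geometry fact rather than anything KNN-specific), a circular arc of radius R has constant curvature

$$\kappa = \frac{1}{R}, \qquad \int_0^{L} |\kappa| \, ds = \frac{L}{R},$$

so a full circle contributes total curvature 2π regardless of its size, while gentle large-radius bends contribute little per unit length. Smoothing under larger K amounts to replacing tight, small-radius wiggles in the boundary with broad, large-radius arcs, which is exactly what drives the integral down.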
The Boundary Width Effect:
With probability-based voting (soft K-NN), we can define a "boundary width"—the region where neither class has overwhelming majority. This width scales approximately as:
$$w(K) \propto \sqrt{K} \cdot (\text{local density})^{-1/d}$$
Larger K creates wider, more diffuse boundaries rather than sharp transitions.
| Property | Small K | Large K | Scaling Behavior |
|---|---|---|---|
| Boundary length | Long (follows Voronoi) | Short (smooth curve) | Decreases then stabilizes |
| Total curvature | High (sharp corners) | Low (gentle bends) | Monotonically decreases |
| Boundary width | Sharp (single edge) | Diffuse (uncertainty zone) | Increases with √K |
| Number of regions | Many (isolated islands) | Few (major regions only) | Decreases with K |
| Sensitivity to noise | High (single points matter) | Low (averaged out) | Decreases as 1/K |
The bias-variance tradeoff is often presented abstractly, but K-NN provides a beautifully geometric interpretation. Here we make the tradeoff visible and quantifiable.
Variance: Sensitivity to Training Set
Consider training on different random samples from the same distribution. The variance of a classifier at point q is:
$$\text{Var}[\hat{y}(q)] = \mathbb{E}_{\text{train}}[(\hat{y}(q) - \mathbb{E}[\hat{y}(q)])^2]$$
Geometric interpretation: With K = 1, if q lies near the boundary between classes, which single training point is closest varies substantially across samples. The decision at q flip-flops. With large K, the vote aggregates many neighbors—individual point variation averages out.
Bias: Deviation from True Boundary
$$\text{Bias}[\hat{y}(q)] = \mathbb{E}[\hat{y}(q)] - y^*(q)$$
where y*(q) is the true (Bayes optimal) label.
Geometric interpretation: With K = 1, if training density is sufficient, the boundary closely tracks the true class interface. With large K, the boundary is pulled toward the majority class in the neighborhood—if class A is locally sparse, the boundary invades into A's territory, creating systematic bias.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap


def visualize_bias_variance_geometric(n_experiments=50):
    """
    Visualize the bias-variance tradeoff geometrically by:
    1. Generating multiple training samples from the same distribution
    2. Training K-NN with different K values
    3. Showing individual decision boundaries (variance) and average boundary (bias)

    Key insight:
    - Low K: Boundaries vary wildly across experiments (high variance)
    - High K: Boundaries are stable but may miss true boundary (high bias)
    """
    np.random.seed(0)

    # True boundary: diagonal line y = x + noise
    def true_class(x1, x2):
        return (x2 > x1).astype(int)

    def generate_training_data(n_samples=100):
        X = np.random.uniform(-2, 2, (n_samples, 2))
        y = true_class(X[:, 0], X[:, 1])
        # Add 10% label noise
        flip_mask = np.random.random(n_samples) < 0.1
        y[flip_mask] = 1 - y[flip_mask]
        return X, y

    # Grid for predictions
    xx, yy = np.meshgrid(np.linspace(-2, 2, 150),
                         np.linspace(-2, 2, 150))
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    k_values = [1, 5, 21, 51]

    fig, axes = plt.subplots(2, len(k_values), figsize=(16, 8))

    for col, k in enumerate(k_values):
        # Collect predictions across experiments
        all_predictions = []
        for exp in range(n_experiments):
            X_train, y_train = generate_training_data(100)
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X_train, y_train)
            Z = knn.predict(grid_points).reshape(xx.shape)
            all_predictions.append(Z)

        all_predictions = np.array(all_predictions)

        # Top row: Individual experiment boundaries (variance visualization)
        ax_var = axes[0, col]
        ax_var.set_title(f'K = {k}: Individual Boundaries', fontsize=11, fontweight='bold')

        # Plot a few individual boundaries
        for i in range(min(8, n_experiments)):
            ax_var.contour(xx, yy, all_predictions[i], levels=[0.5],
                           colors='gray', linewidths=0.5, alpha=0.4)

        # True boundary (dashed line y = x)
        ax_var.plot([-2, 2], [-2, 2], 'g--', linewidth=2.5,
                    label='True boundary' if col == 0 else '')

        ax_var.set_xlim(-2, 2)
        ax_var.set_ylim(-2, 2)
        ax_var.set_aspect('equal')
        if col == 0:
            ax_var.set_ylabel('Variance View', fontsize=10)

        # Bottom row: Average prediction (bias visualization)
        ax_bias = axes[1, col]
        ax_bias.set_title(f'K = {k}: Average Prediction', fontsize=11, fontweight='bold')

        # Average prediction probability
        avg_prediction = all_predictions.mean(axis=0)

        # Color by average prediction (continuous)
        contourf = ax_bias.contourf(xx, yy, avg_prediction,
                                    levels=np.linspace(0, 1, 21),
                                    cmap='RdBu_r', alpha=0.8)

        # Average decision boundary (where avg = 0.5)
        ax_bias.contour(xx, yy, avg_prediction, levels=[0.5],
                        colors='black', linewidths=2.5)

        # True boundary
        ax_bias.plot([-2, 2], [-2, 2], 'g--', linewidth=2.5)

        ax_bias.set_xlim(-2, 2)
        ax_bias.set_ylim(-2, 2)
        ax_bias.set_aspect('equal')
        if col == 0:
            ax_bias.set_ylabel('Bias View', fontsize=10)

        # Compute variance (standard deviation of predictions at each point)
        pred_variance = all_predictions.std(axis=0)
        avg_variance = pred_variance.mean()

        # Compute bias (deviation of average boundary from true boundary)
        # At points where true class is 1, bias = avg_prediction - 1
        true_labels = true_class(xx, yy)
        bias_magnitude = np.abs(avg_prediction - true_labels).mean()

        ax_bias.text(0.02, 0.98,
                     f'Avg Var: {avg_variance:.3f}\nAvg |Bias|: {bias_magnitude:.3f}',
                     transform=ax_bias.transAxes, fontsize=9,
                     verticalalignment='top',
                     bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    plt.tight_layout()
    fig.legend(['True boundary'], loc='upper center', ncol=1, fontsize=10,
               bbox_to_anchor=(0.5, 1.02))
    return fig


if __name__ == "__main__":
    fig = visualize_bias_variance_geometric()
    plt.savefig("bias_variance_knn_geometric.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The Optimal K:
The optimal K minimizes the total error: Bias² + Variance + irreducible noise. Where this optimum falls depends on the training set size, the amount of label noise, and the complexity of the true class boundary: more data and more noise push the optimum toward larger K, while intricate boundaries favor smaller K.
Rules of thumb like K = √n are starting points, but cross-validation remains essential for finding the true optimum.
Not all regions of feature space behave equally under K-NN. Boundaries, sparse regions, and edges exhibit distinctive behaviors that affect both accuracy and interpretation.
Near Class Boundaries:
Points near the true class boundary experience the most significant smoothing effect:
If the true boundary is complex (highly curved), larger K "straightens" it, pushing predictions toward whichever class is locally more numerous. This creates boundary bias: systematic misclassification of minority class points in boundary regions.
At Data Edges (Sparse Regions):
Near the edges of the training data distribution, K-NN faces distinctive challenges: neighborhoods become asymmetric, the k-th neighbor can lie far away, and predictions extend beyond the convex hull of the data with no change in apparent confidence. The code below makes these effects visible:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from scipy.spatial import ConvexHull


def visualize_edge_effects():
    """
    Demonstrate edge effects in K-NN classification:
    1. Asymmetric neighborhoods at data boundaries
    2. Extrapolation behavior outside convex hull
    3. How K affects predictions in sparse regions
    """
    np.random.seed(42)

    # Generate data with clear boundary and sparse edges
    n_samples = 100

    # Class 0: Dense cluster
    X0 = np.random.randn(n_samples, 2) * 0.5 + np.array([0, 0])
    y0 = np.zeros(n_samples, dtype=int)

    # Class 1: Scattered points forming an arc
    theta = np.linspace(np.pi / 4, 3 * np.pi / 4, n_samples)
    r = 3 + np.random.randn(n_samples) * 0.3
    X1 = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y1 = np.ones(n_samples, dtype=int)

    X = np.vstack([X0, X1])
    y = np.hstack([y0, y1])

    # Compute convex hull
    hull = ConvexHull(X)

    # Query points: Inside, on edge, outside
    query_inside = np.array([[1.5, 1.5]])
    query_edge = np.array([[-2.5, 2.0]])
    query_outside = np.array([[0, 4.5]])

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    k_values = [3, 11, 31]

    for ax, k in zip(axes, k_values):
        ax.set_title(f'K = {k}: Edge Effects', fontsize=12, fontweight='bold')

        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)

        # Plot decision boundary
        xx, yy = np.meshgrid(np.linspace(-4, 4, 200),
                             np.linspace(-2, 5, 200))
        Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdBu_r')
        ax.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)

        # Plot training points
        ax.scatter(X0[:, 0], X0[:, 1], c='blue', s=30, alpha=0.6, label='Class 0')
        ax.scatter(X1[:, 0], X1[:, 1], c='red', s=30, alpha=0.6, label='Class 1')

        # Plot convex hull
        for simplex in hull.simplices:
            ax.plot(X[simplex, 0], X[simplex, 1], 'k--', linewidth=1, alpha=0.5)

        # Query points with neighborhood visualization
        for q, color, name in [(query_inside, 'green', 'Inside'),
                               (query_edge, 'orange', 'Edge'),
                               (query_outside, 'purple', 'Outside')]:
            # Find k nearest neighbors
            distances, indices = knn.kneighbors(q)
            max_dist = distances[0, -1]

            # Draw neighborhood circle
            circle = plt.Circle(q[0], max_dist, fill=False, color=color,
                                linewidth=2, linestyle='-')
            ax.add_patch(circle)

            # Mark query point
            ax.scatter(q[0, 0], q[0, 1], c=color, s=200, marker='*',
                       edgecolor='black', linewidth=1.5, zorder=10,
                       label=f'{name}: r_k={max_dist:.2f}')

            # Lines to nearest neighbors
            for idx in indices[0]:
                ax.plot([q[0, 0], X[idx, 0]], [q[0, 1], X[idx, 1]],
                        color=color, linewidth=0.5, alpha=0.5)

        ax.set_xlim(-4, 4)
        ax.set_ylim(-2, 5)
        ax.set_aspect('equal')
        ax.legend(loc='lower right', fontsize=8)

    plt.tight_layout()
    return fig


if __name__ == "__main__":
    fig = visualize_edge_effects()
    plt.savefig("edge_effects_knn.png", dpi=150, bbox_inches='tight')
    plt.show()
```

Key Observations:
Neighborhood radius varies spatially: In dense regions, r_k is small; in sparse regions, r_k is large. This creates adaptive bandwidth across the feature space.
Outside the convex hull: K-NN extends its predictions infinitely. There's no "uncertainty" signal—predictions look as confident outside the data as inside, despite being extrapolations.
Asymmetric voting: At edges, neighbors are not symmetrically distributed around the query. The vote is dominated by whichever class occupies the populated direction.
Implications for Practice:
K-NN provides no natural measure of confidence that reflects distance from training data. A query point far outside the training distribution receives a prediction just as confidently as a point at the centroid. Always pair K-NN with density estimation or conformal prediction to detect out-of-distribution queries.
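One lightweight version of that pairing is sketched below, under assumed synthetic data: use the k-th neighbor distance r_k(q) itself as a rough outlier score, calibrated against the r_k values seen on the training set (the 99th-percentile cutoff is an arbitrary illustrative choice, not a recommendation from this page).

```python
# Report r_k alongside the prediction and flag queries whose r_k is far
# beyond anything the training set itself exhibits.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

k = 15
knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)

# Calibrate on the training points (each point's neighbor list includes itself,
# which is fine for a rough threshold).
train_rk = knn.kneighbors(X)[0][:, -1]
threshold = np.quantile(train_rk, 0.99)   # assumed 99th-percentile cutoff

for q in [np.array([[0.2, -0.1]]), np.array([[6.0, 6.0]])]:   # in- vs. out-of-distribution
    pred = knn.predict(q)[0]
    r_k = knn.kneighbors(q)[0][0, -1]
    flag = "possible OOD" if r_k > threshold else "ok"
    print(f"query={q[0]}, prediction={pred}, r_k={r_k:.2f} [{flag}]")
```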
Having understood the geometric implications of K, we now address the practical question: how do we select K for a specific problem?
Rule of Thumb Starting Points:
K = √n: A classic heuristic. For n = 1000, start with K ≈ 31.
K = odd number for binary classification: Avoids ties in voting.
K ∈ [3, 10] for small datasets: Conservative range that avoids extreme underfitting or overfitting.
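The √n and odd-K heuristics above combine into a one-line starting point; the helper below is just an illustration, and its output should be handed to cross-validation rather than used as a final answer.

```python
# Floor of sqrt(n), bumped to the next odd value so binary votes cannot tie.
import math

def heuristic_k(n_samples: int) -> int:
    k = int(math.sqrt(n_samples))
    return k if k % 2 == 1 else k + 1

print(heuristic_k(1000))   # 31, matching the example above
```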
Cross-Validation for K Selection:
The principled approach: evaluate multiple K values using cross-validation and select the one with best validation performance.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.datasets import make_classification


def select_k_with_cross_validation():
    """
    Demonstrate cross-validation based K selection.

    Shows:
    1. CV accuracy curve across K values
    2. Standard error bands for uncertainty
    3. The "elbow" pattern and optimal K identification
    """
    # Generate synthetic dataset
    X, y = make_classification(
        n_samples=500, n_features=10, n_informative=6,
        n_redundant=2, n_clusters_per_class=2,
        flip_y=0.1, random_state=42
    )

    # K values to evaluate
    k_values = list(range(1, 52, 2))  # Odd values only

    # Cross-validation setup
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    # Store results
    mean_scores = []
    std_scores = []

    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X, y, cv=cv, scoring='accuracy')
        mean_scores.append(scores.mean())
        std_scores.append(scores.std())

    mean_scores = np.array(mean_scores)
    std_scores = np.array(std_scores)

    # Find optimal K (within 1 std of best)
    best_idx = np.argmax(mean_scores)
    best_k = k_values[best_idx]
    best_score = mean_scores[best_idx]

    # "1 standard error rule": Simplest model within 1 SE of best
    threshold = best_score - std_scores[best_idx]
    one_se_candidates = [k for k, (m, s) in zip(k_values, zip(mean_scores, std_scores))
                         if m >= threshold]
    one_se_k = max(one_se_candidates)  # Largest K meeting threshold (simplest)

    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Left: CV curve with confidence bands
    ax1 = axes[0]
    ax1.plot(k_values, mean_scores, 'b-', linewidth=2, marker='o',
             markersize=5, label='Mean CV Accuracy')
    ax1.fill_between(k_values, mean_scores - std_scores, mean_scores + std_scores,
                     alpha=0.2, color='blue', label='±1 Std Dev')
    ax1.axvline(x=best_k, color='green', linestyle='--', linewidth=2,
                label=f'Best K = {best_k}')
    ax1.axvline(x=one_se_k, color='orange', linestyle=':', linewidth=2,
                label=f'1-SE K = {one_se_k}')
    ax1.axhline(y=threshold, color='gray', linestyle=':', alpha=0.5)
    ax1.set_xlabel('K (Number of Neighbors)', fontsize=12)
    ax1.set_ylabel('Cross-Validation Accuracy', fontsize=12)
    ax1.set_title('K Selection via Cross-Validation', fontsize=14, fontweight='bold')
    ax1.legend(loc='lower right')
    ax1.grid(True, alpha=0.3)

    # Right: Variance of predictions
    ax2 = axes[1]
    ax2.plot(k_values, std_scores, 'r-', linewidth=2, marker='s', markersize=5)
    ax2.set_xlabel('K (Number of Neighbors)', fontsize=12)
    ax2.set_ylabel('Standard Deviation of CV Scores', fontsize=12)
    ax2.set_title('Prediction Variance vs K', fontsize=14, fontweight='bold')
    ax2.grid(True, alpha=0.3)

    # Annotate
    ax2.annotate('High variance\n(sensitive to folds)',
                 xy=(k_values[0], std_scores[0]),
                 xytext=(10, std_scores[0] + 0.01),
                 fontsize=10, ha='left',
                 arrowprops=dict(arrowstyle='->', color='gray'))

    plt.tight_layout()

    print(f"Best K (highest mean accuracy): {best_k} with accuracy {best_score:.4f}")
    print(f"1-SE K (simplest within 1 SE): {one_se_k}")

    return fig, best_k, one_se_k


if __name__ == "__main__":
    fig, best_k, one_se_k = select_k_with_cross_validation()
    plt.savefig("k_selection_cv.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The neighborhood size K is not just a hyperparameter—it's a geometric transformation knob that reshapes the entire decision surface.
What's Next:
We have seen that K = 1 produces Voronoi cells. The next page explores this connection in depth: how exactly does the Voronoi structure manifest as 1-NN regions, what are the implications for classification error, and how can we leverage this understanding for algorithm design?
You now understand how K transforms KNN decision boundaries from jagged Voronoi tessellations to smooth averaged surfaces. This geometric perspective makes the bias-variance tradeoff tangible and guides practical K selection.