Mathematics describes, but visualization reveals. While we can prove that Voronoi cells partition space and that 1-NN creates convex decision regions, seeing these structures builds intuition that transfers to problems where direct visualization is impossible.
This page covers practical techniques for visualizing Voronoi diagrams and KNN decision surfaces—from straightforward 2D plots to creative approaches for understanding higher-dimensional structure.
By the end of this page, you will master 2D Voronoi visualization, understand decision boundary plotting techniques, learn dimension reduction approaches for high-D visualization, and develop practical skills for interpreting KNN behavior visually.
Two-dimensional visualization is the foundation. In 2D, Voronoi cells are convex polygons, edges are line segments, and the full structure is directly viewable.
Key Visualization Components:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

def visualize_classified_voronoi(X, y, ax=None, bounds=None):
    """
    Create publication-quality Voronoi visualization with class coloring.

    Args:
        X: (n, 2) array of training points
        y: (n,) array of class labels
        ax: matplotlib axes (created if None)
        bounds: (xmin, xmax, ymin, ymax) for clipping
    """
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(10, 8))

    if bounds is None:
        margin = 0.5
        bounds = (X[:, 0].min() - margin, X[:, 0].max() + margin,
                  X[:, 1].min() - margin, X[:, 1].max() + margin)

    # Add bounding box points for finite cells
    bbox_points = np.array([
        [bounds[0] - 10, bounds[2] - 10],
        [bounds[1] + 10, bounds[2] - 10],
        [bounds[0] - 10, bounds[3] + 10],
        [bounds[1] + 10, bounds[3] + 10]
    ])
    augmented_X = np.vstack([X, bbox_points])
    vor = Voronoi(augmented_X)

    # Color palette for classes
    colors = plt.cm.Set2(np.linspace(0, 1, len(np.unique(y))))

    # Draw filled Voronoi cells
    patches = []
    patch_colors = []
    for point_idx in range(len(X)):  # Only original points
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        verts = vor.vertices[region]
        # Clip to bounds
        verts = np.clip(verts, [bounds[0], bounds[2]], [bounds[1], bounds[3]])
        patches.append(Polygon(verts, closed=True))
        patch_colors.append(colors[y[point_idx]])

    pc = PatchCollection(patches, alpha=0.6, edgecolor='gray', linewidth=0.5)
    pc.set_facecolor(patch_colors)
    ax.add_collection(pc)

    # Plot training points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]], s=80,
                   edgecolor='white', linewidth=1.5, zorder=10,
                   label=f'Class {c}')

    ax.set_xlim(bounds[0], bounds[1])
    ax.set_ylim(bounds[2], bounds[3])
    ax.set_aspect('equal')
    ax.legend(loc='best')
    return ax

# Example usage
if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([
        np.random.randn(15, 2) * 0.5 + [1, 1],
        np.random.randn(15, 2) * 0.5 + [3, 3]
    ])
    y = np.array([0]*15 + [1]*15)

    fig, ax = plt.subplots(figsize=(10, 8))
    visualize_classified_voronoi(X, y, ax)
    ax.set_title("Voronoi Tessellation with Class Labels", fontweight='bold')
    plt.savefig("voronoi_classified.png", dpi=150)
    plt.show()
```

For KNN with K > 1, direct Voronoi visualization doesn't apply. Instead, we use grid-based decision boundary plotting: evaluate the classifier on a dense grid and visualize the results.
The Grid Method:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap

def plot_knn_decision_boundary(X, y, k=5, resolution=200, ax=None):
    """
    Plot KNN decision boundaries using grid evaluation.

    Args:
        X: Training features (n, 2)
        y: Training labels (n,)
        k: Number of neighbors
        resolution: Grid resolution for contour plot
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Create mesh grid
    margin = 0.5
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    # Train and predict
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    Z = knn.predict(grid_points).reshape(xx.shape)

    # Plot decision regions
    n_classes = len(np.unique(y))
    colors = plt.cm.Set2(np.linspace(0, 1, n_classes))
    cmap = ListedColormap(colors[:, :3])

    ax.contourf(xx, yy, Z, alpha=0.4, cmap=cmap)
    ax.contour(xx, yy, Z, colors='black', linewidths=2)

    # Plot training points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]], s=80,
                   edgecolor='white', linewidth=1.5, zorder=10,
                   label=f'Class {c}')

    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_aspect('equal')
    ax.set_title(f'{k}-NN Decision Boundary', fontsize=14, fontweight='bold')
    ax.legend(loc='best')
    return ax

# Compare different K values
if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([
        np.random.randn(50, 2) * 0.8 + [0, 0],
        np.random.randn(50, 2) * 0.8 + [2, 2]
    ])
    y = np.array([0]*50 + [1]*50)

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    for ax, k in zip(axes, [1, 5, 25]):
        plot_knn_decision_boundary(X, y, k=k, ax=ax)
    plt.tight_layout()
    plt.savefig("knn_boundaries_comparison.png", dpi=150)
```

Higher grid resolution reveals finer boundary details but increases computation. For publication, use a resolution of 200-500; for quick exploration, 100 suffices. For large datasets, consider plotting only a subset of training points while computing boundaries on all data.
Beyond hard decision boundaries, visualizing prediction probabilities reveals classifier confidence across the feature space.
Probability Estimation in KNN: $$P(Y = c | q) = \frac{1}{K} \sum_{i \in \text{KNN}(q)} \mathbb{1}[y_i = c]$$
This vote fraction provides a natural confidence measure: 5/5 votes is high confidence; 3/5 is borderline.
Visualization Approaches:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

def plot_knn_probability_map(X, y, k=5, resolution=150):
    """
    Visualize KNN prediction probabilities as a heatmap.
    Shows confidence variation across feature space.
    """
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Create mesh
    margin = 0.5
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )
    grid = np.c_[xx.ravel(), yy.ravel()]

    # Fit and predict probabilities
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    proba = knn.predict_proba(grid)[:, 1].reshape(xx.shape)

    # Panel 1: Probability for class 1
    ax1 = axes[0]
    im1 = ax1.contourf(xx, yy, proba, levels=20, cmap='RdBu_r')
    ax1.contour(xx, yy, proba, levels=[0.5], colors='black', linewidths=2)
    ax1.scatter(X[:, 0], X[:, 1], c=y, cmap='RdBu_r',
                edgecolor='white', s=50, zorder=10)
    ax1.set_title('P(Class=1)', fontweight='bold')
    plt.colorbar(im1, ax=ax1)

    # Panel 2: Maximum probability (confidence)
    ax2 = axes[1]
    max_proba = np.maximum(proba, 1 - proba)
    im2 = ax2.contourf(xx, yy, max_proba, levels=20, cmap='Greens')
    ax2.scatter(X[:, 0], X[:, 1], c='black', s=30, zorder=10)
    ax2.set_title('Confidence (max P)', fontweight='bold')
    plt.colorbar(im2, ax=ax2)

    # Panel 3: Entropy (uncertainty)
    ax3 = axes[2]
    eps = 1e-10
    entropy = -(proba * np.log2(proba + eps) +
                (1 - proba) * np.log2(1 - proba + eps))
    im3 = ax3.contourf(xx, yy, entropy, levels=20, cmap='hot_r')
    ax3.scatter(X[:, 0], X[:, 1], c='cyan', s=30, zorder=10)
    ax3.set_title('Entropy (uncertainty)', fontweight='bold')
    plt.colorbar(im3, ax=ax3)

    for ax in axes:
        ax.set_xlim(x_min, x_max)
        ax.set_ylim(y_min, y_max)
        ax.set_aspect('equal')

    plt.suptitle(f'{k}-NN Probability Visualization', fontsize=14, y=1.02)
    plt.tight_layout()
    return fig

if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([np.random.randn(40, 2) + [0, 0],
                   np.random.randn(40, 2) + [2.5, 2.5]])
    y = np.array([0]*40 + [1]*40)
    fig = plot_knn_probability_map(X, y, k=7)
    plt.savefig("knn_probability_maps.png", dpi=150)
```

While direct visualization fails beyond 3D, several strategies provide insight into high-dimensional KNN behavior.
Strategy 1: Dimension Reduction
Project to 2D using PCA, t-SNE, or UMAP, then visualize decision boundaries in the reduced space. Caveat: boundaries in projected space may not match true high-D boundaries.
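As a minimal sketch of this strategy, the snippet below projects the 4-dimensional iris dataset (used here purely as a stand-in for your own data) to 2D with PCA and reuses the `plot_knn_decision_boundary` helper defined earlier. Because the classifier inside the helper is fit on the projected coordinates, the plotted boundary describes KNN in the reduced space, not the true high-dimensional boundary.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumes plot_knn_decision_boundary from the grid-method example is in scope.
X_highd, y_highd = load_iris(return_X_y=True)      # 4 features per sample
X_2d = PCA(n_components=2).fit_transform(X_highd)  # project to 2D

# KNN is fit on the projected points, so this is an approximation of the
# classifier's behavior in the original 4-D space.
fig, ax = plt.subplots(figsize=(8, 6))
plot_knn_decision_boundary(X_2d, y_highd, k=5, ax=ax)
ax.set_title('5-NN boundary in PCA-projected space', fontweight='bold')
plt.show()
```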
Strategy 2: Feature Pair Plots
For d features, create d(d-1)/2 pairwise scatter plots, each showing decision boundaries in that 2D slice. Useful for identifying which feature pairs drive classification.
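A sketch of the pairwise approach, again using iris as a placeholder dataset: note that each panel fits a separate KNN on just two features, so it shows how separable each 2D slice is rather than slicing a single high-dimensional classifier.

```python
from itertools import combinations
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
X, y, names = data.data, data.target, data.feature_names
pairs = list(combinations(range(X.shape[1]), 2))  # d(d-1)/2 = 6 pairs for d = 4

fig, axes = plt.subplots(2, 3, figsize=(15, 9))
for ax, (i, j) in zip(axes.ravel(), pairs):
    Xp = X[:, [i, j]]
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xp, y)  # per-pair classifier
    # Evaluate on a coarse grid over this 2D slice
    xx, yy = np.meshgrid(
        np.linspace(Xp[:, 0].min() - 0.5, Xp[:, 0].max() + 0.5, 150),
        np.linspace(Xp[:, 1].min() - 0.5, Xp[:, 1].max() + 0.5, 150))
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='Set2')
    ax.scatter(Xp[:, 0], Xp[:, 1], c=y, cmap='Set2', s=20, edgecolor='k')
    ax.set_xlabel(names[i])
    ax.set_ylabel(names[j])
plt.tight_layout()
plt.show()
```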
Strategy 3: Parallel Coordinates
Plot each sample as a polyline across feature axes, colored by class. Reveals separability patterns across features.
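pandas ships a basic parallel-coordinates plot; the sketch below applies it to the iris features. Standardizing features first is usually worthwhile when their scales differ, but is skipped here for brevity.

```python
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns='target')
df['class'] = iris.target_names[iris.target]  # readable class labels

# One polyline per sample, one vertical axis per feature, colored by class
plt.figure(figsize=(10, 5))
parallel_coordinates(df, class_column='class', colormap='Set2', alpha=0.5)
plt.title('Parallel coordinates, colored by class', fontweight='bold')
plt.tight_layout()
plt.show()
```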
Strategy 4: Distance Distribution Analysis
Plot distributions of distances to K nearest neighbors, separated by correct/incorrect predictions. Reveals when KNN fails (distances too large, neighbors too diverse).
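One simple way to run this check, sketched below on a held-out split: collect each test point's distances to its K nearest training neighbors and compare the distributions for correct versus incorrect predictions. Neighbor diversity (label disagreement among the K neighbors) can be examined the same way.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
dists, _ = knn.kneighbors(X_te)   # distances to the 5 nearest training points
mean_dist = dists.mean(axis=1)    # average neighbor distance per query
correct = knn.predict(X_te) == y_te

# Errors concentrated at large neighbor distances point to sparse regions
plt.hist(mean_dist[correct], bins=20, alpha=0.6, label='correct')
plt.hist(mean_dist[~correct], bins=20, alpha=0.6, label='incorrect')
plt.xlabel('mean distance to 5 nearest neighbors')
plt.ylabel('count')
plt.legend()
plt.show()
```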
Static visualizations capture snapshots; animations reveal dynamics.
Animating K:
Create frame-by-frame animations showing decision boundaries as K increases from 1 to n/2. This dramatically illustrates the smoothing effect and helps identify the "sweet spot" K value.
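A sketch of such an animation with matplotlib's FuncAnimation, on a synthetic two-class dataset; saving as a GIF assumes the pillow writer is available.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.8, size=(50, 2)),
               rng.normal([2, 2], 0.8, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

xx, yy = np.meshgrid(np.linspace(-2.5, 4.5, 150), np.linspace(-2.5, 4.5, 150))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, ax = plt.subplots(figsize=(6, 6))

def draw_frame(k):
    # Refit and redraw the decision regions for the current K
    ax.clear()
    Z = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(grid)
    ax.contourf(xx, yy, Z.reshape(xx.shape), alpha=0.4, cmap='coolwarm')
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolor='k', s=30)
    ax.set_title(f'K = {k}')

anim = FuncAnimation(fig, draw_frame, frames=range(1, len(X) // 2, 2), interval=300)
anim.save('knn_k_sweep.gif', writer='pillow')  # requires pillow
```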
Animating Training Set Growth:
Show how Voronoi cells subdivide as training points are added. Cells shrink, boundaries refine, and the approximation to the true boundary improves.
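A simple way to sketch this, assuming the `visualize_classified_voronoi` helper from the first example is in scope, is to render growing prefixes of a shuffled training set and stitch the saved frames into an animation afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes visualize_classified_voronoi from the Voronoi example above is in scope.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 1], 0.5, size=(60, 2)),
               rng.normal([3, 3], 0.5, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)
order = rng.permutation(len(X))
bounds = (-0.5, 4.5, -0.5, 4.5)  # fixed axes so frames are comparable

for n in (5, 10, 20, 40, 80, 120):
    idx = order[:n]  # assumes both classes appear in every prefix
    fig, ax = plt.subplots(figsize=(6, 6))
    visualize_classified_voronoi(X[idx], y[idx], ax=ax, bounds=bounds)
    ax.set_title(f'n = {n} training points', fontweight='bold')
    fig.savefig(f'voronoi_growth_{n:03d}.png', dpi=120)
    plt.close(fig)
```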
Interactive Tools:
Tools that let the audience adjust K with a slider or drag training points, with the boundary re-rendered on each change, are invaluable for building intuition and for presentations where exploration is desired.
For teaching: animate K from 1 upward to show smoothing. For debugging: compare true boundary (if known) against learned boundary. For stakeholders: show probability maps rather than hard boundaries to convey uncertainty.
What's Next:
The final page explores decision surface smoothing with K in detail—how the neighborhood size controls the effective bandwidth of the classifier and connects KNN to kernel methods and local regression.
You now have practical tools for visualizing KNN decision surfaces, from Voronoi diagrams to probability heatmaps to high-dimensional strategies. These visualization skills are essential for model understanding and debugging.