Mathematics describes, but visualization reveals. While we can prove that Voronoi cells partition space and that 1-NN creates convex decision regions, seeing these structures builds intuition that transfers to problems where direct visualization is impossible.
This page covers practical techniques for visualizing Voronoi diagrams and KNN decision surfaces—from straightforward 2D plots to creative approaches for understanding higher-dimensional structure.
By the end of this page, you will master 2D Voronoi visualization, understand decision boundary plotting techniques, learn dimension reduction approaches for high-D visualization, and develop practical skills for interpreting KNN behavior visually.
Two-dimensional visualization is the foundation. In 2D, Voronoi cells are convex polygons, edges are line segments, and the full structure is directly viewable.
Key Visualization Components:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

def visualize_classified_voronoi(X, y, ax=None, bounds=None):
    """
    Create publication-quality Voronoi visualization with class coloring.

    Args:
        X: (n, 2) array of training points
        y: (n,) array of class labels
        ax: matplotlib axes (created if None)
        bounds: (xmin, xmax, ymin, ymax) for clipping
    """
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(10, 8))

    if bounds is None:
        margin = 0.5
        bounds = (X[:, 0].min() - margin, X[:, 0].max() + margin,
                  X[:, 1].min() - margin, X[:, 1].max() + margin)

    # Add bounding box points for finite cells
    bbox_points = np.array([
        [bounds[0] - 10, bounds[2] - 10],
        [bounds[1] + 10, bounds[2] - 10],
        [bounds[0] - 10, bounds[3] + 10],
        [bounds[1] + 10, bounds[3] + 10]
    ])
    augmented_X = np.vstack([X, bbox_points])
    vor = Voronoi(augmented_X)

    # Color palette for classes
    colors = plt.cm.Set2(np.linspace(0, 1, len(np.unique(y))))

    # Draw filled Voronoi cells
    patches = []
    patch_colors = []
    for point_idx in range(len(X)):  # Only original points
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        verts = vor.vertices[region]
        # Clip to bounds
        verts = np.clip(verts, [bounds[0], bounds[2]], [bounds[1], bounds[3]])
        patches.append(Polygon(verts, closed=True))
        patch_colors.append(colors[y[point_idx]])

    pc = PatchCollection(patches, alpha=0.6, edgecolor='gray', linewidth=0.5)
    pc.set_facecolor(patch_colors)
    ax.add_collection(pc)

    # Plot training points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]], s=80,
                   edgecolor='white', linewidth=1.5, zorder=10,
                   label=f'Class {c}')

    ax.set_xlim(bounds[0], bounds[1])
    ax.set_ylim(bounds[2], bounds[3])
    ax.set_aspect('equal')
    ax.legend(loc='best')
    return ax

# Example usage
if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([
        np.random.randn(15, 2) * 0.5 + [1, 1],
        np.random.randn(15, 2) * 0.5 + [3, 3]
    ])
    y = np.array([0]*15 + [1]*15)

    fig, ax = plt.subplots(figsize=(10, 8))
    visualize_classified_voronoi(X, y, ax)
    ax.set_title("Voronoi Tessellation with Class Labels", fontweight='bold')
    plt.savefig("voronoi_classified.png", dpi=150)
    plt.show()
```

For KNN with K > 1, direct Voronoi visualization doesn't apply. Instead, we use grid-based decision boundary plotting: evaluate the classifier on a dense grid and visualize the results.
The Grid Method:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap

def plot_knn_decision_boundary(X, y, k=5, resolution=200, ax=None):
    """
    Plot KNN decision boundaries using grid evaluation.

    Args:
        X: Training features (n, 2)
        y: Training labels (n,)
        k: Number of neighbors
        resolution: Grid resolution for contour plot
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))

    # Create mesh grid
    margin = 0.5
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )
    grid_points = np.c_[xx.ravel(), yy.ravel()]

    # Train and predict
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    Z = knn.predict(grid_points).reshape(xx.shape)

    # Plot decision regions
    n_classes = len(np.unique(y))
    colors = plt.cm.Set2(np.linspace(0, 1, n_classes))
    cmap = ListedColormap(colors[:, :3])

    ax.contourf(xx, yy, Z, alpha=0.4, cmap=cmap)
    ax.contour(xx, yy, Z, colors='black', linewidths=2)

    # Plot training points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]], s=80,
                   edgecolor='white', linewidth=1.5, zorder=10,
                   label=f'Class {c}')

    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_aspect('equal')
    ax.set_title(f'{k}-NN Decision Boundary', fontsize=14, fontweight='bold')
    ax.legend(loc='best')
    return ax

# Compare different K values
if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([
        np.random.randn(50, 2) * 0.8 + [0, 0],
        np.random.randn(50, 2) * 0.8 + [2, 2]
    ])
    y = np.array([0]*50 + [1]*50)

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    for ax, k in zip(axes, [1, 5, 25]):
        plot_knn_decision_boundary(X, y, k=k, ax=ax)
    plt.tight_layout()
    plt.savefig("knn_boundaries_comparison.png", dpi=150)
```

Higher grid resolution reveals finer boundary details but increases computation. For publication, use a resolution of 200-500; for quick exploration, 100 suffices. For large datasets, consider plotting only a subset of training points while computing boundaries on all data.
Beyond hard decision boundaries, visualizing prediction probabilities reveals classifier confidence across the feature space.
Probability Estimation in KNN: $$P(Y = c | q) = \frac{1}{K} \sum_{i \in \text{KNN}(q)} \mathbb{1}[y_i = c]$$
This vote fraction provides a natural confidence measure: 5/5 votes is high confidence; 3/5 is borderline.
Visualization Approaches:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

def plot_knn_probability_map(X, y, k=5, resolution=150):
    """
    Visualize KNN prediction probabilities as a heatmap.
    Shows confidence variation across feature space.
    """
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Create mesh
    margin = 0.5
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, resolution),
        np.linspace(y_min, y_max, resolution)
    )
    grid = np.c_[xx.ravel(), yy.ravel()]

    # Fit and predict probabilities
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    proba = knn.predict_proba(grid)[:, 1].reshape(xx.shape)

    # Panel 1: Probability for class 1
    ax1 = axes[0]
    im1 = ax1.contourf(xx, yy, proba, levels=20, cmap='RdBu_r')
    ax1.contour(xx, yy, proba, levels=[0.5], colors='black', linewidths=2)
    ax1.scatter(X[:, 0], X[:, 1], c=y, cmap='RdBu_r',
                edgecolor='white', s=50, zorder=10)
    ax1.set_title('P(Class=1)', fontweight='bold')
    plt.colorbar(im1, ax=ax1)

    # Panel 2: Maximum probability (confidence)
    ax2 = axes[1]
    max_proba = np.maximum(proba, 1 - proba)
    im2 = ax2.contourf(xx, yy, max_proba, levels=20, cmap='Greens')
    ax2.scatter(X[:, 0], X[:, 1], c='black', s=30, zorder=10)
    ax2.set_title('Confidence (max P)', fontweight='bold')
    plt.colorbar(im2, ax=ax2)

    # Panel 3: Entropy (uncertainty)
    ax3 = axes[2]
    eps = 1e-10
    entropy = -(proba * np.log2(proba + eps) +
                (1 - proba) * np.log2(1 - proba + eps))
    im3 = ax3.contourf(xx, yy, entropy, levels=20, cmap='hot_r')
    ax3.scatter(X[:, 0], X[:, 1], c='cyan', s=30, zorder=10)
    ax3.set_title('Entropy (uncertainty)', fontweight='bold')
    plt.colorbar(im3, ax=ax3)

    for ax in axes:
        ax.set_xlim(x_min, x_max)
        ax.set_ylim(y_min, y_max)
        ax.set_aspect('equal')

    plt.suptitle(f'{k}-NN Probability Visualization', fontsize=14, y=1.02)
    plt.tight_layout()
    return fig

if __name__ == "__main__":
    np.random.seed(42)
    X = np.vstack([np.random.randn(40, 2) + [0, 0],
                   np.random.randn(40, 2) + [2.5, 2.5]])
    y = np.array([0]*40 + [1]*40)
    fig = plot_knn_probability_map(X, y, k=7)
    plt.savefig("knn_probability_maps.png", dpi=150)
```

While direct visualization fails beyond 3D, several strategies provide insight into high-dimensional KNN behavior.
Strategy 1: Dimension Reduction
Project to 2D using PCA, t-SNE, or UMAP, then visualize decision boundaries in the reduced space. Caveat: boundaries in projected space may not match true high-D boundaries.
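As a minimal sketch of this strategy, the snippet below projects the 4-dimensional iris dataset (used here purely as a stand-in for your own data) to 2D with PCA and reuses the `plot_knn_decision_boundary` helper defined earlier. Because the classifier inside the helper is fit on the projected coordinates, the plotted boundary describes KNN in the reduced space, not the true high-dimensional boundary.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumes plot_knn_decision_boundary from the grid-method example is in scope.
X_highd, y_highd = load_iris(return_X_y=True)      # 4 features per sample
X_2d = PCA(n_components=2).fit_transform(X_highd)  # project to 2D

# KNN is fit on the projected points, so this is an approximation of the
# classifier's behavior in the original 4-D space.
fig, ax = plt.subplots(figsize=(8, 6))
plot_knn_decision_boundary(X_2d, y_highd, k=5, ax=ax)
ax.set_title('5-NN boundary in PCA-projected space', fontweight='bold')
plt.show()
```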
Strategy 2: Feature Pair Plots
For d features, create d(d-1)/2 pairwise scatter plots, each showing decision boundaries in that 2D slice. Useful for identifying which feature pairs drive classification.
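A sketch of the pairwise approach, again using iris as a placeholder dataset: note that each panel fits a separate KNN on just two features, so it shows how separable each 2D slice is rather than slicing a single high-dimensional classifier.

```python
from itertools import combinations
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
X, y, names = data.data, data.target, data.feature_names
pairs = list(combinations(range(X.shape[1]), 2))  # d(d-1)/2 = 6 pairs for d = 4

fig, axes = plt.subplots(2, 3, figsize=(15, 9))
for ax, (i, j) in zip(axes.ravel(), pairs):
    Xp = X[:, [i, j]]
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xp, y)  # per-pair classifier
    # Evaluate on a coarse grid over this 2D slice
    xx, yy = np.meshgrid(
        np.linspace(Xp[:, 0].min() - 0.5, Xp[:, 0].max() + 0.5, 150),
        np.linspace(Xp[:, 1].min() - 0.5, Xp[:, 1].max() + 0.5, 150))
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='Set2')
    ax.scatter(Xp[:, 0], Xp[:, 1], c=y, cmap='Set2', s=20, edgecolor='k')
    ax.set_xlabel(names[i])
    ax.set_ylabel(names[j])
plt.tight_layout()
plt.show()
```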
Strategy 3: Parallel Coordinates
Plot each sample as a polyline across feature axes, colored by class. Reveals separability patterns across features.
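pandas ships a basic parallel-coordinates plot; the sketch below applies it to the iris features. Standardizing features first is usually worthwhile when their scales differ, but is skipped here for brevity.

```python
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns='target')
df['class'] = iris.target_names[iris.target]  # readable class labels

# One polyline per sample, one vertical axis per feature, colored by class
plt.figure(figsize=(10, 5))
parallel_coordinates(df, class_column='class', colormap='Set2', alpha=0.5)
plt.title('Parallel coordinates, colored by class', fontweight='bold')
plt.tight_layout()
plt.show()
```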
Strategy 4: Distance Distribution Analysis
Plot distributions of distances to K nearest neighbors, separated by correct/incorrect predictions. Reveals when KNN fails (distances too large, neighbors too diverse).
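One simple way to run this check, sketched below on a held-out split: collect each test point's distances to its K nearest training neighbors and compare the distributions for correct versus incorrect predictions. Neighbor diversity (label disagreement among the K neighbors) can be examined the same way.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
dists, _ = knn.kneighbors(X_te)   # distances to the 5 nearest training points
mean_dist = dists.mean(axis=1)    # average neighbor distance per query
correct = knn.predict(X_te) == y_te

# Errors concentrated at large neighbor distances point to sparse regions
plt.hist(mean_dist[correct], bins=20, alpha=0.6, label='correct')
plt.hist(mean_dist[~correct], bins=20, alpha=0.6, label='incorrect')
plt.xlabel('mean distance to 5 nearest neighbors')
plt.ylabel('count')
plt.legend()
plt.show()
```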
Static visualizations capture snapshots; animations reveal dynamics.
Animating K:
Create frame-by-frame animations showing decision boundaries as K increases from 1 to n/2. This dramatically illustrates the smoothing effect and helps identify the "sweet spot" K value.
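A sketch of such an animation with matplotlib's FuncAnimation, on a synthetic two-class dataset; saving as a GIF assumes the pillow writer is available.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.8, size=(50, 2)),
               rng.normal([2, 2], 0.8, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

xx, yy = np.meshgrid(np.linspace(-2.5, 4.5, 150), np.linspace(-2.5, 4.5, 150))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, ax = plt.subplots(figsize=(6, 6))

def draw_frame(k):
    # Refit and redraw the decision regions for the current K
    ax.clear()
    Z = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(grid)
    ax.contourf(xx, yy, Z.reshape(xx.shape), alpha=0.4, cmap='coolwarm')
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolor='k', s=30)
    ax.set_title(f'K = {k}')

anim = FuncAnimation(fig, draw_frame, frames=range(1, len(X) // 2, 2), interval=300)
anim.save('knn_k_sweep.gif', writer='pillow')  # requires pillow
```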
Animating Training Set Growth:
Show how Voronoi cells subdivide as training points are added. Cells shrink, boundaries refine, and the approximation to the true boundary improves.
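A simple way to sketch this, assuming the `visualize_classified_voronoi` helper from the first example is in scope, is to render growing prefixes of a shuffled training set and stitch the saved frames into an animation afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes visualize_classified_voronoi from the Voronoi example above is in scope.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 1], 0.5, size=(60, 2)),
               rng.normal([3, 3], 0.5, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)
order = rng.permutation(len(X))
bounds = (-0.5, 4.5, -0.5, 4.5)  # fixed axes so frames are comparable

for n in (5, 10, 20, 40, 80, 120):
    idx = order[:n]  # assumes both classes appear in every prefix
    fig, ax = plt.subplots(figsize=(6, 6))
    visualize_classified_voronoi(X[idx], y[idx], ax=ax, bounds=bounds)
    ax.set_title(f'n = {n} training points', fontweight='bold')
    fig.savefig(f'voronoi_growth_{n:03d}.png', dpi=120)
    plt.close(fig)
```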
Interactive Tools:
Tools that let the audience adjust K with a slider or drag training points, with the boundary re-rendered on each change, are invaluable for building intuition and for presentations where exploration is desired.
For teaching: animate K from 1 upward to show smoothing. For debugging: compare true boundary (if known) against learned boundary. For stakeholders: show probability maps rather than hard boundaries to convey uncertainty.
What's Next:
The final page explores decision surface smoothing with K in detail—how the neighborhood size controls the effective bandwidth of the classifier and connects KNN to kernel methods and local regression.
You now have practical tools for visualizing KNN decision surfaces, from Voronoi diagrams to probability heatmaps to high-dimensional strategies. These visualization skills are essential for model understanding and debugging.