In previous pages, we established that Voronoi diagrams partition space by proximity and that K-NN decision boundaries transform with K. Now we focus on the purest case: K = 1, where the correspondence between geometry and classification becomes exact.
The 1-NN classifier is not merely related to Voronoi tessellations—it is the Voronoi tessellation of the training set. Each Voronoi cell corresponds to exactly one training point, and the cell's label is that point's label. The boundaries between classes are precisely the Voronoi edges separating differently-labeled points.
This exact correspondence isn't just theoretical elegance. It enables rigorous analysis of 1-NN behavior, provides geometric insight into classification errors, and reveals why 1-NN has the unique properties it does—including its famous asymptotic error bound.
By the end of this page, you will understand the formal proof of 1-NN/Voronoi equivalence, how this structure determines classification error, the famous Cover-Hart theorem on 1-NN asymptotic error, and practical implications of the Voronoi structure for understanding 1-NN behavior.
Let us establish the equivalence with mathematical rigor.
Setup: Let D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)} be a training dataset where xᵢ ∈ ℝᵈ are feature vectors and yᵢ ∈ {1, 2, ..., C} are class labels. Let d(·,·) denote the distance metric (typically Euclidean).
Definition (1-NN Classifier): $$\text{NN}_1(q) = y_{j^*}, \quad \text{where } j^* = \arg\min_{j \in \{1,\dots,n\}} d(q, x_j)$$
Definition (Voronoi Cell): $$V(x_i) = \{q \in \mathbb{R}^d : d(q, x_i) \leq d(q, x_j) \text{ for all } j \neq i\}$$
Theorem (1-NN/Voronoi Equivalence): For any query point q, the 1-NN prediction equals the label of the training point whose Voronoi cell contains q: $$\text{NN}_1(q) = y_i \iff q \in V(x_i)$$
Proof:
(⟹) Suppose NN₁(q) = yᵢ. Then xᵢ is the nearest neighbor of q, meaning: $$d(q, x_i) \leq d(q, x_j) \text{ for all } j$$ By definition of the Voronoi cell, this means q ∈ V(xᵢ).
(⟸) Suppose q ∈ V(xᵢ). By definition of V(xᵢ): $$d(q, x_i) \leq d(q, x_j) \text{ for all } j$$ Thus xᵢ is the nearest neighbor of q (or tied for nearest), and NN₁(q) = yᵢ. ∎
When q lies exactly on a Voronoi edge (equidistant from multiple sites), the tie-breaking rule determines the prediction. Common strategies: (1) choose arbitrarily among tied neighbors, (2) choose the one with smallest index, (3) return "undefined" or multiple labels. In practice, exact Voronoi edges have measure zero—ties occur with probability zero for continuous feature distributions.
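The identity is easy to check numerically. Below is a minimal sketch (assuming NumPy and scikit-learn are available; the data sizes are arbitrary) that compares scikit-learn's 1-NN predictions against a brute-force "Voronoi lookup" that labels each query by its nearest generator; the two agree on every query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # training sites (Voronoi generators)
y = rng.integers(0, 3, size=50)       # labels in {0, 1, 2}
Q = rng.normal(size=(1000, 2))        # query points

# 1-NN predictions via scikit-learn
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
pred_knn = knn.predict(Q)

# "Voronoi lookup": the cell containing q is generated by the site minimizing
# d(q, x_i), so the cell's label is y[argmin_i d(q, x_i)]
dists = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)  # shape (1000, 50)
pred_voronoi = y[np.argmin(dists, axis=1)]

assert np.array_equal(pred_knn, pred_voronoi)
print("1-NN predictions match Voronoi-cell labels for all queries.")
```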
Corollary (Classification Regions):
The decision region for class c under 1-NN is the union of Voronoi cells belonging to training points of class c: $$R_1(c) = \bigcup_{i: y_i = c} V(x_i)$$
Corollary (Decision Boundaries):
The 1-NN decision boundary between classes c and c' consists of Voronoi edges separating cells V(xᵢ) and V(xⱼ) where yᵢ = c and yⱼ = c'.
Since Voronoi edges are perpendicular bisectors of segments connecting their generating points, 1-NN decision boundaries are piecewise linear (in Euclidean space)—composed of line segments (2D), planar patches (3D), or (d-1)-dimensional hyperplanar facets (general d).
Implication: While 1-NN can model arbitrarily complex class shapes (non-convex, multimodal, etc.), the decision surface is locally linear everywhere—a patchwork of flat pieces at the microscopic level.
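To make the piecewise-linear claim concrete, here is a small sketch (the two sites and the 0.01 step size are arbitrary illustration choices) verifying that points on the perpendicular bisector of two sites are equidistant from both, and that stepping off the bisector flips the nearest neighbor and hence the 1-NN label.

```python
import numpy as np

# Two differently labeled training points in 2D
x1, x2 = np.array([1.0, 1.0]), np.array([3.0, 2.0])

# The Voronoi edge between them lies on the perpendicular bisector:
# all q with (q - m) . (x2 - x1) = 0, where m is the midpoint
m = (x1 + x2) / 2
w = x2 - x1                       # normal vector of the bisector hyperplane
w_perp = np.array([-w[1], w[0]])  # direction along the bisector

for t in np.linspace(-2, 2, 5):
    q = m + t * w_perp
    d1, d2 = np.linalg.norm(q - x1), np.linalg.norm(q - x2)
    assert np.isclose(d1, d2)     # equidistant, so q lies on the Voronoi edge

# Stepping off the bisector flips the nearest neighbor (and hence the label)
q_plus, q_minus = m + 0.01 * w, m - 0.01 * w
print(np.linalg.norm(q_plus - x2) < np.linalg.norm(q_plus - x1))    # True: nearer x2
print(np.linalg.norm(q_minus - x1) < np.linalg.norm(q_minus - x2))  # True: nearer x1
```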
The Voronoi structure imposes specific geometric constraints on 1-NN classification. Understanding these properties reveals both the power and limitations of 1-NN.
Property 1: Each Training Point Owns a Convex Region
Every Voronoi cell is convex, being an intersection of half-spaces. This means each training point governs a single connected, convex region of feature space: any point on the segment between two queries classified by xᵢ is itself classified by xᵢ. Class regions, being unions of cells, need not be convex.
Property 2: Interpolation at Training Points
Unlike parametric models, 1-NN perfectly interpolates the training data: $$\text{NN}_1(x_i) = y_i \text{ for all training points } x_i$$
This is trivially true since xᵢ is its own nearest neighbor: xᵢ ∈ V(xᵢ). This zero training error property makes 1-NN prone to overfitting—noise in the training labels is perfectly captured.
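A quick check of the interpolation property, assuming scikit-learn is available: even with purely random labels, 1-NN reproduces every training label (barring duplicate feature vectors with conflicting labels, which occur with probability zero for continuous features).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)   # random labels: pure "noise" to memorize

# Each x_i lies in its own Voronoi cell, so it is its own nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print("1-NN training accuracy:", (knn.predict(X) == y).mean())   # 1.0
```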
Property 3: Boundary Sharpness
Voronoi edges form sharp decision boundaries with no transition zone. A query infinitesimally close to the boundary on one side receives a completely different prediction than one infinitesimally on the other side. There's no notion of "confidence" decreasing near boundaries in the basic 1-NN framework.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def demonstrate_1nn_voronoi_properties():
    """
    Demonstrate key properties of 1-NN via Voronoi visualization:
    1. Convexity of individual cells
    2. Perfect interpolation at training points
    3. Sharp boundaries with no transition zone
    4. Class regions as unions of cells
    """
    np.random.seed(42)

    # Training data: simple two-class problem with noise
    # Class 0: cluster at (1, 1)
    X0 = np.random.randn(12, 2) * 0.5 + np.array([1, 1])
    y0 = np.zeros(12, dtype=int)

    # Class 1: cluster at (2.5, 2.5)
    X1 = np.random.randn(12, 2) * 0.5 + np.array([2.5, 2.5])
    y1 = np.ones(12, dtype=int)

    # Add a noisy point (class 0 in class 1's region)
    X_noise = np.array([[2.3, 2.7]])
    y_noise = np.array([0])

    X = np.vstack([X0, X1, X_noise])
    y = np.hstack([y0, y1, y_noise])

    # Compute Voronoi diagram
    vor = Voronoi(X)

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    colors = ['#3498db', '#e74c3c']

    # --- Panel 1: Full Voronoi with colored cells ---
    ax1 = axes[0]
    ax1.set_title('1-NN Regions = Voronoi Cells', fontsize=12, fontweight='bold')

    # Color each Voronoi cell by its label
    for point_idx in range(len(X)):
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon_verts = vor.vertices[region]
        patch = Polygon(polygon_verts, alpha=0.4,
                        facecolor=colors[y[point_idx]], edgecolor='gray')
        ax1.add_patch(patch)

    # Plot training points
    for c in [0, 1]:
        mask = y == c
        ax1.scatter(X[mask, 0], X[mask, 1], c=colors[c], s=80,
                    edgecolor='white', linewidth=1.5, zorder=10,
                    label=f'Class {c}')

    # Highlight noise point
    ax1.annotate('Noise point\n(Class 0 island)',
                 xy=(X_noise[0, 0], X_noise[0, 1]), xytext=(3.2, 3.5),
                 fontsize=9, ha='center',
                 arrowprops=dict(arrowstyle='->', color='black'),
                 bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    ax1.set_xlim(-0.5, 4.5)
    ax1.set_ylim(-0.5, 4.5)
    ax1.set_aspect('equal')
    ax1.legend(loc='lower right')

    # --- Panel 2: Convexity demonstration ---
    ax2 = axes[1]
    ax2.set_title('Property: Each Cell is Convex', fontsize=12, fontweight='bold')

    # Pick one cell and show its convex property
    demo_point_idx = 5  # Some cell in the middle
    region_idx = vor.point_region[demo_point_idx]
    region = vor.regions[region_idx]
    if region and -1 not in region:
        polygon_verts = vor.vertices[region]
        patch = Polygon(polygon_verts, alpha=0.5,
                        facecolor=colors[y[demo_point_idx]],
                        edgecolor='navy', linewidth=2)
        ax2.add_patch(patch)

        # Show two points in the cell and their connecting line
        p1 = polygon_verts[0]
        p2 = polygon_verts[2]
        ax2.plot([p1[0], p2[0]], [p1[1], p2[1]], 'g-', linewidth=2,
                 label='Line segment stays inside')
        ax2.scatter([p1[0], p2[0]], [p1[1], p2[1]], c='green', s=60, zorder=10)

    # Plot the generating point
    ax2.scatter(X[demo_point_idx, 0], X[demo_point_idx, 1],
                c=colors[y[demo_point_idx]], s=100, marker='*',
                edgecolor='black', linewidth=1.5, zorder=11,
                label='Cell generator (training point)')

    ax2.set_xlim(-0.5, 4.5)
    ax2.set_ylim(-0.5, 4.5)
    ax2.set_aspect('equal')
    ax2.legend(loc='lower right', fontsize=9)
    ax2.annotate('Any line between two\npoints in a convex set\nstays inside the set',
                 xy=(1.5, 0.3), fontsize=10, ha='center',
                 bbox=dict(boxstyle='round', facecolor='lightyellow'))

    # --- Panel 3: Boundary sharpness ---
    ax3 = axes[2]
    ax3.set_title('Property: Sharp Boundaries', fontsize=12, fontweight='bold')

    # Zoom into a boundary region
    # Find an edge between different classes
    for i, (p1_idx, p2_idx) in enumerate(vor.ridge_points):
        if y[p1_idx] != y[p2_idx]:  # Different classes
            ridge_vertices = vor.ridge_vertices[i]
            if -1 not in ridge_vertices:
                v1, v2 = vor.vertices[ridge_vertices]

                # Plot the decision boundary
                ax3.plot([v1[0], v2[0]], [v1[1], v2[1]], 'k-', linewidth=3,
                         label='Decision boundary')

                # Points very close to boundary on both sides
                midpoint = (v1 + v2) / 2
                direction = X[p1_idx] - X[p2_idx]
                direction = direction / np.linalg.norm(direction)
                epsilon = 0.05
                q1 = midpoint + epsilon * direction
                q2 = midpoint - epsilon * direction

                # Show predictions flip
                ax3.scatter(*q1, c=colors[y[p1_idx]], s=200, marker='>',
                            edgecolor='black', zorder=10)
                ax3.scatter(*q2, c=colors[y[p2_idx]], s=200, marker='<',
                            edgecolor='black', zorder=10)
                ax3.annotate(f'→ Class {y[p1_idx]}', xy=q1,
                             xytext=(q1[0] + 0.3, q1[1]), fontsize=10)
                ax3.annotate(f'← Class {y[p2_idx]}', xy=q2,
                             xytext=(q2[0] - 0.5, q2[1]), fontsize=10, ha='right')

                # Training points
                ax3.scatter(X[p1_idx, 0], X[p1_idx, 1], c=colors[y[p1_idx]],
                            s=100, edgecolor='white', linewidth=1.5, zorder=5)
                ax3.scatter(X[p2_idx, 0], X[p2_idx, 1], c=colors[y[p2_idx]],
                            s=100, edgecolor='white', linewidth=1.5, zorder=5)

                # Zoom window
                ax3.set_xlim(midpoint[0] - 1, midpoint[0] + 1)
                ax3.set_ylim(midpoint[1] - 1, midpoint[1] + 1)
                break

    ax3.set_aspect('equal')
    ax3.text(0.5, 0.02, 'Adjacent points get opposite predictions',
             transform=ax3.transAxes, fontsize=10, ha='center', style='italic')

    plt.tight_layout()
    return fig


if __name__ == "__main__":
    fig = demonstrate_1nn_voronoi_properties()
    plt.savefig("1nn_voronoi_properties.png", dpi=150, bbox_inches='tight')
    plt.show()
```

Property 4: Training Points are Cell Centers
Each Voronoi cell contains its generating point (the training point). However, the training point is not necessarily the geometric centroid of its cell—cells can be highly asymmetric, especially near the boundary of the data's convex hull, where they are unbounded.
Property 5: Locality of Influence
Each training point's influence is strictly local—confined to its Voronoi cell. Points in cell V(xᵢ) are completely determined by xᵢ; other training points only contribute through defining where V(xᵢ)'s boundaries lie. This is fundamentally different from parametric models where all training data influences model parameters globally.
One of the most remarkable results in machine learning theory is the Cover-Hart theorem (1967), which establishes tight bounds on the asymptotic error rate of the 1-NN classifier. The Voronoi structure is essential to understanding this result.
Definition (Bayes Optimal Error):
The Bayes optimal classifier minimizes the probability of error given the true class-conditional distributions: $$R^* = 1 - \mathbb{E}_X[\max_c P(Y = c | X)]$$
This is the lowest achievable error rate—no classifier can do better.
Theorem (Cover-Hart, 1967):
As the number of training points n → ∞, the 1-NN error rate R₁ satisfies: $$R^* \leq R_1 \leq R^*\left(2 - \frac{C}{C-1}R^*\right)$$
For the binary case (C = 2): $$R^* \leq R_1 \leq 2R^*(1 - R^*)$$
And importantly: $$R_1 \leq 2R^*$$
In words: The 1-NN error rate is at most twice the Bayes optimal error rate, asymptotically.
This is remarkable: with sufficient data, a completely non-parametric method—requiring no model assumptions, no training procedure, no parameter tuning—achieves error within a factor of 2 of the theoretical optimum. For problems where R* is small, 1-NN is nearly optimal. If R* = 5%, then R₁ ≤ 10%.
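The bound can be watched empirically. The sketch below uses an assumed synthetic distribution where the posterior is known in closed form (X uniform on [0, 1] with η(x) = x), so R* = 1/4, the asymptotic 1-NN error is 1/3, and the binary Cover-Hart bound is 2R*(1 - R*) = 3/8; the empirical 1-NN error should approach 1/3 as n grows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 1-D problem with known posterior eta(x) = P(Y=1 | X=x) = x on [0, 1]:
#   Bayes error            R* = E[min(eta, 1-eta)]   = 1/4
#   Asymptotic 1-NN error       E[2 eta (1 - eta)]   = 1/3
#   Cover-Hart bound (C=2)      2 R* (1 - R*)        = 3/8
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, size=(n, 1))
    y = (rng.uniform(0, 1, size=n) < x[:, 0]).astype(int)   # P(Y=1 | x) = x
    return x, y

X_test, y_test = sample(100_000)
for n in [100, 1_000, 10_000, 100_000]:
    X_train, y_train = sample(n)
    err = (KNeighborsClassifier(n_neighbors=1)
           .fit(X_train, y_train)
           .predict(X_test) != y_test).mean()
    print(f"n={n:>7}: 1-NN error ≈ {err:.3f}  "
          f"(R*=0.250, asymptotic limit=0.333, bound=0.375)")
```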
Intuition via Voronoi Structure:
As n → ∞, Voronoi cells shrink. In the limit, the nearest training point to a query q converges to q itself, so its label is effectively a fresh draw from the local posterior P(Y | X = q); the Voronoi cell around q inherits the class probabilities at q.
Now consider error analysis:
Bayes optimal at q: Predict class c* = argmax P(Y = c | X = q). Error probability = 1 - max_c P(Y = c | q).
1-NN at q: Predict whatever label the nearest training point happened to have. Asymptotically this label is an independent draw from the same posterior P(Y | X = q) that generated the query's own label, so there are two ways to err: the query's true class is c but the neighbor carries c′, or vice versa.
The factor of 2 arises from exactly these two disagreement events. In the binary case each occurs with probability η(q)(1 - η(q)), giving a conditional error of 2η(q)(1 - η(q)), which is at most twice the Bayes conditional error min(η(q), 1 - η(q)).
Proof Sketch:
Let η(x) = P(Y = 1 | X = x) be the regression function for binary classification. At a query point q, as n → ∞ the nearest neighbor converges to q, and its label is a Bernoulli(η(q)) draw independent of the query's own label. The conditional error probability is therefore η(q)(1 - η(q)) + (1 - η(q))η(q) = 2η(q)(1 - η(q)).
Integrating over the distribution of X gives the asymptotic error: $$R_1 = \int 2\eta(x)(1 - \eta(x))\, dP(x)$$
The bound R₁ ≤ 2R* follows from noting that 2η(1 - η) ≤ 2·min(η, 1 - η), since max(η, 1 - η) ≤ 1, together with R* = ∫ min(η(x), 1 - η(x)) dP(x) in the binary case.
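A quick numeric illustration of this last step: the asymptotic 1-NN conditional error 2η(1 - η) never exceeds twice the Bayes conditional error min(η, 1 - η), and the ratio approaches 2 only as η approaches 0 or 1.

```python
import numpy as np

eta = np.linspace(0.0, 1.0, 101)
nn_cond    = 2 * eta * (1 - eta)          # asymptotic 1-NN conditional error
bayes_cond = np.minimum(eta, 1 - eta)     # Bayes conditional error

# Pointwise: 2*eta*(1-eta) = 2*min(eta,1-eta)*max(eta,1-eta) <= 2*min(eta,1-eta)
assert np.all(nn_cond <= 2 * bayes_cond + 1e-12)

# The ratio 2*max(eta, 1-eta) approaches 2 only near eta = 0 or eta = 1
ratio = np.divide(nn_cond, bayes_cond, out=np.zeros_like(eta), where=bayes_cond > 0)
print("largest 1-NN / Bayes conditional-error ratio on the grid:", ratio.max())
```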
| Bayes Error R* | Asymptotic 1-NN Upper Bound (C = 2) | 1-NN Lower Bound | Bound Tightness |
|---|---|---|---|
| 0% | 0% | 0% | Exact (trivial case) |
| 5% | ≤ 9.5% | ≥ 5% | Nearly tight at low error |
| 10% | ≤ 18% | ≥ 10% | Still quite tight |
| 20% | ≤ 32% | ≥ 20% | Gap widening |
| 30% | ≤ 42% | ≥ 30% | Significant gap possible |
| 50% | ≤ 50% | ≥ 50% | Random guessing bound |
Extension to K-NN:
For k-NN with k → ∞ and k/n → 0 as n → ∞, the error rate converges to the Bayes optimal R*: $$\lim_{n \to \infty} R_k = R^*$$
This is why increasing K generally improves performance up to a point—it approaches Bayes optimality. However, finite sample effects and the bias introduced by larger K often dominate in practice.
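A sketch of this consistency effect on the same kind of assumed synthetic problem as above (η(x) = x on [0, 1], so R* = 0.25): with k grown as √n, the test error drifts toward R*, while k = 1 plateaus near its asymptotic value of about 0.333.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, size=(n, 1))
    y = (rng.uniform(0, 1, size=n) < x[:, 0]).astype(int)   # P(Y=1 | x) = x
    return x, y

X_test, y_test = sample(50_000)
for n in [1_000, 10_000, 100_000]:
    X_train, y_train = sample(n)
    for k in [1, int(np.sqrt(n))]:       # k -> inf while k/n -> 0
        err = (KNeighborsClassifier(n_neighbors=k)
               .fit(X_train, y_train)
               .predict(X_test) != y_test).mean()
        print(f"n={n:>7}, k={k:>4}: error ≈ {err:.3f}   (R* = 0.250)")
```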
Practical Implications:
1-NN is a strong baseline: For well-behaved problems, 1-NN provides competitive performance with no tuning.
Error rate interpretation: If your 1-NN achieves 15% error, the Bayes optimal rate is at least 7.5%—the problem has inherent difficulty.
Motivation for larger K: Since K-NN approaches Bayes optimality while 1-NN only achieves factor-2, there's theoretical motivation to use K > 1.
The Voronoi structure enables precise analysis of where and why 1-NN makes mistakes. This geometric perspective complements the probabilistic analysis of Cover-Hart.
Error Decomposition by Cell:
The expected 1-NN error can be written as a sum over Voronoi cells: $$R_1 = \sum_{i=1}^{n} P(q \in V(x_i)) \cdot P(\text{error} | q \in V(x_i))$$
where:
Sources of 1-NN Error:
Label noise: Training point xᵢ has incorrect label due to noise in the data collection process. All queries in V(xᵢ) inherit this error.
Boundary mismatch: True class boundary doesn't align with Voronoi edges. Some portion of V(xᵢ) truly belongs to a different class.
Finite sample effects: With limited n, cells are large. Queries far from their nearest neighbor may differ in true class.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def analyze_1nn_error_by_cell():
    """
    Analyze 1-NN error contribution from each Voronoi cell.
    Shows:
    1. Cell-by-cell error rate estimation
    2. Identification of high-error cells (near boundaries, noisy labels)
    3. Relationship between cell size and error
    """
    np.random.seed(42)

    # Generate data with known true boundary (y = x + 0.5)
    n_train = 40
    X_train = np.random.uniform(0, 4, (n_train, 2))
    y_train_true = (X_train[:, 1] > X_train[:, 0] + 0.5).astype(int)

    # Add label noise (10%)
    noise_mask = np.random.random(n_train) < 0.1
    y_train = y_train_true.copy()
    y_train[noise_mask] = 1 - y_train[noise_mask]

    # Compute Voronoi
    vor = Voronoi(X_train)

    # Estimate per-cell error rate using test points
    n_test = 5000
    X_test = np.random.uniform(0, 4, (n_test, 2))
    y_test = (X_test[:, 1] > X_test[:, 0] + 0.5).astype(int)

    # 1-NN predictions
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    errors = y_pred != y_test

    # Assign each test point to its Voronoi cell
    _, cell_indices = knn.kneighbors(X_test, n_neighbors=1)
    cell_indices = cell_indices.flatten()

    # Compute per-cell error rate
    cell_error_rates = np.zeros(n_train)
    cell_counts = np.zeros(n_train)
    for idx, (cell_idx, is_error) in enumerate(zip(cell_indices, errors)):
        cell_counts[cell_idx] += 1
        if is_error:
            cell_error_rates[cell_idx] += 1

    # Avoid division by zero
    mask = cell_counts > 0
    cell_error_rates[mask] /= cell_counts[mask]

    # Visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # --- Panel 1: Cells colored by error rate ---
    ax1 = axes[0]
    ax1.set_title('Per-Cell Error Rate', fontsize=12, fontweight='bold')

    patches = []
    patch_colors = []
    for point_idx in range(n_train):
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon_verts = vor.vertices[region]
        # Clip to bounding box
        polygon_verts = np.clip(polygon_verts, 0, 4)
        patches.append(Polygon(polygon_verts))
        patch_colors.append(cell_error_rates[point_idx])

    pc = PatchCollection(patches, cmap='RdYlGn_r', alpha=0.7, edgecolor='gray')
    pc.set_array(np.array(patch_colors))
    pc.set_clim(0, 1)
    ax1.add_collection(pc)

    # Training points
    ax1.scatter(X_train[:, 0], X_train[:, 1], c='black', s=40, zorder=10)

    # True boundary
    ax1.plot([0, 3.5], [0.5, 4], 'k--', linewidth=2, label='True boundary')
    ax1.set_xlim(0, 4)
    ax1.set_ylim(0, 4)
    ax1.set_aspect('equal')
    plt.colorbar(pc, ax=ax1, label='Cell Error Rate')
    ax1.legend(loc='lower right')

    # --- Panel 2: Highlight problematic cells ---
    ax2 = axes[1]
    ax2.set_title('High-Error Cells Identified', fontsize=12, fontweight='bold')

    # Color by class
    colors = ['#3498db', '#e74c3c']
    for c in [0, 1]:
        mask = y_train == c
        ax2.scatter(X_train[mask, 0], X_train[mask, 1], c=colors[c], s=60,
                    edgecolor='white', linewidth=1.5, label=f'Class {c}')

    # Mark noisy points
    ax2.scatter(X_train[noise_mask, 0], X_train[noise_mask, 1],
                facecolors='none', edgecolors='yellow', linewidths=3,
                s=200, label='Noisy labels', zorder=11)

    # Mark high-error cells
    high_error = cell_error_rates > 0.3
    ax2.scatter(X_train[high_error, 0], X_train[high_error, 1],
                facecolors='none', edgecolors='black', linewidths=2,
                s=150, marker='s', label='High error (>30%)', zorder=10)

    ax2.plot([0, 3.5], [0.5, 4], 'k--', linewidth=2)
    ax2.set_xlim(0, 4)
    ax2.set_ylim(0, 4)
    ax2.set_aspect('equal')
    ax2.legend(loc='lower right', fontsize=9)

    # --- Panel 3: Error vs distance to boundary ---
    ax3 = axes[2]
    ax3.set_title('Error Rate vs Distance to True Boundary',
                  fontsize=12, fontweight='bold')

    # Distance from each training point to true boundary (y = x + 0.5)
    # Line: x - y + 0.5 = 0, distance = |x - y + 0.5| / sqrt(2)
    dist_to_boundary = np.abs(X_train[:, 0] - X_train[:, 1] + 0.5) / np.sqrt(2)

    # Scatter plot with enough counts
    valid = cell_counts > 10
    ax3.scatter(dist_to_boundary[valid], cell_error_rates[valid],
                s=cell_counts[valid], alpha=0.7, c=y_train[valid], cmap='RdBu')
    ax3.set_xlabel('Distance to True Boundary', fontsize=11)
    ax3.set_ylabel('Cell Error Rate', fontsize=11)
    ax3.set_xlim(0, 2.5)
    ax3.set_ylim(-0.05, 1.05)
    ax3.axhline(y=0.5, color='gray', linestyle=':', alpha=0.5)

    # Add trend line
    z = np.polyfit(dist_to_boundary[valid], cell_error_rates[valid], 1)
    p = np.poly1d(z)
    x_line = np.linspace(0, 2.5, 100)
    ax3.plot(x_line, np.clip(p(x_line), 0, 1), 'r-', linewidth=2,
             label=f'Trend: slope={z[0]:.2f}')
    ax3.legend()

    plt.tight_layout()

    # Summary statistics
    total_error = errors.sum() / len(errors)
    print(f"Overall 1-NN Error Rate: {total_error:.2%}")
    print(f"Cells with >30% error: {(cell_error_rates > 0.3).sum()} / {n_train}")
    print(f"Noisy training points: {noise_mask.sum()}")

    return fig


if __name__ == "__main__":
    fig = analyze_1nn_error_by_cell()
    plt.savefig("voronoi_error_analysis.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The Voronoi-1NN equivalence has direct implications for algorithm design, both for improving 1-NN performance and for leveraging Voronoi properties in other contexts.
Condensing: Removing Redundant Points
Not all training points contribute equally. Interior points (whose Voronoi cells don't touch the decision boundary) are redundant—removing them doesn't change the decision boundary.
Condensed Nearest Neighbor (CNN), implemented in the code below, proceeds as follows: (1) initialize the condensed set S with one point per class; (2) classify every remaining training point with 1-NN on S; (3) whenever a point is misclassified, add it to S; (4) repeat the scan until a full pass produces no additions.
The resulting set S is consistent: it classifies all original training points correctly while being (potentially much) smaller.
Editing: Removing Noisy Points
Points whose labels disagree with their neighbors are likely noise. Editing removes them to create smoother boundaries.
Wilson's Edited Nearest Neighbor (ENN), also implemented below, works in the opposite direction: classify each training point using its k nearest neighbors among the other training points (commonly k = 3), and remove every point whose label disagrees with the majority vote of those neighbors.
This eliminates points that create Voronoi islands of incorrect predictions.
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification


def condensed_nearest_neighbor(X, y, random_state=None):
    """
    Condensed Nearest Neighbor (Hart, 1968).

    Returns a subset S such that 1-NN on S correctly classifies all of X.
    The condensed set contains only "boundary" points - those whose Voronoi
    cells touch the decision boundary. Interior points are redundant.
    """
    rng = np.random.default_rng(random_state)
    n_samples = len(X)
    classes = np.unique(y)

    # Initialize S with one random point per class
    S_indices = []
    for c in classes:
        class_indices = np.where(y == c)[0]
        S_indices.append(rng.choice(class_indices))
    S_indices = list(S_indices)

    # Repeatedly scan and absorb misclassified points
    changed = True
    while changed:
        changed = False
        # Build 1-NN with current S
        knn = KNeighborsClassifier(n_neighbors=1)
        knn.fit(X[S_indices], y[S_indices])

        # Check all training points
        for i in range(n_samples):
            if i in S_indices:
                continue
            pred = knn.predict(X[i:i+1])[0]
            if pred != y[i]:
                # Misclassified - add to S
                S_indices.append(i)
                changed = True

    return np.array(S_indices), X[S_indices], y[S_indices]


def edited_nearest_neighbor(X, y, k=3):
    """
    Edited Nearest Neighbor (Wilson, 1972).

    Removes points that are misclassified by their K nearest neighbors.
    These are likely noise points or boundary outliers.
    """
    n_samples = len(X)

    # Find K nearest neighbors for each point (excluding self)
    knn = KNeighborsClassifier(n_neighbors=k+1)  # +1 because self is included
    knn.fit(X, y)
    distances, indices = knn.kneighbors(X)

    keep_mask = np.ones(n_samples, dtype=bool)
    for i in range(n_samples):
        # Neighbors (excluding self)
        neighbor_indices = indices[i, 1:k+1]  # Skip first (self)
        neighbor_labels = y[neighbor_indices]

        # Majority vote
        majority_label = np.bincount(neighbor_labels).argmax()

        # If disagreement with majority, mark for removal
        if y[i] != majority_label:
            keep_mask[i] = False

    return keep_mask, X[keep_mask], y[keep_mask]


def compare_condensing_editing():
    """Compare original, condensed, and edited datasets."""
    np.random.seed(42)

    # Generate noisy dataset
    X, y = make_classification(
        n_samples=500, n_features=2, n_informative=2, n_redundant=0,
        n_clusters_per_class=2, flip_y=0.15, random_state=42
    )

    print("Original dataset:")
    print(f"  Size: {len(X)} points")

    # Condensed NN
    S_idx, X_cond, y_cond = condensed_nearest_neighbor(X, y, random_state=42)
    print(f"\nCondensed dataset:")
    print(f"  Size: {len(X_cond)} points ({100*len(X_cond)/len(X):.1f}%)")

    # Check accuracy on original data
    knn_cond = KNeighborsClassifier(n_neighbors=1)
    knn_cond.fit(X_cond, y_cond)
    cond_acc = (knn_cond.predict(X) == y).mean()
    print(f"  Training accuracy: {cond_acc:.2%}")

    # Edited NN
    keep_mask, X_edit, y_edit = edited_nearest_neighbor(X, y, k=3)
    print(f"\nEdited dataset:")
    print(f"  Size: {len(X_edit)} points ({100*len(X_edit)/len(X):.1f}%)")
    print(f"  Points removed: {(~keep_mask).sum()}")

    return X, y, X_cond, y_cond, X_edit, y_edit


if __name__ == "__main__":
    X, y, X_cond, y_cond, X_edit, y_edit = compare_condensing_editing()
```

Prototype Selection:
Instead of using all training points, select a smaller set of prototypes that captures the data structure. This reduces storage and query time while potentially improving generalization.
Methods include condensing (keep only boundary points), editing (remove likely label noise), clustering-based prototype generation (for example, one centroid per cluster within each class), and learning vector quantization (LVQ), which adapts prototype positions during training.
Voronoi-Aware Data Structures:
Knowing that 1-NN corresponds to Voronoi enables specialized data structures. In low dimensions, the Voronoi diagram can be built explicitly and paired with a point-location structure, so a query reduces to finding the containing cell (logarithmic query time in 2D). The Delaunay triangulation (the Voronoi dual) supports walk-based nearest-neighbor search, and space-partitioning trees such as KD-trees and ball trees exploit the same proximity geometry without ever constructing cells explicitly.
These approaches trade preprocessing time for query efficiency—useful when queries far outnumber training updates.
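In practice, the most common of these structures is a spatial tree. A minimal sketch with SciPy's cKDTree (sample sizes and dimensions chosen arbitrarily): the tree is built once, and each query implicitly answers "which Voronoi cell contains q?" without constructing the diagram.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = rng.integers(0, 2, size=10_000)

# Preprocessing: build the tree once (O(n log n))
tree = cKDTree(X)

# Querying: each lookup walks the tree instead of scanning all n points;
# the returned index identifies the Voronoi cell containing the query
Q = rng.normal(size=(5, 3))
_, nn_idx = tree.query(Q, k=1)
print("1-NN labels:", y[nn_idx])
```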
Explicit Voronoi construction becomes impractical in high dimensions (complexity O(n^⌈d/2⌉)). For d > 5-10, approximate nearest neighbor methods (LSH, randomized KD-trees) are essential. The conceptual connection to Voronoi remains valuable for understanding, but explicit geometric construction is infeasible.
The Voronoi-1NN connection illuminates relationships to other machine learning methods and geometric algorithms.
Connection to RBF Networks:
Radial Basis Function networks with one unit per training point and Gaussian activations create "soft" Voronoi-like partitions. The hard assignment of Voronoi cells becomes weighted contributions: $$f(q) = \sum_i w_i \exp\left(-\gamma \|q - x_i\|^2\right)$$
As γ → ∞, this approaches 1-NN (winner-take-all).
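A small sketch of this limit, assuming unit weights w_i = 1 and class scores formed by summing Gaussian activations per class: as γ grows, the soft RBF prediction agrees with 1-NN on essentially every query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)
Q = rng.normal(size=(2000, 2))

knn_pred = KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(Q)

sq_dists = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (2000, 100)

for gamma in [0.1, 1, 10, 100, 1000]:
    # Stabilized Gaussian weights: subtracting each query's minimum squared
    # distance rescales both class scores equally, avoiding underflow
    w = np.exp(-gamma * (sq_dists - sq_dists.min(axis=1, keepdims=True)))
    class_scores = np.stack([w[:, y == c].sum(axis=1) for c in (0, 1)], axis=1)
    rbf_pred = class_scores.argmax(axis=1)
    print(f"gamma={gamma:>6}: agreement with 1-NN = {(rbf_pred == knn_pred).mean():.3f}")
```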
Connection to Gaussian Mixture Models:
Hard clustering via GMM (assigning each point to its most probable component) creates a partition analogous to Voronoi. With equal mixing weights and a shared spherical covariance, the partition is exactly the Voronoi diagram of the component means. With differing covariances or weights, boundaries become curved or shifted.
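A minimal sketch of this equivalence, with assumed equal mixing weights and an assumed shared spherical variance: the hard GMM assignment rule reduces algebraically to nearest-mean assignment, i.e., the Voronoi partition of the means.

```python
import numpy as np

# Hard assignment under a spherical GMM with shared variance sigma^2 and equal
# mixing weights pi_k: argmax_k [ log pi_k - ||x - mu_k||^2 / (2 sigma^2) ].
# With equal pi_k and shared sigma^2, this is argmin_k ||x - mu_k||: the
# Voronoi partition generated by the component means.
rng = np.random.default_rng(0)
means = rng.normal(size=(5, 2))          # component means (Voronoi generators)
sigma2 = 0.7                             # assumed shared spherical variance
X = rng.normal(size=(1000, 2)) * 2

sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
gmm_hard = (np.log(1 / 5) - sq_dists / (2 * sigma2)).argmax(axis=1)
voronoi = sq_dists.argmin(axis=1)

print("assignments identical:", np.array_equal(gmm_hard, voronoi))   # True
```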
Connection to Support Vector Machines:
Interestingly, SVM support vectors often lie near class boundaries—similar to how "boundary" training points in 1-NN are most important (non-redundant). The condensed 1-NN set is conceptually related to support vectors.
| Method | Voronoi Connection | Key Difference |
|---|---|---|
| 1-NN | Exact Voronoi tessellation | Hard boundaries, no parameters |
| K-NN (K>1) | Order-K Voronoi / smoothed | Voting creates boundary widening |
| RBF Networks | Soft Voronoi (Gaussian decay) | Continuous influence regions |
| K-Means Clustering | Voronoi of centroids | Centroids learned, not training points |
| GMM (spherical) | Weighted/soft Voronoi | Probabilistic assignment |
| Learning Vector Quantization | Adaptive Voronoi prototypes | Prototypes move during training |
Connection to Decision Trees:
Decision trees create axis-aligned rectangular partitions. These can be seen as a restricted form of Voronoi-like partitioning where boundaries must be perpendicular to coordinate axes. The loss of diagonal boundaries is compensated by hierarchical structure and interpretability.
Connection to Kernel Density Estimation:
1-NN classification can be viewed as classification based on estimated class densities: $$\hat{f}_c(q) = \frac{1}{n_c} \sum_{i:\, y_i = c} \delta(q - x_i)$$ where n_c is the number of training points in class c and δ is the Dirac delta.
The "nearest point wins" rule implicitly estimates local density as concentrated at the nearest training point. Proper kernel density estimation replaces delta functions with smooth kernels, producing smooth density estimates and hence smoother decision boundaries.
This page has established the precise and complete correspondence between 1-NN classification and Voronoi tessellation. This is not just an analogy—it's a mathematical identity that enables both theoretical analysis and practical algorithm design.
What's Next:
With the 1-NN/Voronoi equivalence firmly established, we turn to practical visualization. The next page explores techniques for visualizing Voronoi structures and KNN decision surfaces in two and higher dimensions, building intuition that transfers to problems where direct visualization is impossible.
You now have a rigorous understanding of why 1-NN classification creates Voronoi cells, the theoretical guarantees this provides (Cover-Hart), and how this structure informs practical algorithm design. This geometric foundation underlies all K-NN variants and related methods.