In previous pages, we established that Voronoi diagrams partition space by proximity and that K-NN decision boundaries transform with K. Now we focus on the purest case: K = 1, where the correspondence between geometry and classification becomes exact.
The 1-NN classifier is not merely related to Voronoi tessellations—it is the Voronoi tessellation of the training set. Each Voronoi cell corresponds to exactly one training point, and the cell's label is that point's label. The boundaries between classes are precisely the Voronoi edges separating differently-labeled points.
This exact correspondence isn't just theoretical elegance. It enables rigorous analysis of 1-NN behavior, provides geometric insight into classification errors, and reveals why 1-NN has the unique properties it does—including its famous asymptotic error bound.
By the end of this page, you will understand the formal proof of 1-NN/Voronoi equivalence, how this structure determines classification error, the famous Cover-Hart theorem on 1-NN asymptotic error, and practical implications of the Voronoi structure for understanding 1-NN behavior.
Let us establish the equivalence with mathematical rigor.
Setup: Let D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)} be a training dataset where xᵢ ∈ ℝᵈ are feature vectors and yᵢ ∈ {1, 2, ..., C} are class labels. Let d(·,·) denote the distance metric (typically Euclidean).
Definition (1-NN Classifier): $$\text{NN}_1(q) = y_{j^*}, \quad \text{where } j^* = \arg\min_{j \in \{1,\dots,n\}} d(q, x_j)$$
Definition (Voronoi Cell): $$V(x_i) = \{q \in \mathbb{R}^d : d(q, x_i) \leq d(q, x_j) \text{ for all } j \neq i\}$$
Theorem (1-NN/Voronoi Equivalence): For any query point q, the 1-NN prediction equals the label of the training point whose Voronoi cell contains q: $$\text{NN}_1(q) = y_i \iff q \in V(x_i)$$
Proof:
(⟹) Suppose NN₁(q) = yᵢ. Then xᵢ is the nearest neighbor of q, meaning: $$d(q, x_i) \leq d(q, x_j) \text{ for all } j$$ By definition of the Voronoi cell, this means q ∈ V(xᵢ).
(⟸) Suppose q ∈ V(xᵢ). By definition of V(xᵢ): $$d(q, x_i) \leq d(q, x_j) \text{ for all } j$$ Thus xᵢ is the nearest neighbor of q (or tied for nearest), and NN₁(q) = yᵢ. ∎
When q lies exactly on a Voronoi edge (equidistant from multiple sites), the tie-breaking rule determines the prediction. Common strategies: (1) choose arbitrarily among tied neighbors, (2) choose the one with smallest index, (3) return "undefined" or multiple labels. In practice, exact Voronoi edges have measure zero—ties occur with probability zero for continuous feature distributions.
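The identity is easy to check numerically. Below is a minimal sketch (assuming NumPy and scikit-learn are available; the data sizes are arbitrary) that compares scikit-learn's 1-NN predictions against a brute-force "Voronoi lookup" that labels each query by its nearest generator; the two agree on every query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # training sites (Voronoi generators)
y = rng.integers(0, 3, size=50)       # labels in {0, 1, 2}
Q = rng.normal(size=(1000, 2))        # query points

# 1-NN predictions via scikit-learn
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
pred_knn = knn.predict(Q)

# "Voronoi lookup": the cell containing q is generated by the site minimizing
# d(q, x_i), so the cell's label is y[argmin_i d(q, x_i)]
dists = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)  # shape (1000, 50)
pred_voronoi = y[np.argmin(dists, axis=1)]

assert np.array_equal(pred_knn, pred_voronoi)
print("1-NN predictions match Voronoi-cell labels for all queries.")
```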
Corollary (Classification Regions):
The decision region for class c under 1-NN is the union of Voronoi cells belonging to training points of class c: $$R_1(c) = \bigcup_{i: y_i = c} V(x_i)$$
Corollary (Decision Boundaries):
The 1-NN decision boundary between classes c and c' consists of Voronoi edges separating cells V(xᵢ) and V(xⱼ) where yᵢ = c and yⱼ = c'.
Since Voronoi edges are perpendicular bisectors of segments connecting their generating points, 1-NN decision boundaries are piecewise linear (in Euclidean space)—composed of line segments (2D), planar patches (3D), or (d-1)-dimensional hyperplanar facets (general d).
Implication: While 1-NN can model arbitrarily complex class shapes (non-convex, multimodal, etc.), the decision surface is locally linear everywhere—a patchwork of flat pieces at the microscopic level.
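To make the piecewise-linear claim concrete, here is a small sketch (the two sites and the 0.01 step size are arbitrary illustration choices) verifying that points on the perpendicular bisector of two sites are equidistant from both, and that stepping off the bisector flips the nearest neighbor and hence the 1-NN label.

```python
import numpy as np

# Two differently labeled training points in 2D
x1, x2 = np.array([1.0, 1.0]), np.array([3.0, 2.0])

# The Voronoi edge between them lies on the perpendicular bisector:
# all q with (q - m) . (x2 - x1) = 0, where m is the midpoint
m = (x1 + x2) / 2
w = x2 - x1                       # normal vector of the bisector hyperplane
w_perp = np.array([-w[1], w[0]])  # direction along the bisector

for t in np.linspace(-2, 2, 5):
    q = m + t * w_perp
    d1, d2 = np.linalg.norm(q - x1), np.linalg.norm(q - x2)
    assert np.isclose(d1, d2)     # equidistant, so q lies on the Voronoi edge

# Stepping off the bisector flips the nearest neighbor (and hence the label)
q_plus, q_minus = m + 0.01 * w, m - 0.01 * w
print(np.linalg.norm(q_plus - x2) < np.linalg.norm(q_plus - x1))    # True: nearer x2
print(np.linalg.norm(q_minus - x1) < np.linalg.norm(q_minus - x2))  # True: nearer x1
```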
The Voronoi structure imposes specific geometric constraints on 1-NN classification. Understanding these properties reveals both the power and limitations of 1-NN.
Property 1: Each Training Point Owns a Convex Region
Every Voronoi cell is convex, being an intersection of half-spaces. This means each training point governs a single connected, convex region of feature space: any point on the segment between two queries classified by xᵢ is itself classified by xᵢ. Class regions, being unions of cells, need not be convex.
Property 2: Interpolation at Training Points
Unlike parametric models, 1-NN perfectly interpolates the training data: $$\text{NN}_1(x_i) = y_i \text{ for all training points } x_i$$
This is trivially true since xᵢ is its own nearest neighbor: xᵢ ∈ V(xᵢ). This zero training error property makes 1-NN prone to overfitting—noise in the training labels is perfectly captured.
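A quick check of the interpolation property, assuming scikit-learn is available: even with purely random labels, 1-NN reproduces every training label (barring duplicate feature vectors with conflicting labels, which occur with probability zero for continuous features).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)   # random labels: pure "noise" to memorize

# Each x_i lies in its own Voronoi cell, so it is its own nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print("1-NN training accuracy:", (knn.predict(X) == y).mean())   # 1.0
```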
Property 3: Boundary Sharpness
Voronoi edges form sharp decision boundaries with no transition zone. A query infinitesimally close to the boundary on one side receives a completely different prediction than one infinitesimally on the other side. There's no notion of "confidence" decreasing near boundaries in the basic 1-NN framework.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def demonstrate_1nn_voronoi_properties():
    """
    Demonstrate key properties of 1-NN via Voronoi visualization:
    1. Convexity of individual cells
    2. Perfect interpolation at training points
    3. Sharp boundaries with no transition zone
    4. Class regions as unions of cells
    """
    np.random.seed(42)

    # Training data: simple two-class problem with noise
    # Class 0: cluster at (1, 1)
    X0 = np.random.randn(12, 2) * 0.5 + np.array([1, 1])
    y0 = np.zeros(12, dtype=int)

    # Class 1: cluster at (2.5, 2.5)
    X1 = np.random.randn(12, 2) * 0.5 + np.array([2.5, 2.5])
    y1 = np.ones(12, dtype=int)

    # Add a noisy point (class 0 in class 1's region)
    X_noise = np.array([[2.3, 2.7]])
    y_noise = np.array([0])

    X = np.vstack([X0, X1, X_noise])
    y = np.hstack([y0, y1, y_noise])

    # Compute Voronoi diagram
    vor = Voronoi(X)

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    colors = ['#3498db', '#e74c3c']

    # --- Panel 1: Full Voronoi with colored cells ---
    ax1 = axes[0]
    ax1.set_title('1-NN Regions = Voronoi Cells', fontsize=12, fontweight='bold')

    # Color each Voronoi cell by its label
    for point_idx in range(len(X)):
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon_verts = vor.vertices[region]
        patch = Polygon(polygon_verts, alpha=0.4,
                        facecolor=colors[y[point_idx]], edgecolor='gray')
        ax1.add_patch(patch)

    # Plot training points
    for c in [0, 1]:
        mask = y == c
        ax1.scatter(X[mask, 0], X[mask, 1], c=colors[c], s=80,
                    edgecolor='white', linewidth=1.5, zorder=10,
                    label=f'Class {c}')

    # Highlight noise point
    ax1.annotate('Noise point\n(Class 0 island)',
                 xy=(X_noise[0, 0], X_noise[0, 1]), xytext=(3.2, 3.5),
                 fontsize=9, ha='center',
                 arrowprops=dict(arrowstyle='->', color='black'),
                 bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    ax1.set_xlim(-0.5, 4.5)
    ax1.set_ylim(-0.5, 4.5)
    ax1.set_aspect('equal')
    ax1.legend(loc='lower right')

    # --- Panel 2: Convexity demonstration ---
    ax2 = axes[1]
    ax2.set_title('Property: Each Cell is Convex', fontsize=12, fontweight='bold')

    # Pick one cell and show its convex property
    demo_point_idx = 5  # Some cell in the middle
    region_idx = vor.point_region[demo_point_idx]
    region = vor.regions[region_idx]
    if region and -1 not in region:
        polygon_verts = vor.vertices[region]
        patch = Polygon(polygon_verts, alpha=0.5,
                        facecolor=colors[y[demo_point_idx]],
                        edgecolor='navy', linewidth=2)
        ax2.add_patch(patch)

        # Show two points in the cell and their connecting line
        p1 = polygon_verts[0]
        p2 = polygon_verts[2]
        ax2.plot([p1[0], p2[0]], [p1[1], p2[1]], 'g-', linewidth=2,
                 label='Line segment stays inside')
        ax2.scatter([p1[0], p2[0]], [p1[1], p2[1]], c='green', s=60, zorder=10)

    # Plot the generating point
    ax2.scatter(X[demo_point_idx, 0], X[demo_point_idx, 1],
                c=colors[y[demo_point_idx]], s=100, marker='*',
                edgecolor='black', linewidth=1.5, zorder=11,
                label='Cell generator (training point)')

    ax2.set_xlim(-0.5, 4.5)
    ax2.set_ylim(-0.5, 4.5)
    ax2.set_aspect('equal')
    ax2.legend(loc='lower right', fontsize=9)
    ax2.annotate('Any line between two\npoints in a convex set\nstays inside the set',
                 xy=(1.5, 0.3), fontsize=10, ha='center',
                 bbox=dict(boxstyle='round', facecolor='lightyellow'))

    # --- Panel 3: Boundary sharpness ---
    ax3 = axes[2]
    ax3.set_title('Property: Sharp Boundaries', fontsize=12, fontweight='bold')

    # Zoom into a boundary region
    # Find an edge between different classes
    for i, (p1_idx, p2_idx) in enumerate(vor.ridge_points):
        if y[p1_idx] != y[p2_idx]:  # Different classes
            ridge_vertices = vor.ridge_vertices[i]
            if -1 not in ridge_vertices:
                v1, v2 = vor.vertices[ridge_vertices]

                # Plot the decision boundary
                ax3.plot([v1[0], v2[0]], [v1[1], v2[1]], 'k-', linewidth=3,
                         label='Decision boundary')

                # Points very close to boundary on both sides
                midpoint = (v1 + v2) / 2
                direction = X[p1_idx] - X[p2_idx]
                direction = direction / np.linalg.norm(direction)
                epsilon = 0.05
                q1 = midpoint + epsilon * direction
                q2 = midpoint - epsilon * direction

                # Show predictions flip
                ax3.scatter(*q1, c=colors[y[p1_idx]], s=200, marker='>',
                            edgecolor='black', zorder=10)
                ax3.scatter(*q2, c=colors[y[p2_idx]], s=200, marker='<',
                            edgecolor='black', zorder=10)
                ax3.annotate(f'→ Class {y[p1_idx]}', xy=q1,
                             xytext=(q1[0] + 0.3, q1[1]), fontsize=10)
                ax3.annotate(f'← Class {y[p2_idx]}', xy=q2,
                             xytext=(q2[0] - 0.5, q2[1]), fontsize=10, ha='right')

                # Training points
                ax3.scatter(X[p1_idx, 0], X[p1_idx, 1], c=colors[y[p1_idx]],
                            s=100, edgecolor='white', linewidth=1.5, zorder=5)
                ax3.scatter(X[p2_idx, 0], X[p2_idx, 1], c=colors[y[p2_idx]],
                            s=100, edgecolor='white', linewidth=1.5, zorder=5)

                # Zoom window
                ax3.set_xlim(midpoint[0] - 1, midpoint[0] + 1)
                ax3.set_ylim(midpoint[1] - 1, midpoint[1] + 1)
                break

    ax3.set_aspect('equal')
    ax3.text(0.5, 0.02, 'Adjacent points get opposite predictions',
             transform=ax3.transAxes, fontsize=10, ha='center', style='italic')

    plt.tight_layout()
    return fig


if __name__ == "__main__":
    fig = demonstrate_1nn_voronoi_properties()
    plt.savefig("1nn_voronoi_properties.png", dpi=150, bbox_inches='tight')
    plt.show()
```

Property 4: Training Points are Cell Centers
Each Voronoi cell contains its generating point (the training point). However, the training point is not necessarily the geometric centroid of its cell—cells can be highly asymmetric, especially near the boundary of the data's convex hull, where they are unbounded.
Property 5: Locality of Influence
Each training point's influence is strictly local—confined to its Voronoi cell. Points in cell V(xᵢ) are completely determined by xᵢ; other training points only contribute through defining where V(xᵢ)'s boundaries lie. This is fundamentally different from parametric models where all training data influences model parameters globally.
One of the most remarkable results in machine learning theory is the Cover-Hart theorem (1967), which establishes tight bounds on the asymptotic error rate of the 1-NN classifier. The Voronoi structure is essential to understanding this result.
Definition (Bayes Optimal Error):
The Bayes optimal classifier minimizes the probability of error given the true class-conditional distributions: $$R^* = 1 - \mathbb{E}_X[\max_c P(Y = c | X)]$$
This is the lowest achievable error rate—no classifier can do better.
Theorem (Cover-Hart, 1967):
As the number of training points n → ∞, the 1-NN error rate R₁ satisfies: $$R^* \leq R_1 \leq R^*\left(2 - \frac{C}{C-1}R^*\right)$$
For the binary case (C = 2): $$R^* \leq R_1 \leq 2R^*(1 - R^*)$$
And importantly: $$R_1 \leq 2R^*$$
In words: The 1-NN error rate is at most twice the Bayes optimal error rate, asymptotically.
This is remarkable: with sufficient data, a completely non-parametric method—requiring no model assumptions, no training procedure, no parameter tuning—achieves error within a factor of 2 of the theoretical optimum. For problems where R* is small, 1-NN is nearly optimal. If R* = 5%, then R₁ ≤ 10%.
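The bound can be watched empirically. The sketch below uses an assumed synthetic distribution where the posterior is known in closed form (X uniform on [0, 1] with η(x) = x), so R* = 1/4, the asymptotic 1-NN error is 1/3, and the binary Cover-Hart bound is 2R*(1 - R*) = 3/8; the empirical 1-NN error should approach 1/3 as n grows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 1-D problem with known posterior eta(x) = P(Y=1 | X=x) = x on [0, 1]:
#   Bayes error            R* = E[min(eta, 1-eta)]   = 1/4
#   Asymptotic 1-NN error       E[2 eta (1 - eta)]   = 1/3
#   Cover-Hart bound (C=2)      2 R* (1 - R*)        = 3/8
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, size=(n, 1))
    y = (rng.uniform(0, 1, size=n) < x[:, 0]).astype(int)   # P(Y=1 | x) = x
    return x, y

X_test, y_test = sample(100_000)
for n in [100, 1_000, 10_000, 100_000]:
    X_train, y_train = sample(n)
    err = (KNeighborsClassifier(n_neighbors=1)
           .fit(X_train, y_train)
           .predict(X_test) != y_test).mean()
    print(f"n={n:>7}: 1-NN error ≈ {err:.3f}  "
          f"(R*=0.250, asymptotic limit=0.333, bound=0.375)")
```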
Intuition via Voronoi Structure:
As n → ∞, Voronoi cells shrink. In the limit, the nearest training point to a query q converges to q itself, so its label is effectively a fresh draw from the local posterior P(Y | X = q); the Voronoi cell around q inherits the class probabilities at q.
Now consider error analysis:
Bayes optimal at q: Predict class c* = argmax P(Y = c | X = q). Error probability = 1 - max_c P(Y = c | q).
1-NN at q: Predict whatever label the nearest training point happened to have. Asymptotically this label is an independent draw from the same posterior P(Y | X = q) that generated the query's own label, so there are two ways to err: the query's true class is c but the neighbor carries c′, or vice versa.
The factor of 2 arises from exactly these two disagreement events. In the binary case each occurs with probability η(q)(1 - η(q)), giving a conditional error of 2η(q)(1 - η(q)), which is at most twice the Bayes conditional error min(η(q), 1 - η(q)).
Proof Sketch:
Let η(x) = P(Y = 1 | X = x) be the regression function for binary classification. At a query point q, as n → ∞ the nearest neighbor converges to q, and its label is a Bernoulli(η(q)) draw independent of the query's own label. The conditional error probability is therefore η(q)(1 - η(q)) + (1 - η(q))η(q) = 2η(q)(1 - η(q)).
Integrating over the distribution of X gives the asymptotic error: $$R_1 = \int 2\eta(x)(1 - \eta(x))\, dP(x)$$
The bound R₁ ≤ 2R* follows from noting that 2η(1 - η) ≤ 2·min(η, 1 - η), since max(η, 1 - η) ≤ 1, together with R* = ∫ min(η(x), 1 - η(x)) dP(x) in the binary case.
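A quick numeric illustration of this last step: the asymptotic 1-NN conditional error 2η(1 - η) never exceeds twice the Bayes conditional error min(η, 1 - η), and the ratio approaches 2 only as η approaches 0 or 1.

```python
import numpy as np

eta = np.linspace(0.0, 1.0, 101)
nn_cond    = 2 * eta * (1 - eta)          # asymptotic 1-NN conditional error
bayes_cond = np.minimum(eta, 1 - eta)     # Bayes conditional error

# Pointwise: 2*eta*(1-eta) = 2*min(eta,1-eta)*max(eta,1-eta) <= 2*min(eta,1-eta)
assert np.all(nn_cond <= 2 * bayes_cond + 1e-12)

# The ratio 2*max(eta, 1-eta) approaches 2 only near eta = 0 or eta = 1
ratio = np.divide(nn_cond, bayes_cond, out=np.zeros_like(eta), where=bayes_cond > 0)
print("largest 1-NN / Bayes conditional-error ratio on the grid:", ratio.max())
```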
| Bayes Error R* | Asymptotic 1-NN Upper Bound (C = 2) | 1-NN Lower Bound | Bound Tightness |
|---|---|---|---|
| 0% | 0% | 0% | Exact (trivial case) |
| 5% | ≤ 9.5% | ≥ 5% | Nearly tight at low error |
| 10% | ≤ 18% | ≥ 10% | Still quite tight |
| 20% | ≤ 32% | ≥ 20% | Gap widening |
| 30% | ≤ 42% | ≥ 30% | Significant gap possible |
| 50% | ≤ 50% | ≥ 50% | Random guessing bound |
Extension to K-NN:
For k-NN with k → ∞ and k/n → 0 as n → ∞, the error rate converges to the Bayes optimal R*: $$\lim_{n \to \infty} R_k = R^*$$
This is why increasing K generally improves performance up to a point—it approaches Bayes optimality. However, finite sample effects and the bias introduced by larger K often dominate in practice.
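A sketch of this consistency effect on the same kind of assumed synthetic problem as above (η(x) = x on [0, 1], so R* = 0.25): with k grown as √n, the test error drifts toward R*, while k = 1 plateaus near its asymptotic value of about 0.333.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0, 1, size=(n, 1))
    y = (rng.uniform(0, 1, size=n) < x[:, 0]).astype(int)   # P(Y=1 | x) = x
    return x, y

X_test, y_test = sample(50_000)
for n in [1_000, 10_000, 100_000]:
    X_train, y_train = sample(n)
    for k in [1, int(np.sqrt(n))]:       # k -> inf while k/n -> 0
        err = (KNeighborsClassifier(n_neighbors=k)
               .fit(X_train, y_train)
               .predict(X_test) != y_test).mean()
        print(f"n={n:>7}, k={k:>4}: error ≈ {err:.3f}   (R* = 0.250)")
```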
Practical Implications:
1-NN is a strong baseline: For well-behaved problems, 1-NN provides competitive performance with no tuning.
Error rate interpretation: If your 1-NN achieves 15% error, the Bayes optimal rate is at least 7.5%—the problem has inherent difficulty.
Motivation for larger K: Since K-NN approaches Bayes optimality while 1-NN only achieves factor-2, there's theoretical motivation to use K > 1.
The Voronoi structure enables precise analysis of where and why 1-NN makes mistakes. This geometric perspective complements the probabilistic analysis of Cover-Hart.
Error Decomposition by Cell:
The expected 1-NN error can be written as a sum over Voronoi cells: $$R_1 = \sum_{i=1}^{n} P(q \in V(x_i)) \cdot P(\text{error} | q \in V(x_i))$$
where:
Sources of 1-NN Error:
Label noise: Training point xᵢ has incorrect label due to noise in the data collection process. All queries in V(xᵢ) inherit this error.
Boundary mismatch: True class boundary doesn't align with Voronoi edges. Some portion of V(xᵢ) truly belongs to a different class.
Finite sample effects: With limited n, cells are large. Queries far from their nearest neighbor may differ in true class.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def analyze_1nn_error_by_cell():
    """
    Analyze 1-NN error contribution from each Voronoi cell.
    Shows:
    1. Cell-by-cell error rate estimation
    2. Identification of high-error cells (near boundaries, noisy labels)
    3. Relationship between cell size and error
    """
    np.random.seed(42)

    # Generate data with known true boundary (y = x + 0.5)
    n_train = 40
    X_train = np.random.uniform(0, 4, (n_train, 2))
    y_train_true = (X_train[:, 1] > X_train[:, 0] + 0.5).astype(int)

    # Add label noise (10%)
    noise_mask = np.random.random(n_train) < 0.1
    y_train = y_train_true.copy()
    y_train[noise_mask] = 1 - y_train[noise_mask]

    # Compute Voronoi
    vor = Voronoi(X_train)

    # Estimate per-cell error rate using test points
    n_test = 5000
    X_test = np.random.uniform(0, 4, (n_test, 2))
    y_test = (X_test[:, 1] > X_test[:, 0] + 0.5).astype(int)

    # 1-NN predictions
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    errors = y_pred != y_test

    # Assign each test point to its Voronoi cell
    _, cell_indices = knn.kneighbors(X_test, n_neighbors=1)
    cell_indices = cell_indices.flatten()

    # Compute per-cell error rate
    cell_error_rates = np.zeros(n_train)
    cell_counts = np.zeros(n_train)
    for idx, (cell_idx, is_error) in enumerate(zip(cell_indices, errors)):
        cell_counts[cell_idx] += 1
        if is_error:
            cell_error_rates[cell_idx] += 1

    # Avoid division by zero
    mask = cell_counts > 0
    cell_error_rates[mask] /= cell_counts[mask]

    # Visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # --- Panel 1: Cells colored by error rate ---
    ax1 = axes[0]
    ax1.set_title('Per-Cell Error Rate', fontsize=12, fontweight='bold')

    patches = []
    patch_colors = []
    for point_idx in range(n_train):
        region_idx = vor.point_region[point_idx]
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon_verts = vor.vertices[region]
        # Clip to bounding box
        polygon_verts = np.clip(polygon_verts, 0, 4)
        patches.append(Polygon(polygon_verts))
        patch_colors.append(cell_error_rates[point_idx])

    pc = PatchCollection(patches, cmap='RdYlGn_r', alpha=0.7, edgecolor='gray')
    pc.set_array(np.array(patch_colors))
    pc.set_clim(0, 1)
    ax1.add_collection(pc)

    # Training points
    ax1.scatter(X_train[:, 0], X_train[:, 1], c='black', s=40, zorder=10)

    # True boundary
    ax1.plot([0, 3.5], [0.5, 4], 'k--', linewidth=2, label='True boundary')
    ax1.set_xlim(0, 4)
    ax1.set_ylim(0, 4)
    ax1.set_aspect('equal')
    plt.colorbar(pc, ax=ax1, label='Cell Error Rate')
    ax1.legend(loc='lower right')

    # --- Panel 2: Highlight problematic cells ---
    ax2 = axes[1]
    ax2.set_title('High-Error Cells Identified', fontsize=12, fontweight='bold')

    # Color by class
    colors = ['#3498db', '#e74c3c']
    for c in [0, 1]:
        mask = y_train == c
        ax2.scatter(X_train[mask, 0], X_train[mask, 1], c=colors[c], s=60,
                    edgecolor='white', linewidth=1.5, label=f'Class {c}')

    # Mark noisy points
    ax2.scatter(X_train[noise_mask, 0], X_train[noise_mask, 1],
                facecolors='none', edgecolors='yellow', linewidths=3,
                s=200, label='Noisy labels', zorder=11)

    # Mark high-error cells
    high_error = cell_error_rates > 0.3
    ax2.scatter(X_train[high_error, 0], X_train[high_error, 1],
                facecolors='none', edgecolors='black', linewidths=2,
                s=150, marker='s', label='High error (>30%)', zorder=10)

    ax2.plot([0, 3.5], [0.5, 4], 'k--', linewidth=2)
    ax2.set_xlim(0, 4)
    ax2.set_ylim(0, 4)
    ax2.set_aspect('equal')
    ax2.legend(loc='lower right', fontsize=9)

    # --- Panel 3: Error vs distance to boundary ---
    ax3 = axes[2]
    ax3.set_title('Error Rate vs Distance to True Boundary',
                  fontsize=12, fontweight='bold')

    # Distance from each training point to true boundary (y = x + 0.5)
    # Line: x - y + 0.5 = 0, distance = |x - y + 0.5| / sqrt(2)
    dist_to_boundary = np.abs(X_train[:, 0] - X_train[:, 1] + 0.5) / np.sqrt(2)

    # Scatter plot with enough counts
    valid = cell_counts > 10
    ax3.scatter(dist_to_boundary[valid], cell_error_rates[valid],
                s=cell_counts[valid], alpha=0.7, c=y_train[valid], cmap='RdBu')
    ax3.set_xlabel('Distance to True Boundary', fontsize=11)
    ax3.set_ylabel('Cell Error Rate', fontsize=11)
    ax3.set_xlim(0, 2.5)
    ax3.set_ylim(-0.05, 1.05)
    ax3.axhline(y=0.5, color='gray', linestyle=':', alpha=0.5)

    # Add trend line
    z = np.polyfit(dist_to_boundary[valid], cell_error_rates[valid], 1)
    p = np.poly1d(z)
    x_line = np.linspace(0, 2.5, 100)
    ax3.plot(x_line, np.clip(p(x_line), 0, 1), 'r-', linewidth=2,
             label=f'Trend: slope={z[0]:.2f}')
    ax3.legend()

    plt.tight_layout()

    # Summary statistics
    total_error = errors.sum() / len(errors)
    print(f"Overall 1-NN Error Rate: {total_error:.2%}")
    print(f"Cells with >30% error: {(cell_error_rates > 0.3).sum()} / {n_train}")
    print(f"Noisy training points: {noise_mask.sum()}")

    return fig


if __name__ == "__main__":
    fig = analyze_1nn_error_by_cell()
    plt.savefig("voronoi_error_analysis.png", dpi=150, bbox_inches='tight')
    plt.show()
```

The Voronoi-1NN equivalence has direct implications for algorithm design, both for improving 1-NN performance and for leveraging Voronoi properties in other contexts.
Condensing: Removing Redundant Points
Not all training points contribute equally. Interior points (whose Voronoi cells don't touch the decision boundary) are redundant—removing them doesn't change the decision boundary.
Condensed Nearest Neighbor (CNN), implemented in the code below, proceeds as follows: (1) initialize the condensed set S with one point per class; (2) classify every remaining training point with 1-NN on S; (3) whenever a point is misclassified, add it to S; (4) repeat the scan until a full pass produces no additions.
The resulting set S is consistent: it classifies all original training points correctly while being (potentially much) smaller.
Editing: Removing Noisy Points
Points whose labels disagree with their neighbors are likely noise. Editing removes them to create smoother boundaries.
Wilson's Edited Nearest Neighbor (ENN), also implemented below, works in the opposite direction: classify each training point using its k nearest neighbors among the other training points (commonly k = 3), and remove every point whose label disagrees with the majority vote of those neighbors.
This eliminates points that create Voronoi islands of incorrect predictions.
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification


def condensed_nearest_neighbor(X, y, random_state=None):
    """
    Condensed Nearest Neighbor (Hart, 1968).

    Returns a subset S such that 1-NN on S correctly classifies all of X.
    The condensed set contains only "boundary" points - those whose Voronoi
    cells touch the decision boundary. Interior points are redundant.
    """
    rng = np.random.default_rng(random_state)
    n_samples = len(X)
    classes = np.unique(y)

    # Initialize S with one random point per class
    S_indices = []
    for c in classes:
        class_indices = np.where(y == c)[0]
        S_indices.append(rng.choice(class_indices))
    S_indices = list(S_indices)

    # Repeatedly scan and absorb misclassified points
    changed = True
    while changed:
        changed = False
        # Build 1-NN with current S
        knn = KNeighborsClassifier(n_neighbors=1)
        knn.fit(X[S_indices], y[S_indices])

        # Check all training points
        for i in range(n_samples):
            if i in S_indices:
                continue
            pred = knn.predict(X[i:i+1])[0]
            if pred != y[i]:
                # Misclassified - add to S
                S_indices.append(i)
                changed = True

    return np.array(S_indices), X[S_indices], y[S_indices]


def edited_nearest_neighbor(X, y, k=3):
    """
    Edited Nearest Neighbor (Wilson, 1972).

    Removes points that are misclassified by their K nearest neighbors.
    These are likely noise points or boundary outliers.
    """
    n_samples = len(X)

    # Find K nearest neighbors for each point (excluding self)
    knn = KNeighborsClassifier(n_neighbors=k+1)  # +1 because self is included
    knn.fit(X, y)
    distances, indices = knn.kneighbors(X)

    keep_mask = np.ones(n_samples, dtype=bool)
    for i in range(n_samples):
        # Neighbors (excluding self)
        neighbor_indices = indices[i, 1:k+1]  # Skip first (self)
        neighbor_labels = y[neighbor_indices]

        # Majority vote
        majority_label = np.bincount(neighbor_labels).argmax()

        # If disagreement with majority, mark for removal
        if y[i] != majority_label:
            keep_mask[i] = False

    return keep_mask, X[keep_mask], y[keep_mask]


def compare_condensing_editing():
    """Compare original, condensed, and edited datasets."""
    np.random.seed(42)

    # Generate noisy dataset
    X, y = make_classification(
        n_samples=500, n_features=2, n_informative=2, n_redundant=0,
        n_clusters_per_class=2, flip_y=0.15, random_state=42
    )

    print("Original dataset:")
    print(f"  Size: {len(X)} points")

    # Condensed NN
    S_idx, X_cond, y_cond = condensed_nearest_neighbor(X, y, random_state=42)
    print(f"\nCondensed dataset:")
    print(f"  Size: {len(X_cond)} points ({100*len(X_cond)/len(X):.1f}%)")

    # Check accuracy on original data
    knn_cond = KNeighborsClassifier(n_neighbors=1)
    knn_cond.fit(X_cond, y_cond)
    cond_acc = (knn_cond.predict(X) == y).mean()
    print(f"  Training accuracy: {cond_acc:.2%}")

    # Edited NN
    keep_mask, X_edit, y_edit = edited_nearest_neighbor(X, y, k=3)
    print(f"\nEdited dataset:")
    print(f"  Size: {len(X_edit)} points ({100*len(X_edit)/len(X):.1f}%)")
    print(f"  Points removed: {(~keep_mask).sum()}")

    return X, y, X_cond, y_cond, X_edit, y_edit


if __name__ == "__main__":
    X, y, X_cond, y_cond, X_edit, y_edit = compare_condensing_editing()
```

Prototype Selection:
Instead of using all training points, select a smaller set of prototypes that captures the data structure. This reduces storage and query time while potentially improving generalization.
Methods include condensing (keep only boundary points), editing (remove likely label noise), clustering-based prototype generation (for example, one centroid per cluster within each class), and learning vector quantization (LVQ), which adapts prototype positions during training.
Voronoi-Aware Data Structures:
Knowing that 1-NN corresponds to Voronoi enables specialized data structures. In low dimensions, the Voronoi diagram can be built explicitly and paired with a point-location structure, so a query reduces to finding the containing cell (logarithmic query time in 2D). The Delaunay triangulation (the Voronoi dual) supports walk-based nearest-neighbor search, and space-partitioning trees such as KD-trees and ball trees exploit the same proximity geometry without ever constructing cells explicitly.
These approaches trade preprocessing time for query efficiency—useful when queries far outnumber training updates.
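In practice, the most common of these structures is a spatial tree. A minimal sketch with SciPy's cKDTree (sample sizes and dimensions chosen arbitrarily): the tree is built once, and each query implicitly answers "which Voronoi cell contains q?" without constructing the diagram.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = rng.integers(0, 2, size=10_000)

# Preprocessing: build the tree once (O(n log n))
tree = cKDTree(X)

# Querying: each lookup walks the tree instead of scanning all n points;
# the returned index identifies the Voronoi cell containing the query
Q = rng.normal(size=(5, 3))
_, nn_idx = tree.query(Q, k=1)
print("1-NN labels:", y[nn_idx])
```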
Explicit Voronoi construction becomes impractical in high dimensions (complexity O(n^⌈d/2⌉)). For d > 5-10, approximate nearest neighbor methods (LSH, randomized KD-trees) are essential. The conceptual connection to Voronoi remains valuable for understanding, but explicit geometric construction is infeasible.
The Voronoi-1NN connection illuminates relationships to other machine learning methods and geometric algorithms.
Connection to RBF Networks:
Radial Basis Function networks with one unit per training point and Gaussian activations create "soft" Voronoi-like partitions. The hard assignment of Voronoi cells becomes weighted contributions: $$f(q) = \sum_i w_i \exp\left(-\gamma \|q - x_i\|^2\right)$$
As γ → ∞, this approaches 1-NN (winner-take-all).
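A small sketch of this limit, assuming unit weights w_i = 1 and class scores formed by summing Gaussian activations per class: as γ grows, the soft RBF prediction agrees with 1-NN on essentially every query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)
Q = rng.normal(size=(2000, 2))

knn_pred = KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(Q)

sq_dists = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (2000, 100)

for gamma in [0.1, 1, 10, 100, 1000]:
    # Stabilized Gaussian weights: subtracting each query's minimum squared
    # distance rescales both class scores equally, avoiding underflow
    w = np.exp(-gamma * (sq_dists - sq_dists.min(axis=1, keepdims=True)))
    class_scores = np.stack([w[:, y == c].sum(axis=1) for c in (0, 1)], axis=1)
    rbf_pred = class_scores.argmax(axis=1)
    print(f"gamma={gamma:>6}: agreement with 1-NN = {(rbf_pred == knn_pred).mean():.3f}")
```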
Connection to Gaussian Mixture Models:
Hard clustering via GMM (assigning each point to its most probable component) creates a partition analogous to Voronoi. With equal mixing weights and a shared spherical covariance, the partition is exactly the Voronoi diagram of the component means. With differing covariances or weights, boundaries become curved or shifted.
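A minimal sketch of this equivalence, with assumed equal mixing weights and an assumed shared spherical variance: the hard GMM assignment rule reduces algebraically to nearest-mean assignment, i.e., the Voronoi partition of the means.

```python
import numpy as np

# Hard assignment under a spherical GMM with shared variance sigma^2 and equal
# mixing weights pi_k: argmax_k [ log pi_k - ||x - mu_k||^2 / (2 sigma^2) ].
# With equal pi_k and shared sigma^2, this is argmin_k ||x - mu_k||: the
# Voronoi partition generated by the component means.
rng = np.random.default_rng(0)
means = rng.normal(size=(5, 2))          # component means (Voronoi generators)
sigma2 = 0.7                             # assumed shared spherical variance
X = rng.normal(size=(1000, 2)) * 2

sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
gmm_hard = (np.log(1 / 5) - sq_dists / (2 * sigma2)).argmax(axis=1)
voronoi = sq_dists.argmin(axis=1)

print("assignments identical:", np.array_equal(gmm_hard, voronoi))   # True
```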
Connection to Support Vector Machines:
Interestingly, SVM support vectors often lie near class boundaries—similar to how "boundary" training points in 1-NN are most important (non-redundant). The condensed 1-NN set is conceptually related to support vectors.
| Method | Voronoi Connection | Key Difference |
|---|---|---|
| 1-NN | Exact Voronoi tessellation | Hard boundaries, no parameters |
| K-NN (K>1) | Order-K Voronoi / smoothed | Voting creates boundary widening |
| RBF Networks | Soft Voronoi (Gaussian decay) | Continuous influence regions |
| K-Means Clustering | Voronoi of centroids | Centroids learned, not training points |
| GMM (spherical) | Weighted/soft Voronoi | Probabilistic assignment |
| Learning Vector Quantization | Adaptive Voronoi prototypes | Prototypes move during training |
Connection to Decision Trees:
Decision trees create axis-aligned rectangular partitions. These can be seen as a restricted form of Voronoi-like partitioning where boundaries must be perpendicular to coordinate axes. The loss of diagonal boundaries is compensated by hierarchical structure and interpretability.
Connection to Kernel Density Estimation:
1-NN classification can be viewed as classification based on estimated class densities: $$\hat{f}_c(q) = \frac{1}{n_c} \sum_{i:\, y_i = c} \delta(q - x_i)$$ where n_c is the number of training points in class c and δ is the Dirac delta.
The "nearest point wins" rule implicitly estimates local density as concentrated at the nearest training point. Proper kernel density estimation replaces delta functions with smooth kernels, producing smooth density estimates and hence smoother decision boundaries.
This page has established the precise and complete correspondence between 1-NN classification and Voronoi tessellation. This is not just an analogy—it's a mathematical identity that enables both theoretical analysis and practical algorithm design.
What's Next:
With the 1-NN/Voronoi equivalence firmly established, we turn to practical visualization. The next page explores techniques for visualizing Voronoi structures and KNN decision surfaces in two and higher dimensions, building intuition that transfers to problems where direct visualization is impossible.
You now have a rigorous understanding of why 1-NN classification creates Voronoi cells, the theoretical guarantees this provides (Cover-Hart), and how this structure informs practical algorithm design. This geometric foundation underlies all K-NN variants and related methods.