Imagine dropping a pebble into a pond with several floating buoys. Ripples spread outward from the pebble. Which buoy do the ripples reach first? This seemingly simple question leads to one of the most beautiful structures in computational geometry: the Voronoi diagram.
A Voronoi diagram partitions space into regions based on proximity to a set of points, called sites or generators. Each region contains all points closer to its site than to any other site. For nearest neighbor search, the Voronoi cell of a site $s$ is precisely the set of query points for which $s$ is the nearest neighbor.
Voronoi diagrams are important for several reasons: they capture the exact geometry of nearest neighbor queries, they support provably fast search in low dimensions, and their complexity explains why exact high-dimensional search is hard.
By the end of this page, you will understand the mathematical definition of Voronoi diagrams, how they provide $O(\log n)$ nearest neighbor search in 2D, the relationship between Voronoi cells and KNN decision boundaries, the Delaunay triangulation and its duality with Voronoi diagrams, and why Voronoi diagrams become impractical in high dimensions.
Let's formalize the Voronoi diagram precisely.
Definition (Voronoi Diagram):
Given a set of $n$ points $S = \{s_1, s_2, \ldots, s_n\}$ in $\mathbb{R}^d$, the Voronoi cell (or Voronoi region) of site $s_i$ is:
$$V(s_i) = \{\mathbf{x} \in \mathbb{R}^d : d(\mathbf{x}, s_i) \leq d(\mathbf{x}, s_j) \text{ for all } j \neq i\}$$
The Voronoi diagram of $S$ is the collection of all Voronoi cells:
$$\text{Vor}(S) = \{V(s_1), V(s_2), \ldots, V(s_n)\}$$
Key Properties:
- The cells tile $\mathbb{R}^d$: they cover all of space and overlap only on their boundaries.
- Every cell is convex, because it is an intersection of half-spaces (shown below).
- A cell $V(s_i)$ is unbounded exactly when $s_i$ lies on the convex hull of $S$.
- For search: a query $\mathbf{q}$ lies in $V(s_i)$ if and only if $s_i$ is a nearest neighbor of $\mathbf{q}$.
Building Blocks:
To construct Voronoi cells, we use half-spaces defined by bisector hyperplanes.
The bisector between sites $s_i$ and $s_j$ is the hyperplane equidistant from both:
$$B(s_i, s_j) = \{\mathbf{x} : d(\mathbf{x}, s_i) = d(\mathbf{x}, s_j)\}$$
For Euclidean distance, this is:
$$B(s_i, s_j) = \left\{\mathbf{x} : \left(\mathbf{x} - \frac{s_i + s_j}{2}\right) \cdot (s_j - s_i) = 0\right\}$$
The bisector is perpendicular to the segment $\overline{s_i s_j}$ and passes through its midpoint.
Half-space: The set of points closer to $s_i$ than to $s_j$:
$$H(s_i, s_j) = \{\mathbf{x} : d(\mathbf{x}, s_i) \leq d(\mathbf{x}, s_j)\}$$
Voronoi cell as intersection:
$$V(s_i) = \bigcap_{j \neq i} H(s_i, s_j)$$
Each cell is the intersection of $n-1$ half-spaces—a convex polytope.
Think of Voronoi cells as 'territories' where each site controls all points closer to it than to any rival. The cell boundaries are where territorial control changes—exactly the decision boundaries of 1-nearest neighbor classification.
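To make the half-space construction concrete, here is a minimal sketch (`in_voronoi_cell` is an illustrative helper, not a library routine) that tests membership in $V(s_i)$ by checking every half-space inequality directly, then verifies it against a brute-force nearest-site check:

```python
import numpy as np

def in_voronoi_cell(x: np.ndarray, sites: np.ndarray, i: int) -> bool:
    """Check x ∈ V(s_i) via the half-space definition:
    x is in the cell iff d(x, s_i) <= d(x, s_j) for every j != i,
    i.e., x lies in all n-1 half-spaces H(s_i, s_j)."""
    d_i = np.linalg.norm(x - sites[i])
    for j in range(len(sites)):
        if j != i and np.linalg.norm(x - sites[j]) < d_i:
            return False  # x is on s_j's side of the bisector B(s_i, s_j)
    return True

# Sanity check: membership must agree with brute-force nearest-site search.
rng = np.random.default_rng(0)
sites = rng.random((10, 2))
x = rng.random(2)
nearest = int(np.argmin(np.linalg.norm(sites - x, axis=1)))
assert in_voronoi_cell(x, sites, nearest)
```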
In two dimensions, Voronoi diagrams have particularly nice properties that make them practical for computation.
2D Voronoi Structure:
- Voronoi edges are segments (or rays) of bisector lines, equidistant from exactly two sites.
- Voronoi vertices are points equidistant from three or more sites (exactly three in general position).
- Unbounded cells correspond exactly to sites on the convex hull.
Combinatorial Complexity:
For $n \geq 3$ sites in general position in 2D, the Voronoi diagram has at most $2n - 5$ vertices and at most $3n - 6$ edges.
This is $O(n)$ total complexity—linear in the number of sites! A short Euler-formula argument below shows where these bounds come from.
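Here is the standard counting argument, sketched. Add one artificial vertex "at infinity" and connect every unbounded Voronoi edge to it; the result is a planar graph with $v + 1$ vertices ($v$ Voronoi vertices), $e$ edges, and $n$ faces (one per cell). Euler's formula gives

$$(v + 1) - e + n = 2 \quad\Longrightarrow\quad e = v + n - 1.$$

Every vertex of this graph has degree at least 3, so $2e \geq 3(v + 1)$. Substituting:

$$2(v + n - 1) \geq 3(v + 1) \quad\Longrightarrow\quad v \leq 2n - 5, \qquad e \leq 3n - 6.$$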
Construction Algorithms:
Several algorithms compute 2D Voronoi diagrams:
| Algorithm | Time Complexity | Space | Strategy |
|---|---|---|---|
| Fortune's sweep | $O(n \log n)$ | $O(n)$ | Sweep line with beach line |
| Divide and conquer | $O(n \log n)$ | $O(n)$ | Recursive merge |
| Incremental | $O(n^2)$ worst | $O(n)$ | Add sites one by one |
| Via Delaunay | $O(n \log n)$ | $O(n)$ | Compute dual first |
```python
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt

def visualize_voronoi_2d(points: np.ndarray, show_labels: bool = True):
    """
    Visualize a 2D Voronoi diagram.

    Uses scipy.spatial.Voronoi (built on Qhull, which computes the
    diagram via a lifted convex hull rather than Fortune's sweep).

    Parameters:
    -----------
    points : np.ndarray, shape (n, 2)
        Voronoi sites
    show_labels : bool
        Whether to label the sites
    """
    # Compute Voronoi diagram
    vor = Voronoi(points)

    # Create figure
    fig, ax = plt.subplots(figsize=(10, 10))

    # Plot the Voronoi diagram
    voronoi_plot_2d(vor, ax=ax, show_vertices=True,
                    line_colors='blue', line_width=1.5, point_size=10)

    # Add site labels
    if show_labels:
        for i, (x, y) in enumerate(points):
            ax.annotate(f'$s_{i}$', (x, y), fontsize=12,
                        xytext=(5, 5), textcoords='offset points')

    # Annotate Voronoi regions
    for i, region in enumerate(vor.regions):
        if not region or -1 in region:
            continue  # Skip empty or infinite regions
        polygon = [vor.vertices[j] for j in region]
        centroid = np.mean(polygon, axis=0)
        # Find which site owns this region
        for site_idx, point_region in enumerate(vor.point_region):
            if point_region == i:
                ax.annotate(f'$V(s_{site_idx})$', centroid, fontsize=10,
                            ha='center', va='center', color='gray')
                break

    ax.set_xlim(points[:, 0].min() - 1, points[:, 0].max() + 1)
    ax.set_ylim(points[:, 1].min() - 1, points[:, 1].max() + 1)
    ax.set_aspect('equal')
    ax.set_title('2D Voronoi Diagram')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    return fig, ax, vor

def voronoi_nn_search(vor: Voronoi, query: np.ndarray) -> int:
    """
    Naive nearest neighbor search using Voronoi structure.

    In practice, you'd use point location algorithms for O(log n) search.
    This just demonstrates the concept.
    """
    # Simple approach: check which cell contains the query
    # by computing distances to all sites
    distances = np.sqrt(np.sum((vor.points - query) ** 2, axis=1))
    return np.argmin(distances)

# Example usage
if __name__ == "__main__":
    np.random.seed(42)
    sites = np.random.rand(15, 2) * 10  # 15 random sites in [0, 10]^2

    fig, ax, vor = visualize_voronoi_2d(sites)

    # Query point
    query = np.array([5.0, 5.0])
    nearest = voronoi_nn_search(vor, query)

    ax.plot(*query, 'r*', markersize=15, label='Query')
    ax.plot(*sites[nearest], 'go', markersize=12, label=f'NN: $s_{nearest}$')
    ax.legend()
    plt.show()
```

Fortune's algorithm uses a 'beach line' of parabolic arcs that sweeps across the plane. Each parabola represents points equidistant from a site and the sweep line. The intersections of parabolas trace out Voronoi edges. This elegant approach achieves optimal O(n log n) time.
Once we have the Voronoi diagram, nearest neighbor search reduces to point location: given a query point $\mathbf{q}$, find which Voronoi cell contains $\mathbf{q}$.
Point Location Algorithms:
Several data structures enable logarithmic point location in 2D:
| Method | Query Time | Space | Preprocessing |
|---|---|---|---|
| Trapezoidal map | $O(\log n)$ | $O(n)$ | $O(n \log n)$ |
| Kirkpatrick hierarchy | $O(\log n)$ | $O(n)$ | $O(n)$ |
| Slab decomposition | $O(\log n)$ | $O(n^2)$ | $O(n^2)$ |
Trapezoidal Map Approach: extend a vertical segment up and down from each Voronoi vertex to decompose the plane into trapezoids; a randomized incremental construction builds a search DAG that answers point location queries in $O(\log n)$ expected time.
Kirkpatrick's Hierarchy (Optimal): repeatedly remove an independent set of low-degree vertices and retriangulate, producing a hierarchy of coarser subdivisions of $O(\log n)$ depth; queries walk down the hierarchy in $O(\log n)$ worst-case time using only $O(n)$ space.
```python
import numpy as np
from scipy.spatial import Delaunay

class VoronoiPointLocator:
    """
    Point location in a 2D Voronoi diagram via its dual Delaunay
    triangulation.

    Note: this simplified implementation uses scipy's built-in simplex
    search (a directed walk). It is fast in practice but does not carry
    the O(log n) worst-case guarantee of trapezoidal maps or
    Kirkpatrick's hierarchy.
    """

    def __init__(self, sites: np.ndarray):
        """
        Build the point location structure.

        Parameters:
        -----------
        sites : np.ndarray, shape (n, 2)
            Voronoi sites / Delaunay vertices
        """
        self.sites = sites
        self.n = len(sites)
        # Compute Delaunay triangulation (dual of Voronoi)
        self.delaunay = Delaunay(sites)
        # CSR-style adjacency over Delaunay edges, used by the greedy walk
        self._indptr, self._indices = self.delaunay.vertex_neighbor_vertices

    def _neighbors(self, i: int) -> np.ndarray:
        """Indices of the Delaunay neighbors of site i."""
        return self._indices[self._indptr[i]:self._indptr[i + 1]]

    def query(self, point: np.ndarray) -> int:
        """
        Find the index of the nearest site to `point`.

        Locates the containing Delaunay triangle, then refines with a
        greedy walk: the nearest site need not be a vertex of the
        containing triangle, but a greedy walk along Delaunay edges
        provably reaches the exact nearest neighbor.
        """
        simplex = int(self.delaunay.find_simplex(point.reshape(1, -1))[0])

        if simplex == -1:
            # Point outside convex hull: start from the closest hull site
            hull_indices = np.unique(self.delaunay.convex_hull)
            d = np.linalg.norm(self.sites[hull_indices] - point, axis=1)
            current = int(hull_indices[np.argmin(d)])
        else:
            # Start from the closest vertex of the containing triangle
            verts = self.delaunay.simplices[simplex]
            d = np.linalg.norm(self.sites[verts] - point, axis=1)
            current = int(verts[np.argmin(d)])

        # Greedy walk: move to any Delaunay neighbor strictly closer to
        # the query; stop when no neighbor improves. At that point the
        # query lies in the current site's Voronoi cell.
        while True:
            nbrs = self._neighbors(current)
            d_nbrs = np.linalg.norm(self.sites[nbrs] - point, axis=1)
            d_cur = np.linalg.norm(self.sites[current] - point)
            if d_nbrs.min() >= d_cur:
                return current
            current = int(nbrs[np.argmin(d_nbrs)])

    def batch_query(self, points: np.ndarray) -> np.ndarray:
        """Find nearest neighbors for multiple query points."""
        return np.array([self.query(p) for p in points])

def demonstrate_point_location():
    """
    Benchmark nearest neighbor search via Delaunay point location.
    """
    import time

    np.random.seed(0)

    # Build locator for different sizes
    for n in [100, 1000, 10000, 100000]:
        sites = np.random.rand(n, 2) * 100

        # Build time
        start = time.perf_counter()
        locator = VoronoiPointLocator(sites)
        build_time = time.perf_counter() - start

        # Query time (average over 1000 queries)
        queries = np.random.rand(1000, 2) * 100
        start = time.perf_counter()
        results = locator.batch_query(queries)
        query_time = (time.perf_counter() - start) / 1000 * 1000  # ms

        print(f"n={n:6d}: build={build_time*1000:.1f}ms, "
              f"query={query_time:.4f}ms/query")

# Output (illustrative):
# n=   100: build=  0.5ms, query=0.0150ms/query
# n=  1000: build=  2.3ms, query=0.0180ms/query
# n= 10000: build= 25.0ms, query=0.0210ms/query (only 1.4x higher!)
# n=100000: build=280.0ms, query=0.0250ms/query (barely changes)
```

Point location is a classic example of the preprocessing-query tradeoff. We invest $O(n \log n)$ time building the Voronoi diagram and point location structure, then reap $O(\log n)$ queries forever. For 1 million sites, queries take microseconds despite the massive data.
The Voronoi diagram has a beautiful dual: the Delaunay triangulation. Understanding this duality provides deeper geometric insight.
Definition (Delaunay Triangulation):
The Delaunay triangulation $\text{Del}(S)$ of a point set $S$ connects sites whose Voronoi cells share an edge.
Duality between Voronoi and Delaunay:
| Voronoi | Delaunay |
|---|---|
| Cell (region) | Vertex (site) |
| Edge | Edge (connecting sites with adjacent cells) |
| Vertex | Face (triangle) |
Empty Circle Property:
A triangulation is Delaunay if and only if the circumcircle of each triangle contains no other sites. This is the defining characteristic.
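To make the empty-circle property concrete, here is a small self-check sketch (`circumcircle` and `verify_empty_circle` are illustrative helpers): it computes each Delaunay triangle's circumcircle and confirms no other site lies strictly inside.

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcircle(a, b, c):
    """Circumcenter and radius of triangle abc: the center x satisfies
    |x - a|^2 = |x - b|^2 = |x - c|^2, a 2x2 linear system."""
    A = 2 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(A, rhs)
    return center, np.linalg.norm(center - a)

def verify_empty_circle(points: np.ndarray, tol: float = 1e-9) -> bool:
    """Check the empty-circle property: no site lies strictly inside
    any Delaunay triangle's circumcircle."""
    tri = Delaunay(points)
    for simplex in tri.simplices:
        center, r = circumcircle(*points[simplex])
        d = np.linalg.norm(points - center, axis=1)
        d[simplex] = np.inf  # the triangle's own vertices lie ON the circle
        if np.any(d < r - tol):
            return False
    return True

rng = np.random.default_rng(42)
print(verify_empty_circle(rng.random((50, 2))))  # True
```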
Why This Matters for KNN:
Delaunay edges connect natural neighbors — Sites connected by a Delaunay edge are likely to be nearest neighbors for queries near their shared cell boundary.
K-nearest neighbors often form a path in Delaunay — Though not always, the k nearest sites tend to be Delaunay-connected.
Delaunay guides search — Graph-based NN methods (like HNSW) are inspired by navigating a graph of neighbors.
```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi
import matplotlib.pyplot as plt

def visualize_voronoi_delaunay_duality(sites: np.ndarray):
    """
    Visualize the duality between Voronoi and Delaunay structures.
    """
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Compute structures
    vor = Voronoi(sites)
    delaunay = Delaunay(sites)

    # Left: Voronoi only
    ax = axes[0]
    ax.set_title('Voronoi Diagram')
    for ridge in vor.ridge_vertices:
        if -1 in ridge:
            continue
        ax.plot(vor.vertices[ridge, 0], vor.vertices[ridge, 1], 'b-', lw=1.5)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)
    ax.set_aspect('equal')

    # Middle: Delaunay only
    ax = axes[1]
    ax.set_title('Delaunay Triangulation')
    ax.triplot(sites[:, 0], sites[:, 1], delaunay.simplices, 'g-', lw=1.5)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)
    ax.set_aspect('equal')

    # Right: Both overlaid (showing duality)
    ax = axes[2]
    ax.set_title('Duality: Voronoi (blue) + Delaunay (green)')
    for ridge in vor.ridge_vertices:
        if -1 in ridge:
            continue
        ax.plot(vor.vertices[ridge, 0], vor.vertices[ridge, 1],
                'b-', lw=1, alpha=0.7)
    ax.triplot(sites[:, 0], sites[:, 1], delaunay.simplices,
               'g-', lw=1, alpha=0.7)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)

    # Draw duality connections
    for ridge_idx, (i, j) in enumerate(vor.ridge_points):
        # Each Voronoi edge (connecting Voronoi vertices) corresponds to
        # a Delaunay edge (connecting sites i and j)
        mid_site = (sites[i] + sites[j]) / 2
        ridge_verts = vor.ridge_vertices[ridge_idx]
        if -1 not in ridge_verts:
            mid_vor = vor.vertices[ridge_verts].mean(axis=0)
            ax.plot([mid_site[0], mid_vor[0]], [mid_site[1], mid_vor[1]],
                    'k--', lw=0.5, alpha=0.5)
    ax.set_aspect('equal')

    plt.tight_layout()
    return fig

def delaunay_neighbor_property(sites: np.ndarray) -> dict:
    """
    Demonstrate that Delaunay edges connect 'natural neighbors'.

    Returns statistics about NN relationships.
    """
    delaunay = Delaunay(sites)
    n = len(sites)

    # Get Delaunay neighbors for each site
    delaunay_neighbors = [set() for _ in range(n)]
    for simplex in delaunay.simplices:
        for a in simplex:
            for b in simplex:
                if a != b:
                    delaunay_neighbors[a].add(b)

    # Check if 1-NN is always a Delaunay neighbor
    nn_is_delaunay = 0
    k_nn_in_delaunay = []  # For each site, how many of k-NN are Delaunay neighbors?
    k = 5

    for i in range(n):
        # Compute distances from site i to all others
        distances = np.sqrt(np.sum((sites - sites[i]) ** 2, axis=1))
        distances[i] = np.inf  # Exclude self

        # Get k nearest neighbors
        k_nearest = np.argsort(distances)[:k]

        # Is 1-NN a Delaunay neighbor?
        if k_nearest[0] in delaunay_neighbors[i]:
            nn_is_delaunay += 1

        # How many of k-NN are Delaunay neighbors?
        in_delaunay = sum(1 for j in k_nearest if j in delaunay_neighbors[i])
        k_nn_in_delaunay.append(in_delaunay)

    return {
        'n_sites': n,
        '1nn_in_delaunay_pct': 100 * nn_is_delaunay / n,
        f'{k}nn_in_delaunay_avg': np.mean(k_nn_in_delaunay),
        'avg_delaunay_degree': np.mean([len(s) for s in delaunay_neighbors])
    }

# Example output:
# {'n_sites': 100,
#  '1nn_in_delaunay_pct': 100.0,  # 1-NN is ALWAYS a Delaunay neighbor!
#  '5nn_in_delaunay_avg': 4.2,    # Most k-NN are Delaunay neighbors
#  'avg_delaunay_degree': 5.8}    # Average ~6 Delaunay edges per site
```

A beautiful theorem: in 2D, the nearest neighbor of any site is always connected to it in the Delaunay triangulation. This means the Delaunay graph contains all 1-NN relationships. Graph-based NN methods exploit this by searching along neighbor connections.
There's a profound connection between Voronoi diagrams and K-nearest neighbor classification.
1-NN and Voronoi Cells:
For 1-NN classification, the decision boundary is exactly the Voronoi diagram of the training points. Every point in the Voronoi cell of training example $(\mathbf{x}_i, y_i)$ is classified as $y_i$.
K-NN and Higher-Order Voronoi:
For $K > 1$, we need the order-K Voronoi diagram, which partitions space based on the $K$ nearest sites (not just the nearest one).
Order-K Voronoi Cell:
$$V_K(\{s_{i_1}, \ldots, s_{i_K}\}) = \{\mathbf{x} : \text{the } K \text{ nearest sites to } \mathbf{x} \text{ are } s_{i_1}, \ldots, s_{i_K}\}$$
The decision boundary of K-NN is where the majority vote changes—which happens at order-K Voronoi edges where the composition of the K nearest neighbors changes.
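To see order-K cells concretely, here is a small sketch (`order_k_cells` is an illustrative helper, and the grid-labeling approach is an approximation, not an exact construction) that labels grid points by the set of their K nearest sites; each distinct label corresponds to one non-empty order-K Voronoi cell:

```python
import numpy as np

def order_k_cells(sites: np.ndarray, k: int, resolution: int = 200) -> int:
    """Count non-empty order-k Voronoi cells by labeling each grid point
    with the (unordered) set of its k nearest sites."""
    xs = np.linspace(0, 1, resolution)
    grid = np.array([[x, y] for y in xs for x in xs])
    # Pairwise distances: (num_grid_points, num_sites)
    d = np.linalg.norm(grid[:, None, :] - sites[None, :, :], axis=2)
    knn = np.sort(np.argsort(d, axis=1)[:, :k], axis=1)  # sorted -> set id
    return len({tuple(row) for row in knn})

rng = np.random.default_rng(1)
sites = rng.random((10, 2))
for k in (1, 2, 3):
    print(k, order_k_cells(sites, k))  # cell count grows with k
```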
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from sklearn.neighbors import KNeighborsClassifier

def visualize_knn_decision_boundary(X: np.ndarray, y: np.ndarray, k: int):
    """
    Visualize KNN decision boundaries and their relation to Voronoi cells.
    """
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # Left: 1-NN (Voronoi regions)
    ax = axes[0]
    ax.set_title('1-NN Decision Boundaries (= Voronoi Diagram)')

    # Compute Voronoi
    vor = Voronoi(X)

    # Color each region by its class
    colors = plt.cm.Set1(np.linspace(0, 1, len(np.unique(y))))
    for point_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon = vor.vertices[region]
        ax.fill(polygon[:, 0], polygon[:, 1], alpha=0.3,
                color=colors[y[point_idx]])

    # Plot points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]],
                   edgecolor='black', s=100, label=f'Class {c}')
    ax.legend()
    ax.set_xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
    ax.set_ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
    ax.set_aspect('equal')

    # Right: K-NN decision boundary
    ax = axes[1]
    ax.set_title(f'{k}-NN Decision Boundaries')

    # Create mesh grid for decision boundary visualization
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200),
        np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200)
    )

    # Train KNN and predict on grid
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision regions
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='Set1')
    ax.contour(xx, yy, Z, colors='black', linewidths=0.5)

    # Plot points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]],
                   edgecolor='black', s=100, label=f'Class {c}')
    ax.legend()
    ax.set_xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
    ax.set_ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
    ax.set_aspect('equal')

    plt.tight_layout()
    return fig

def demonstrate_k_effect():
    """
    Show how decision boundaries smooth as K increases.
    """
    np.random.seed(42)

    # Generate two-class data
    n = 30
    X = np.vstack([
        np.random.randn(n, 2) + [2, 2],
        np.random.randn(n, 2) + [-2, -2]
    ])
    y = np.array([0] * n + [1] * n)

    fig, axes = plt.subplots(1, 4, figsize=(16, 4))

    for idx, k in enumerate([1, 3, 7, 15]):
        ax = axes[idx]
        ax.set_title(f'K = {k}')

        # Create mesh
        xx, yy = np.meshgrid(
            np.linspace(-5, 5, 150),
            np.linspace(-5, 5, 150)
        )

        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)
        Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

        ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
        ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm',
                   edgecolor='black', s=50)
        ax.set_xlim(-5, 5)
        ax.set_ylim(-5, 5)
        ax.set_aspect('equal')

    fig.suptitle('K-NN Decision Boundaries: From Jagged (K=1) to Smooth (K=15)')
    plt.tight_layout()
    return fig
```

Key Observations:
K=1 boundaries are sharp — They exactly follow the Voronoi edges, creating jagged decision regions.
Larger K smooths boundaries — As K increases, boundaries become smoother because more neighbors must change for the vote to flip.
Boundary complexity — The total length of decision boundaries decreases with K, reducing model complexity.
Noise sensitivity — K=1 is sensitive to noise (each noisy point creates its own region); larger K averages out individual outliers.
While Voronoi diagrams are elegant in 2D and 3D, they become impractical in higher dimensions. This explains why KD-trees, Ball trees, and approximate methods are necessary.
Combinatorial Complexity:
The number of faces of a Voronoi diagram in $d$ dimensions can be exponential:
| Dimension | Vertices (worst case) | Total complexity |
|---|---|---|
| 2 | $O(n)$ | $O(n)$ |
| 3 | $O(n^2)$ | $O(n^2)$ |
| $d$ | $O(n^{\lceil d/2 \rceil})$ | $O(n^{\lceil d/2 \rceil})$ |
For $d = 10$ dimensions, the worst case is $O(n^5)$: even $n = 1000$ sites can generate on the order of $10^{15}$ faces, far beyond anything we could compute or store.
Upper Bound Theorem (McMullen):
The maximum number of $k$-faces of a Voronoi diagram of $n$ points in $\mathbb{R}^d$ is:
$$O\left(n^{\lceil d/2 \rceil}\right)$$
This bound is tight—there exist point configurations that achieve it.
| Dimension | Vertices for n=100 | Vertices for n=1000 | Storage for n=1000 (bytes, approx) |
|---|---|---|---|
| 2 | ~300 | ~3,000 | ~240 KB |
| 3 | ~10,000 | ~1,000,000 | ~80 MB |
| 4 | ~10,000 | ~1,000,000 | ~80 MB |
| 5 | ~1,000,000 | ~10^9 | ~80 GB |
| 10 | ~10^10 | ~10^15 | Impossible |
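The orders of magnitude above can be reproduced in a few lines (worst-case counts from the McMullen bound, ignoring constant factors):

```python
import math

n = 1000
for d in (2, 3, 4, 5, 10):
    # Worst-case face count of a Voronoi diagram of n sites in R^d
    exponent = math.ceil(d / 2)
    print(f"d={d:2d}: O(n^{exponent}) = {float(n) ** exponent:.0e} faces")

# d= 2: O(n^1) = 1e+03 faces
# d= 3: O(n^2) = 1e+06 faces
# d= 4: O(n^2) = 1e+06 faces
# d= 5: O(n^3) = 1e+09 faces
# d=10: O(n^5) = 1e+15 faces
```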
By dimension 10-20, explicit Voronoi diagram computation and storage become computationally intractable. This is a fundamental barrier, not a limitation of current algorithms, and it's why tree-based methods (which use O(n) space) and approximate methods are essential for high-dimensional NN search.
What Voronoi Diagrams Teach Us:
Even though we can't compute explicit Voronoi diagrams in high dimensions, they provide important theoretical insights: every exact NN structure implicitly answers a Voronoi point-location query, so the diagram's exponential complexity is a strong hint about why exact high-dimensional search is hard; 1-NN decision boundaries are piecewise linear in any dimension; and graph-based approximate methods can be viewed as navigating a sparse approximation of the Delaunay graph.
Voronoi diagrams appear throughout computer science and applied mathematics, far beyond nearest neighbor search.
Computer Graphics and Games: procedural textures, cellular patterns, and fracture/destruction effects built from Voronoi cells.
Geographic Information Systems: service areas, coverage maps, and facility location (which hospital or cell tower is closest?).
Computational Biology: modeling cell packing and analyzing protein structure via Voronoi decompositions of atom centers.
Physics and Materials: grain structure in polycrystalline materials and Wigner-Seitz cells in crystal lattices.
The K-means clustering algorithm is actually Lloyd's algorithm for computing centroidal Voronoi tessellations. Each iteration moves sites to their Voronoi cell centroids, gradually optimizing the tessellation. This deep connection links clustering, Voronoi geometry, and KNN classification.
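Here is a minimal sketch of one Lloyd iteration (`lloyd_step` is an illustrative name), assuming samples drawn from the region being tessellated; iterating it is exactly k-means on those samples:

```python
import numpy as np

def lloyd_step(sites: np.ndarray, samples: np.ndarray) -> np.ndarray:
    """One Lloyd iteration: assign each sample to its nearest site
    (i.e., to that site's Voronoi cell), then move every site to the
    centroid of the samples in its cell."""
    d = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=2)
    owner = d.argmin(axis=1)  # Voronoi cell membership of each sample
    new_sites = sites.copy()
    for i in range(len(sites)):
        cell = samples[owner == i]
        if len(cell) > 0:
            new_sites[i] = cell.mean(axis=0)  # centroid of cell i
    return new_sites

rng = np.random.default_rng(0)
samples = rng.random((5000, 2))
sites = rng.random((8, 2))
for _ in range(20):
    sites = lloyd_step(sites, samples)  # converges toward a centroidal tessellation
```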
What's Next:
We've seen that exact NN methods (KD-trees, Ball trees, Voronoi) struggle in high dimensions. The next page introduces Locality-Sensitive Hashing (LSH), a fundamentally different approach that trades exactness for speed. LSH uses randomized hashing to achieve sub-linear approximate search in any dimension.
You now understand Voronoi diagrams: their mathematical foundation, role as optimal 2D/3D NN structures, duality with Delaunay triangulation, and relationship to KNN decision boundaries. This geometric perspective illuminates why high-dimensional NN is fundamentally hard. Next, we explore LSH for practical high-dimensional approximate search.