Imagine dropping a pebble into a pond with several floating buoys. Ripples spread outward from the pebble. Which buoy do the ripples reach first? This seemingly simple question leads to one of the most beautiful structures in computational geometry: the Voronoi diagram.
A Voronoi diagram partitions space into regions based on proximity to a set of points, called sites or generators. Each region contains all points closer to its site than to any other site. For nearest neighbor search, the Voronoi cell of a site $s$ is precisely the set of query points for which $s$ is the nearest neighbor.
Voronoi diagrams are important for several reasons: they capture the exact geometry of nearest neighbor queries, they support provably fast search in low dimensions, and their complexity explains why exact high-dimensional search is hard.
By the end of this page, you will understand the mathematical definition of Voronoi diagrams, how they provide $O(\log n)$ nearest neighbor search in 2D, the relationship between Voronoi cells and KNN decision boundaries, the Delaunay triangulation and its duality with Voronoi diagrams, and why Voronoi diagrams become impractical in high dimensions.
Let's formalize the Voronoi diagram precisely.
Definition (Voronoi Diagram):
Given a set of $n$ points $S = \{s_1, s_2, \ldots, s_n\}$ in $\mathbb{R}^d$, the Voronoi cell (or Voronoi region) of site $s_i$ is:
$$V(s_i) = \{\mathbf{x} \in \mathbb{R}^d : d(\mathbf{x}, s_i) \leq d(\mathbf{x}, s_j) \text{ for all } j \neq i\}$$
The Voronoi diagram of $S$ is the collection of all Voronoi cells:
$$\text{Vor}(S) = \{V(s_1), V(s_2), \ldots, V(s_n)\}$$
Key Properties:
- The cells tile $\mathbb{R}^d$: they cover all of space and overlap only on their boundaries.
- Every cell is convex, because it is an intersection of half-spaces (shown below).
- A cell $V(s_i)$ is unbounded exactly when $s_i$ lies on the convex hull of $S$.
- For search: a query $\mathbf{q}$ lies in $V(s_i)$ if and only if $s_i$ is a nearest neighbor of $\mathbf{q}$.
Building Blocks:
To construct Voronoi cells, we use half-spaces defined by bisector hyperplanes.
The bisector between sites $s_i$ and $s_j$ is the hyperplane equidistant from both:
$$B(s_i, s_j) = \{\mathbf{x} : d(\mathbf{x}, s_i) = d(\mathbf{x}, s_j)\}$$
For Euclidean distance, this is:
$$B(s_i, s_j) = \left\{\mathbf{x} : \left(\mathbf{x} - \frac{s_i + s_j}{2}\right) \cdot (s_j - s_i) = 0\right\}$$
The bisector is perpendicular to the segment $\overline{s_i s_j}$ and passes through its midpoint.
Half-space: The set of points closer to $s_i$ than to $s_j$:
$$H(s_i, s_j) = \{\mathbf{x} : d(\mathbf{x}, s_i) \leq d(\mathbf{x}, s_j)\}$$
Voronoi cell as intersection:
$$V(s_i) = \bigcap_{j \neq i} H(s_i, s_j)$$
Each cell is the intersection of $n-1$ half-spaces—a convex polytope.
Think of Voronoi cells as 'territories' where each site controls all points closer to it than to any rival. The cell boundaries are where territorial control changes—exactly the decision boundaries of 1-nearest neighbor classification.
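To make the half-space construction concrete, here is a minimal sketch (`in_voronoi_cell` is an illustrative helper, not a library routine) that tests membership in $V(s_i)$ by checking every half-space inequality directly, then verifies it against a brute-force nearest-site check:

```python
import numpy as np

def in_voronoi_cell(x: np.ndarray, sites: np.ndarray, i: int) -> bool:
    """Check x ∈ V(s_i) via the half-space definition:
    x is in the cell iff d(x, s_i) <= d(x, s_j) for every j != i,
    i.e., x lies in all n-1 half-spaces H(s_i, s_j)."""
    d_i = np.linalg.norm(x - sites[i])
    for j in range(len(sites)):
        if j != i and np.linalg.norm(x - sites[j]) < d_i:
            return False  # x is on s_j's side of the bisector B(s_i, s_j)
    return True

# Sanity check: membership must agree with brute-force nearest-site search.
rng = np.random.default_rng(0)
sites = rng.random((10, 2))
x = rng.random(2)
nearest = int(np.argmin(np.linalg.norm(sites - x, axis=1)))
assert in_voronoi_cell(x, sites, nearest)
```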
In two dimensions, Voronoi diagrams have particularly nice properties that make them practical for computation.
2D Voronoi Structure:
- Voronoi edges are segments (or rays) of bisector lines, equidistant from exactly two sites.
- Voronoi vertices are points equidistant from three or more sites (exactly three in general position).
- Unbounded cells correspond exactly to sites on the convex hull.
Combinatorial Complexity:
For $n \geq 3$ sites in general position in 2D, the Voronoi diagram has at most $2n - 5$ vertices and at most $3n - 6$ edges.
This is $O(n)$ total complexity—linear in the number of sites! A short Euler-formula argument below shows where these bounds come from.
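Here is the standard counting argument, sketched. Add one artificial vertex "at infinity" and connect every unbounded Voronoi edge to it; the result is a planar graph with $v + 1$ vertices ($v$ Voronoi vertices), $e$ edges, and $n$ faces (one per cell). Euler's formula gives

$$(v + 1) - e + n = 2 \quad\Longrightarrow\quad e = v + n - 1.$$

Every vertex of this graph has degree at least 3, so $2e \geq 3(v + 1)$. Substituting:

$$2(v + n - 1) \geq 3(v + 1) \quad\Longrightarrow\quad v \leq 2n - 5, \qquad e \leq 3n - 6.$$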
Construction Algorithms:
Several algorithms compute 2D Voronoi diagrams:
| Algorithm | Time Complexity | Space | Strategy |
|---|---|---|---|
| Fortune's sweep | $O(n \log n)$ | $O(n)$ | Sweep line with beach line |
| Divide and conquer | $O(n \log n)$ | $O(n)$ | Recursive merge |
| Incremental | $O(n^2)$ worst | $O(n)$ | Add sites one by one |
| Via Delaunay | $O(n \log n)$ | $O(n)$ | Compute dual first |
```python
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt

def visualize_voronoi_2d(points: np.ndarray, show_labels: bool = True):
    """
    Visualize a 2D Voronoi diagram.

    Uses scipy.spatial.Voronoi (built on Qhull, which computes the
    diagram via a lifted convex hull rather than Fortune's sweep).

    Parameters:
    -----------
    points : np.ndarray, shape (n, 2)
        Voronoi sites
    show_labels : bool
        Whether to label the sites
    """
    # Compute Voronoi diagram
    vor = Voronoi(points)

    # Create figure
    fig, ax = plt.subplots(figsize=(10, 10))

    # Plot the Voronoi diagram
    voronoi_plot_2d(vor, ax=ax, show_vertices=True,
                    line_colors='blue', line_width=1.5, point_size=10)

    # Add site labels
    if show_labels:
        for i, (x, y) in enumerate(points):
            ax.annotate(f'$s_{i}$', (x, y), fontsize=12,
                        xytext=(5, 5), textcoords='offset points')

    # Annotate Voronoi regions
    for i, region in enumerate(vor.regions):
        if not region or -1 in region:
            continue  # Skip empty or infinite regions
        polygon = [vor.vertices[j] for j in region]
        centroid = np.mean(polygon, axis=0)
        # Find which site owns this region
        for site_idx, point_region in enumerate(vor.point_region):
            if point_region == i:
                ax.annotate(f'$V(s_{site_idx})$', centroid, fontsize=10,
                            ha='center', va='center', color='gray')
                break

    ax.set_xlim(points[:, 0].min() - 1, points[:, 0].max() + 1)
    ax.set_ylim(points[:, 1].min() - 1, points[:, 1].max() + 1)
    ax.set_aspect('equal')
    ax.set_title('2D Voronoi Diagram')
    ax.set_xlabel('$x$')
    ax.set_ylabel('$y$')
    return fig, ax, vor

def voronoi_nn_search(vor: Voronoi, query: np.ndarray) -> int:
    """
    Naive nearest neighbor search using Voronoi structure.

    In practice, you'd use point location algorithms for O(log n) search.
    This just demonstrates the concept.
    """
    # Simple approach: check which cell contains the query
    # by computing distances to all sites
    distances = np.sqrt(np.sum((vor.points - query) ** 2, axis=1))
    return np.argmin(distances)

# Example usage
if __name__ == "__main__":
    np.random.seed(42)
    sites = np.random.rand(15, 2) * 10  # 15 random sites in [0, 10]^2

    fig, ax, vor = visualize_voronoi_2d(sites)

    # Query point
    query = np.array([5.0, 5.0])
    nearest = voronoi_nn_search(vor, query)

    ax.plot(*query, 'r*', markersize=15, label='Query')
    ax.plot(*sites[nearest], 'go', markersize=12, label=f'NN: $s_{nearest}$')
    ax.legend()
    plt.show()
```

Fortune's algorithm uses a 'beach line' of parabolic arcs that sweeps across the plane. Each parabola represents points equidistant from a site and the sweep line. The intersections of parabolas trace out Voronoi edges. This elegant approach achieves optimal O(n log n) time.
Once we have the Voronoi diagram, nearest neighbor search reduces to point location: given a query point $\mathbf{q}$, find which Voronoi cell contains $\mathbf{q}$.
Point Location Algorithms:
Several data structures enable logarithmic point location in 2D:
| Method | Query Time | Space | Preprocessing |
|---|---|---|---|
| Trapezoidal map | $O(\log n)$ | $O(n)$ | $O(n \log n)$ |
| Kirkpatrick hierarchy | $O(\log n)$ | $O(n)$ | $O(n)$ |
| Slab decomposition | $O(\log n)$ | $O(n^2)$ | $O(n^2)$ |
Trapezoidal Map Approach: extend a vertical segment up and down from each Voronoi vertex to decompose the plane into trapezoids; a randomized incremental construction builds a search DAG that answers point location queries in $O(\log n)$ expected time.
Kirkpatrick's Hierarchy (Optimal): repeatedly remove an independent set of low-degree vertices and retriangulate, producing a hierarchy of coarser subdivisions of $O(\log n)$ depth; queries walk down the hierarchy in $O(\log n)$ worst-case time using only $O(n)$ space.
```python
import numpy as np
from scipy.spatial import Delaunay

class VoronoiPointLocator:
    """
    Point location in a 2D Voronoi diagram via its dual Delaunay
    triangulation.

    Note: this simplified implementation uses scipy's built-in simplex
    search (a directed walk). It is fast in practice but does not carry
    the O(log n) worst-case guarantee of trapezoidal maps or
    Kirkpatrick's hierarchy.
    """

    def __init__(self, sites: np.ndarray):
        """
        Build the point location structure.

        Parameters:
        -----------
        sites : np.ndarray, shape (n, 2)
            Voronoi sites / Delaunay vertices
        """
        self.sites = sites
        self.n = len(sites)
        # Compute Delaunay triangulation (dual of Voronoi)
        self.delaunay = Delaunay(sites)
        # CSR-style adjacency over Delaunay edges, used by the greedy walk
        self._indptr, self._indices = self.delaunay.vertex_neighbor_vertices

    def _neighbors(self, i: int) -> np.ndarray:
        """Indices of the Delaunay neighbors of site i."""
        return self._indices[self._indptr[i]:self._indptr[i + 1]]

    def query(self, point: np.ndarray) -> int:
        """
        Find the index of the nearest site to `point`.

        Locates the containing Delaunay triangle, then refines with a
        greedy walk: the nearest site need not be a vertex of the
        containing triangle, but a greedy walk along Delaunay edges
        provably reaches the exact nearest neighbor.
        """
        simplex = int(self.delaunay.find_simplex(point.reshape(1, -1))[0])

        if simplex == -1:
            # Point outside convex hull: start from the closest hull site
            hull_indices = np.unique(self.delaunay.convex_hull)
            d = np.linalg.norm(self.sites[hull_indices] - point, axis=1)
            current = int(hull_indices[np.argmin(d)])
        else:
            # Start from the closest vertex of the containing triangle
            verts = self.delaunay.simplices[simplex]
            d = np.linalg.norm(self.sites[verts] - point, axis=1)
            current = int(verts[np.argmin(d)])

        # Greedy walk: move to any Delaunay neighbor strictly closer to
        # the query; stop when no neighbor improves. At that point the
        # query lies in the current site's Voronoi cell.
        while True:
            nbrs = self._neighbors(current)
            d_nbrs = np.linalg.norm(self.sites[nbrs] - point, axis=1)
            d_cur = np.linalg.norm(self.sites[current] - point)
            if d_nbrs.min() >= d_cur:
                return current
            current = int(nbrs[np.argmin(d_nbrs)])

    def batch_query(self, points: np.ndarray) -> np.ndarray:
        """Find nearest neighbors for multiple query points."""
        return np.array([self.query(p) for p in points])

def demonstrate_point_location():
    """
    Benchmark nearest neighbor search via Delaunay point location.
    """
    import time

    np.random.seed(0)

    # Build locator for different sizes
    for n in [100, 1000, 10000, 100000]:
        sites = np.random.rand(n, 2) * 100

        # Build time
        start = time.perf_counter()
        locator = VoronoiPointLocator(sites)
        build_time = time.perf_counter() - start

        # Query time (average over 1000 queries)
        queries = np.random.rand(1000, 2) * 100
        start = time.perf_counter()
        results = locator.batch_query(queries)
        query_time = (time.perf_counter() - start) / 1000 * 1000  # ms

        print(f"n={n:6d}: build={build_time*1000:.1f}ms, "
              f"query={query_time:.4f}ms/query")

# Output (illustrative):
# n=   100: build=  0.5ms, query=0.0150ms/query
# n=  1000: build=  2.3ms, query=0.0180ms/query
# n= 10000: build= 25.0ms, query=0.0210ms/query (only 1.4x higher!)
# n=100000: build=280.0ms, query=0.0250ms/query (barely changes)
```

Point location is a classic example of the preprocessing-query tradeoff. We invest $O(n \log n)$ time building the Voronoi diagram and point location structure, then reap $O(\log n)$ queries forever. For 1 million sites, queries take microseconds despite the massive data.
The Voronoi diagram has a beautiful dual: the Delaunay triangulation. Understanding this duality provides deeper geometric insight.
Definition (Delaunay Triangulation):
The Delaunay triangulation $\text{Del}(S)$ of a point set $S$ connects sites whose Voronoi cells share an edge.
Duality between Voronoi and Delaunay:
| Voronoi | Delaunay |
|---|---|
| Cell (region) | Vertex (site) |
| Edge | Edge (connecting sites with adjacent cells) |
| Vertex | Face (triangle) |
Empty Circle Property:
A triangulation is Delaunay if and only if the circumcircle of each triangle contains no other sites. This is the defining characteristic.
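To make the empty-circle property concrete, here is a small self-check sketch (`circumcircle` and `verify_empty_circle` are illustrative helpers): it computes each Delaunay triangle's circumcircle and confirms no other site lies strictly inside.

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcircle(a, b, c):
    """Circumcenter and radius of triangle abc: the center x satisfies
    |x - a|^2 = |x - b|^2 = |x - c|^2, a 2x2 linear system."""
    A = 2 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(A, rhs)
    return center, np.linalg.norm(center - a)

def verify_empty_circle(points: np.ndarray, tol: float = 1e-9) -> bool:
    """Check the empty-circle property: no site lies strictly inside
    any Delaunay triangle's circumcircle."""
    tri = Delaunay(points)
    for simplex in tri.simplices:
        center, r = circumcircle(*points[simplex])
        d = np.linalg.norm(points - center, axis=1)
        d[simplex] = np.inf  # the triangle's own vertices lie ON the circle
        if np.any(d < r - tol):
            return False
    return True

rng = np.random.default_rng(42)
print(verify_empty_circle(rng.random((50, 2))))  # True
```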
Why This Matters for KNN:
Delaunay edges connect natural neighbors — Sites connected by a Delaunay edge are likely to be nearest neighbors for queries near their shared cell boundary.
K-nearest neighbors often form a path in Delaunay — Though not always, the k nearest sites tend to be Delaunay-connected.
Delaunay guides search — Graph-based NN methods (like HNSW) are inspired by navigating a graph of neighbors.
```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi
import matplotlib.pyplot as plt

def visualize_voronoi_delaunay_duality(sites: np.ndarray):
    """
    Visualize the duality between Voronoi and Delaunay structures.
    """
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Compute structures
    vor = Voronoi(sites)
    delaunay = Delaunay(sites)

    # Left: Voronoi only
    ax = axes[0]
    ax.set_title('Voronoi Diagram')
    for ridge in vor.ridge_vertices:
        if -1 in ridge:
            continue
        ax.plot(vor.vertices[ridge, 0], vor.vertices[ridge, 1], 'b-', lw=1.5)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)
    ax.set_aspect('equal')

    # Middle: Delaunay only
    ax = axes[1]
    ax.set_title('Delaunay Triangulation')
    ax.triplot(sites[:, 0], sites[:, 1], delaunay.simplices, 'g-', lw=1.5)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)
    ax.set_aspect('equal')

    # Right: Both overlaid (showing duality)
    ax = axes[2]
    ax.set_title('Duality: Voronoi (blue) + Delaunay (green)')
    for ridge in vor.ridge_vertices:
        if -1 in ridge:
            continue
        ax.plot(vor.vertices[ridge, 0], vor.vertices[ridge, 1],
                'b-', lw=1, alpha=0.7)
    ax.triplot(sites[:, 0], sites[:, 1], delaunay.simplices,
               'g-', lw=1, alpha=0.7)
    ax.scatter(*sites.T, c='red', s=50, zorder=5)

    # Draw duality connections
    for ridge_idx, (i, j) in enumerate(vor.ridge_points):
        # Each Voronoi edge (connecting Voronoi vertices) corresponds to
        # a Delaunay edge (connecting sites i and j)
        mid_site = (sites[i] + sites[j]) / 2
        ridge_verts = vor.ridge_vertices[ridge_idx]
        if -1 not in ridge_verts:
            mid_vor = vor.vertices[ridge_verts].mean(axis=0)
            ax.plot([mid_site[0], mid_vor[0]], [mid_site[1], mid_vor[1]],
                    'k--', lw=0.5, alpha=0.5)
    ax.set_aspect('equal')

    plt.tight_layout()
    return fig

def delaunay_neighbor_property(sites: np.ndarray) -> dict:
    """
    Demonstrate that Delaunay edges connect 'natural neighbors'.

    Returns statistics about NN relationships.
    """
    delaunay = Delaunay(sites)
    n = len(sites)

    # Get Delaunay neighbors for each site
    delaunay_neighbors = [set() for _ in range(n)]
    for simplex in delaunay.simplices:
        for a in simplex:
            for b in simplex:
                if a != b:
                    delaunay_neighbors[a].add(b)

    # Check if 1-NN is always a Delaunay neighbor
    nn_is_delaunay = 0
    k_nn_in_delaunay = []  # For each site, how many of k-NN are Delaunay neighbors?
    k = 5

    for i in range(n):
        # Compute distances from site i to all others
        distances = np.sqrt(np.sum((sites - sites[i]) ** 2, axis=1))
        distances[i] = np.inf  # Exclude self

        # Get k nearest neighbors
        k_nearest = np.argsort(distances)[:k]

        # Is 1-NN a Delaunay neighbor?
        if k_nearest[0] in delaunay_neighbors[i]:
            nn_is_delaunay += 1

        # How many of k-NN are Delaunay neighbors?
        in_delaunay = sum(1 for j in k_nearest if j in delaunay_neighbors[i])
        k_nn_in_delaunay.append(in_delaunay)

    return {
        'n_sites': n,
        '1nn_in_delaunay_pct': 100 * nn_is_delaunay / n,
        f'{k}nn_in_delaunay_avg': np.mean(k_nn_in_delaunay),
        'avg_delaunay_degree': np.mean([len(s) for s in delaunay_neighbors])
    }

# Example output:
# {'n_sites': 100,
#  '1nn_in_delaunay_pct': 100.0,  # 1-NN is ALWAYS a Delaunay neighbor!
#  '5nn_in_delaunay_avg': 4.2,    # Most k-NN are Delaunay neighbors
#  'avg_delaunay_degree': 5.8}    # Average ~6 Delaunay edges per site
```

A beautiful theorem: in 2D, the nearest neighbor of any site is always connected to it in the Delaunay triangulation. This means the Delaunay graph contains all 1-NN relationships. Graph-based NN methods exploit this by searching along neighbor connections.
There's a profound connection between Voronoi diagrams and K-nearest neighbor classification.
1-NN and Voronoi Cells:
For 1-NN classification, the decision boundary is exactly the Voronoi diagram of the training points. Every point in the Voronoi cell of training example $(\mathbf{x}_i, y_i)$ is classified as $y_i$.
K-NN and Higher-Order Voronoi:
For $K > 1$, we need the order-K Voronoi diagram, which partitions space based on the $K$ nearest sites (not just the nearest one).
Order-K Voronoi Cell:
$$V_K(\{s_{i_1}, \ldots, s_{i_K}\}) = \{\mathbf{x} : \text{the } K \text{ nearest sites to } \mathbf{x} \text{ are } s_{i_1}, \ldots, s_{i_K}\}$$
The decision boundary of K-NN is where the majority vote changes—which happens at order-K Voronoi edges where the composition of the K nearest neighbors changes.
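To see order-K cells concretely, here is a small sketch (`order_k_cells` is an illustrative helper, and the grid-labeling approach is an approximation, not an exact construction) that labels grid points by the set of their K nearest sites; each distinct label corresponds to one non-empty order-K Voronoi cell:

```python
import numpy as np

def order_k_cells(sites: np.ndarray, k: int, resolution: int = 200) -> int:
    """Count non-empty order-k Voronoi cells by labeling each grid point
    with the (unordered) set of its k nearest sites."""
    xs = np.linspace(0, 1, resolution)
    grid = np.array([[x, y] for y in xs for x in xs])
    # Pairwise distances: (num_grid_points, num_sites)
    d = np.linalg.norm(grid[:, None, :] - sites[None, :, :], axis=2)
    knn = np.sort(np.argsort(d, axis=1)[:, :k], axis=1)  # sorted -> set id
    return len({tuple(row) for row in knn})

rng = np.random.default_rng(1)
sites = rng.random((10, 2))
for k in (1, 2, 3):
    print(k, order_k_cells(sites, k))  # cell count grows with k
```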
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi
from sklearn.neighbors import KNeighborsClassifier

def visualize_knn_decision_boundary(X: np.ndarray, y: np.ndarray, k: int):
    """
    Visualize KNN decision boundaries and their relation to Voronoi cells.
    """
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # Left: 1-NN (Voronoi regions)
    ax = axes[0]
    ax.set_title('1-NN Decision Boundaries (= Voronoi Diagram)')

    # Compute Voronoi
    vor = Voronoi(X)

    # Color each region by its class
    colors = plt.cm.Set1(np.linspace(0, 1, len(np.unique(y))))
    for point_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if not region or -1 in region:
            continue
        polygon = vor.vertices[region]
        ax.fill(polygon[:, 0], polygon[:, 1], alpha=0.3,
                color=colors[y[point_idx]])

    # Plot points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]],
                   edgecolor='black', s=100, label=f'Class {c}')
    ax.legend()
    ax.set_xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
    ax.set_ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
    ax.set_aspect('equal')

    # Right: K-NN decision boundary
    ax = axes[1]
    ax.set_title(f'{k}-NN Decision Boundaries')

    # Create mesh grid for decision boundary visualization
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200),
        np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200)
    )

    # Train KNN and predict on grid
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision regions
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='Set1')
    ax.contour(xx, yy, Z, colors='black', linewidths=0.5)

    # Plot points
    for c in np.unique(y):
        mask = y == c
        ax.scatter(X[mask, 0], X[mask, 1], c=[colors[c]],
                   edgecolor='black', s=100, label=f'Class {c}')
    ax.legend()
    ax.set_xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
    ax.set_ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
    ax.set_aspect('equal')

    plt.tight_layout()
    return fig

def demonstrate_k_effect():
    """
    Show how decision boundaries smooth as K increases.
    """
    np.random.seed(42)

    # Generate two-class data
    n = 30
    X = np.vstack([
        np.random.randn(n, 2) + [2, 2],
        np.random.randn(n, 2) + [-2, -2]
    ])
    y = np.array([0] * n + [1] * n)

    fig, axes = plt.subplots(1, 4, figsize=(16, 4))

    for idx, k in enumerate([1, 3, 7, 15]):
        ax = axes[idx]
        ax.set_title(f'K = {k}')

        # Create mesh
        xx, yy = np.meshgrid(
            np.linspace(-5, 5, 150),
            np.linspace(-5, 5, 150)
        )

        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X, y)
        Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

        ax.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
        ax.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm',
                   edgecolor='black', s=50)
        ax.set_xlim(-5, 5)
        ax.set_ylim(-5, 5)
        ax.set_aspect('equal')

    fig.suptitle('K-NN Decision Boundaries: From Jagged (K=1) to Smooth (K=15)')
    plt.tight_layout()
    return fig
```

Key Observations:
K=1 boundaries are sharp — They exactly follow the Voronoi edges, creating jagged decision regions.
Larger K smooths boundaries — As K increases, boundaries become smoother because more neighbors must change for the vote to flip.
Boundary complexity — The total length of decision boundaries decreases with K, reducing model complexity.
Noise sensitivity — K=1 is sensitive to noise (each noisy point creates its own region); larger K averages out individual outliers.
While Voronoi diagrams are elegant in 2D and 3D, they become impractical in higher dimensions. This explains why KD-trees, Ball trees, and approximate methods are necessary.
Combinatorial Complexity:
The number of faces of a Voronoi diagram in $d$ dimensions can be exponential:
| Dimension | Vertices (worst case) | Total complexity |
|---|---|---|
| 2 | $O(n)$ | $O(n)$ |
| 3 | $O(n^2)$ | $O(n^2)$ |
| $d$ | $O(n^{\lceil d/2 \rceil})$ | $O(n^{\lceil d/2 \rceil})$ |
For $d = 10$ dimensions, the worst case is $O(n^5)$: even $n = 1000$ sites can generate on the order of $10^{15}$ faces, far beyond anything we could compute or store.
Upper Bound Theorem (McMullen):
The maximum number of $k$-faces of a Voronoi diagram of $n$ points in $\mathbb{R}^d$ is:
$$O\left(n^{\lceil d/2 \rceil}\right)$$
This bound is tight—there exist point configurations that achieve it.
| Dimension | Vertices for n=100 | Vertices for n=1000 | Storage for n=1000 (bytes, approx) |
|---|---|---|---|
| 2 | ~300 | ~3,000 | ~240 KB |
| 3 | ~10,000 | ~1,000,000 | ~80 MB |
| 4 | ~10,000 | ~1,000,000 | ~80 MB |
| 5 | ~1,000,000 | ~10^9 | ~80 GB |
| 10 | ~10^10 | ~10^15 | Impossible |
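The orders of magnitude above can be reproduced in a few lines (worst-case counts from the McMullen bound, ignoring constant factors):

```python
import math

n = 1000
for d in (2, 3, 4, 5, 10):
    # Worst-case face count of a Voronoi diagram of n sites in R^d
    exponent = math.ceil(d / 2)
    print(f"d={d:2d}: O(n^{exponent}) = {float(n) ** exponent:.0e} faces")

# d= 2: O(n^1) = 1e+03 faces
# d= 3: O(n^2) = 1e+06 faces
# d= 4: O(n^2) = 1e+06 faces
# d= 5: O(n^3) = 1e+09 faces
# d=10: O(n^5) = 1e+15 faces
```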
By dimension 10-20, explicit Voronoi diagram computation and storage become computationally intractable. This is a fundamental barrier, not a limitation of current algorithms, and it's why tree-based methods (which use O(n) space) and approximate methods are essential for high-dimensional NN search.
What Voronoi Diagrams Teach Us:
Even though we can't compute explicit Voronoi diagrams in high dimensions, they provide important theoretical insights: every exact NN structure implicitly answers a Voronoi point-location query, so the diagram's exponential complexity is a strong hint about why exact high-dimensional search is hard; 1-NN decision boundaries are piecewise linear in any dimension; and graph-based approximate methods can be viewed as navigating a sparse approximation of the Delaunay graph.
Voronoi diagrams appear throughout computer science and applied mathematics, far beyond nearest neighbor search.
Computer Graphics and Games: procedural textures, cellular patterns, and fracture/destruction effects built from Voronoi cells.
Geographic Information Systems: service areas, coverage maps, and facility location (which hospital or cell tower is closest?).
Computational Biology: modeling cell packing and analyzing protein structure via Voronoi decompositions of atom centers.
Physics and Materials: grain structure in polycrystalline materials and Wigner-Seitz cells in crystal lattices.
The K-means clustering algorithm is actually Lloyd's algorithm for computing centroidal Voronoi tessellations. Each iteration moves sites to their Voronoi cell centroids, gradually optimizing the tessellation. This deep connection links clustering, Voronoi geometry, and KNN classification.
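Here is a minimal sketch of one Lloyd iteration (`lloyd_step` is an illustrative name), assuming samples drawn from the region being tessellated; iterating it is exactly k-means on those samples:

```python
import numpy as np

def lloyd_step(sites: np.ndarray, samples: np.ndarray) -> np.ndarray:
    """One Lloyd iteration: assign each sample to its nearest site
    (i.e., to that site's Voronoi cell), then move every site to the
    centroid of the samples in its cell."""
    d = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=2)
    owner = d.argmin(axis=1)  # Voronoi cell membership of each sample
    new_sites = sites.copy()
    for i in range(len(sites)):
        cell = samples[owner == i]
        if len(cell) > 0:
            new_sites[i] = cell.mean(axis=0)  # centroid of cell i
    return new_sites

rng = np.random.default_rng(0)
samples = rng.random((5000, 2))
sites = rng.random((8, 2))
for _ in range(20):
    sites = lloyd_step(sites, samples)  # converges toward a centroidal tessellation
```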
What's Next:
We've seen that exact NN methods (KD-trees, Ball trees, Voronoi) struggle in high dimensions. The next page introduces Locality-Sensitive Hashing (LSH), a fundamentally different approach that trades exactness for speed. LSH uses randomized hashing to achieve sub-linear approximate search in any dimension.
You now understand Voronoi diagrams: their mathematical foundation, role as optimal 2D/3D NN structures, duality with Delaunay triangulation, and relationship to KNN decision boundaries. This geometric perspective illuminates why high-dimensional NN is fundamentally hard. Next, we explore LSH for practical high-dimensional approximate search.