K-Nearest Neighbors is conceptually simple: to classify a new point, find the K closest training examples and let them vote. But this simplicity conceals a computational reality that becomes brutally apparent at scale. Consider the challenge facing real-world systems such as reverse image search or large-scale recommendation, which must answer neighbor queries against tens of millions of stored vectors in milliseconds.
If we naively compare each query against every stored example, such systems would take seconds or even minutes per query instead of milliseconds. Understanding exactly why brute force fails, and by how much, is essential before we can appreciate the elegance of efficient solutions.
By the end of this page, you will understand the exact computational complexity of brute-force KNN, be able to quantify its time and space requirements, recognize when brute force is acceptable versus prohibitive, and understand why efficient data structures are mathematically necessary at scale.
Let us begin by formalizing the brute-force nearest neighbor search algorithm. While the concept is intuitive, a precise understanding of each step is necessary to derive its complexity.
Problem Definition:
Given:
- A dataset $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ of $n$ points in $\mathbb{R}^d$
- A query point $\mathbf{q} \in \mathbb{R}^d$
- A distance function $d(\cdot, \cdot)$, typically Euclidean
- An integer $K \geq 1$

Find: The $K$ points in $\mathcal{D}$ that minimize $d(\mathbf{x}_i, \mathbf{q})$.
The Naive Approach:
The most straightforward algorithm computes the distance from the query to every point in the dataset, then selects the $K$ smallest:
```python
import numpy as np
from typing import List, Tuple
import heapq


def brute_force_knn(
    training_data: np.ndarray,  # Shape: (n, d)
    query: np.ndarray,          # Shape: (d,)
    k: int,
    distance_fn=None
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Find K nearest neighbors using brute force search.

    Time Complexity: O(nd + n log k)
    Space Complexity: O(n) for distances, O(k) for result

    Parameters:
    -----------
    training_data : np.ndarray
        Training examples, shape (n_samples, n_features)
    query : np.ndarray
        Query point, shape (n_features,)
    k : int
        Number of neighbors to find
    distance_fn : callable, optional
        Distance function; defaults to Euclidean

    Returns:
    --------
    indices : np.ndarray
        Indices of k nearest neighbors
    distances : np.ndarray
        Distances to k nearest neighbors
    """
    n = training_data.shape[0]

    if distance_fn is None:
        # Default: Euclidean distance (vectorized)
        # For each point, compute ||x_i - q||
        distances = np.sqrt(np.sum((training_data - query) ** 2, axis=1))
    else:
        # Custom distance function
        distances = np.array([
            distance_fn(training_data[i], query) for i in range(n)
        ])

    # Method 1: Sort all distances - O(n log n)
    # sorted_indices = np.argsort(distances)[:k]

    # Method 2: Partial selection via np.argpartition (introselect)
    # O(n) average to find the k smallest, then O(k log k) to sort them
    # More efficient than a full sort when k << n
    k_smallest_indices = np.argpartition(distances, k)[:k]

    # Sort the k results by distance
    sorted_k = k_smallest_indices[np.argsort(distances[k_smallest_indices])]

    return sorted_k, distances[sorted_k]


def brute_force_knn_explicit(
    training_data: np.ndarray,
    query: np.ndarray,
    k: int
) -> List[Tuple[float, int]]:
    """
    Explicit implementation showing each computational step.

    This version makes every operation visible for educational purposes.
    """
    n, d = training_data.shape

    # Step 1: Compute all distances - O(nd) operations
    distances = []
    for i in range(n):           # n iterations
        dist = 0.0
        for j in range(d):       # d operations per iteration
            diff = training_data[i, j] - query[j]
            dist += diff * diff  # squared difference
        dist = np.sqrt(dist)
        distances.append((dist, i))

    # Step 2: Find k smallest using a max-heap - O(n log k)
    # We maintain a max-heap of size k (via negated distances)
    # For each new distance, if smaller than the current max, replace it
    k_nearest = []
    for dist, idx in distances:
        if len(k_nearest) < k:
            # Heap not full, add directly
            heapq.heappush(k_nearest, (-dist, idx))
        elif dist < -k_nearest[0][0]:
            # New point closer than current farthest
            heapq.heapreplace(k_nearest, (-dist, idx))

    # Return sorted by distance (ascending)
    result = [(-neg_dist, idx) for neg_dist, idx in k_nearest]
    return sorted(result)
```

The explicit version above reveals that brute-force KNN consists of two distinct phases: (1) distance computation, which is inherently O(nd), and (2) selection of the k smallest, which can be done in O(n log k) using a heap. For typical ML applications where k is small, the distance computation dominates.
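A quick usage sketch of the vectorized function above on synthetic data (the sizes are arbitrary, and the snippet assumes `brute_force_knn` is defined in the same session):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 64)).astype(np.float32)   # n=10,000 points, d=64
q = rng.standard_normal(64).astype(np.float32)             # a single query

indices, dists = brute_force_knn(X, q, k=5)
print(indices)   # indices of the 5 closest training points
print(dists)     # their Euclidean distances, in ascending order
```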
Let us rigorously analyze the time complexity of brute-force KNN. We'll break down each component and then combine them.
Distance Computation:
For each of the $n$ training points, we must compute its distance to the query point. Using Euclidean distance:
$$d(\mathbf{x}_i, \mathbf{q}) = \sqrt{\sum_{j=1}^{d} (x_{ij} - q_j)^2}$$
This requires:
- $d$ subtractions
- $d$ multiplications (squaring each difference)
- $d - 1$ additions
- One square root

Each of these is $O(1)$, so one distance computation is $O(d)$. For $n$ points:
$$T_{\text{distance}} = O(nd)$$
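To make this concrete with one illustrative configuration (the values $n = 10^6$ and $d = 128$ are examples, not benchmarks): at roughly $3d$ primitive operations per distance, a single query costs

$$n \cdot 3d \approx 10^6 \times 384 \approx 3.8 \times 10^8 \text{ operations}$$

before the selection step even begins.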
Selection of K Nearest:
After computing all distances, we need the $K$ smallest. Several approaches exist; they are compared in the table below and sketched in code just after it:
| Method | Complexity | When to Use |
|---|---|---|
| Full Sort | $O(n \log n)$ | When $K \approx n$ |
| Partial Sort (Quickselect) | $O(n)$ average | Standard choice |
| Max-Heap | $O(n \log K)$ | When $K \ll n$ |
| Priority Queue | $O(n \log K)$ | Streaming data |
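A minimal sketch comparing three of the approaches above (full sort, partial selection via `np.argpartition`, and a bounded heap via `heapq.nsmallest`) on one precomputed distance array with synthetic values:

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
distances = rng.random(1_000_000)   # pretend these are precomputed distances
k = 10

# Full sort: O(n log n)
by_sort = np.argsort(distances)[:k]

# Partial selection (introselect): O(n) average, then sort only the k winners
by_part = np.argpartition(distances, k)[:k]
by_part = by_part[np.argsort(distances[by_part])]

# Bounded heap: O(n log k), the natural choice for streaming distances
by_heap = [i for _, i in heapq.nsmallest(k, ((dist, i) for i, dist in enumerate(distances)))]

# All three strategies return the same k indices (ties aside)
assert set(by_sort) == set(by_part) == set(by_heap)
```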
Total Complexity:
Using the heap-based approach:
$$T_{\text{total}} = O(nd) + O(n \log K) = O(nd + n \log K)$$
Since $\log K \ll d$ in typical machine learning applications (e.g., $K = 10$ while $d$ is in the hundreds), the distance term dominates:
$$\boxed{T_{\text{brute force}} = O(nd)}$$
| Operation | Per Query | For $m$ Queries | Dominant When |
|---|---|---|---|
| Distance Computation | $O(nd)$ | $O(mnd)$ | Always (typical) |
| Selection (Heap) | $O(n \log K)$ | $O(mn \log K)$ | $d$ very small |
| Selection (Full Sort) | $O(n \log n)$ | $O(mn \log n)$ | $K \approx n$ |
| Total (typical) | $\mathbf{O(nd)}$ | $\mathbf{O(mnd)}$ | — |
The Critical Observation:
The brute-force complexity is linear in both dataset size and dimensionality: doubling $n$ doubles the query time, and doubling $d$ does the same.
For a single query, this might seem acceptable. But consider a system handling 1000 queries per second against a dataset of 10 million points in 512 dimensions. That's:
$$1000 \times 10^7 \times 512 = 5.12 \times 10^{12} \text{ operations/second}$$
Even at 10 GFLOPS per core, that is roughly 512 core-seconds of compute for every second of real time: you would need on the order of 500 cores running flat out just to keep up. Clearly untenable for a single machine.
Beyond time, memory consumption is a critical constraint for large-scale KNN systems. Let's analyze the space requirements carefully.
Training Data Storage:
The training set must be stored in memory for fast access:
$$S_{\text{data}} = O(nd) \text{ floats} = O(nd \cdot b) \text{ bytes}$$
where $b$ is the number of bytes per float (4 for float32, 8 for float64).
Distance Array:
During search, we compute distances to all $n$ points:
$$S_{\text{distances}} = O(n) \text{ floats}$$
Result Storage:
We store $K$ indices and distances:
$$S_{\text{result}} = O(K)$$
Total Space:
$$\boxed{S_{\text{brute force}} = O(nd)}$$
The result and distance arrays are negligible compared to the training data.
| Dataset Size (n) | Dimensions (d) | Float Type | Memory Required |
|---|---|---|---|
| 1,000 | 100 | float32 | ~400 KB |
| 10,000 | 100 | float32 | ~4 MB |
| 100,000 | 512 | float32 | ~200 MB |
| 1,000,000 | 512 | float32 | ~2 GB |
| 10,000,000 | 512 | float32 | ~20 GB |
| 100,000,000 | 768 | float32 | ~300 GB |
| 1,000,000,000 | 768 | float32 | ~3 TB |
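The rows above follow from a one-line formula; here is a quick sketch for reproducing them (assumes raw float32 storage with no index overhead):

```python
def knn_memory_gb(n: int, d: int, bytes_per_float: int = 4) -> float:
    """Memory needed to store n training points of dimension d, in GB."""
    return n * d * bytes_per_float / 1e9

print(knn_memory_gb(1_000_000, 512))        # ~2.0 GB
print(knn_memory_gb(10_000_000, 512))       # ~20.5 GB
print(knn_memory_gb(1_000_000_000, 768))    # ~3072 GB, i.e. ~3 TB
```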
When the dataset exceeds L3 cache (typically 8-64 MB), performance degrades dramatically due to main memory latency. Beyond RAM (typically 16-512 GB), you're forced to disk-based solutions that are 10,000× slower. Efficient data structures can sometimes fit in cache what brute force cannot fit in RAM.
Memory Bandwidth Bottleneck:
Modern CPUs can perform floating-point operations faster than they can fetch data from memory. Assume a sustained main-memory bandwidth of roughly 50 GB/s: at 4 bytes per float32 value, that is at most about $12.5 \times 10^9$ floats streamed per second, no matter how fast the arithmetic units are.
With $d = 512$, we can only process: $$\frac{12.5 \times 10^9}{512} \approx 24 \text{ million points/second}$$
This is a fundamental limit that no amount of algorithmic optimization within brute force can overcome. We need data structures that reduce the number of distance computations, not just make them faster.
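A back-of-the-envelope sketch of that limit, using the same assumptions as above (roughly 50 GB/s of sustained bandwidth and 4-byte floats; both figures are illustrative, not measurements):

```python
def bandwidth_bound_qps(n: int, d: int,
                        bandwidth_bytes_per_s: float = 50e9,
                        bytes_per_float: int = 4) -> float:
    """Upper bound on brute-force queries/second when every query must
    stream the entire dataset from main memory (ignores caches and compute)."""
    bytes_per_query = n * d * bytes_per_float
    return bandwidth_bytes_per_s / bytes_per_query

print(bandwidth_bound_qps(10_000_000, 512))   # ~2.4 queries/second at n=10M, d=512
```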
Abstract Big-O analysis is valuable, but let's ground our understanding with actual measurements. The following benchmarks were run on a typical workstation (Intel i7, 64 GB RAM) using optimized NumPy operations.
```python
import numpy as np
import time
from typing import Dict, List


def benchmark_brute_force(
    n_samples_list: List[int],
    n_features_list: List[int],
    k: int = 10,
    n_queries: int = 100
) -> List[Dict]:
    """
    Benchmark brute-force KNN across dataset sizes and dimensions.

    Returns timing statistics for analysis.
    """
    results = []

    for n in n_samples_list:
        for d in n_features_list:
            # Generate random data
            X = np.random.randn(n, d).astype(np.float32)
            queries = np.random.randn(n_queries, d).astype(np.float32)

            # Warm-up
            _ = np.linalg.norm(X - queries[0], axis=1)

            # Timed run
            times = []
            for q in queries:
                start = time.perf_counter()

                # Distance computation (dominates)
                distances = np.linalg.norm(X - q, axis=1)

                # Selection
                k_nearest = np.argpartition(distances, k)[:k]

                elapsed = time.perf_counter() - start
                times.append(elapsed)

            avg_time = np.mean(times) * 1000  # ms
            std_time = np.std(times) * 1000

            results.append({
                'n_samples': n,
                'n_features': d,
                'avg_time_ms': avg_time,
                'std_time_ms': std_time,
                'throughput_qps': 1000 / avg_time  # queries per second
            })

            print(f"n={n:>10,}, d={d:>4}: {avg_time:>8.2f}ms ± {std_time:>5.2f}ms "
                  f"({1000/avg_time:>6.1f} q/s)")

    return results


# Example benchmark output (representative results):
# n=     1,000, d= 128:     0.05ms ±  0.01ms (20000.0 q/s)
# n=    10,000, d= 128:     0.48ms ±  0.02ms ( 2083.3 q/s)
# n=   100,000, d= 128:     4.82ms ±  0.15ms (  207.5 q/s)
# n= 1,000,000, d= 128:    52.31ms ±  2.10ms (   19.1 q/s)
# n=10,000,000, d= 128:   531.42ms ±  8.50ms (    1.9 q/s)
# n= 1,000,000, d= 512:   198.45ms ±  5.20ms (    5.0 q/s)
# n= 1,000,000, d=1024:   412.80ms ± 12.30ms (    2.4 q/s)
```

| n (samples) | d (features) | Query Time | Throughput | Practical Assessment |
|---|---|---|---|---|
| 1,000 | 128 | 0.05 ms | 20,000 q/s | ✅ Real-time friendly |
| 10,000 | 128 | 0.5 ms | 2,000 q/s | ✅ Fast enough for many apps |
| 100,000 | 128 | 5 ms | 200 q/s | ⚠️ Noticeable latency |
| 1,000,000 | 128 | 50 ms | 20 q/s | ❌ Too slow for interactive |
| 10,000,000 | 128 | 500 ms | 2 q/s | ❌ Unusable for real-time |
| 1,000,000 | 512 | 200 ms | 5 q/s | ❌ Unusable for real-time |
A useful rule of thumb: brute force remains practical up to about 100,000 points for low-dimensional data (d < 20) or about 10,000 points for high-dimensional data (d > 100). Beyond these thresholds, you need efficient data structures.
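One way to encode that rule of thumb as a helper (a heuristic reflecting only the thresholds stated above, not a library function; the cutoff for the middle regime is an interpolation of our own):

```python
def brute_force_is_reasonable(n: int, d: int) -> bool:
    """Heuristic: is brute-force KNN likely fast enough for interactive use?"""
    if d < 20:
        return n <= 100_000     # low-dimensional data tolerates larger n
    if d > 100:
        return n <= 10_000      # high-dimensional distances are expensive
    return n <= 50_000          # in-between regime: a conservative middle ground (assumption)
```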
Despite its limitations, brute force remains the correct choice in many scenarios. Understanding when simplicity wins over algorithmic sophistication is an important engineering skill.
Scenarios Where Brute Force Excels: small datasets that fit comfortably within the thresholds above, applications that require exact (not approximate) results, very high-dimensional data where tree-based indexes degrade to near-linear scans anyway, and batched workloads that can exploit vectorized hardware, as the implementation below shows.
```python
import numpy as np


def batched_knn(X: np.ndarray, Q: np.ndarray, k: int) -> np.ndarray:
    """
    Batched brute-force KNN using matrix multiplication.

    Exploits the identity: ||x - q||² = ||x||² + ||q||² - 2*x·q

    This allows computing all pairwise distances via BLAS-optimized
    matrix operations, which can be 10-100x faster than loops.

    Parameters:
    -----------
    X : np.ndarray, shape (n, d)
        Training data
    Q : np.ndarray, shape (m, d)
        Query points
    k : int
        Number of neighbors

    Returns:
    --------
    indices : np.ndarray, shape (m, k)
        Indices of k nearest neighbors for each query
    """
    # Precompute squared norms
    X_sq = np.sum(X ** 2, axis=1, keepdims=True)  # (n, 1)
    Q_sq = np.sum(Q ** 2, axis=1, keepdims=True)  # (m, 1)

    # Compute squared Euclidean distances via matrix multiplication
    # D[i,j] = ||Q[i] - X[j]||² = ||Q[i]||² + ||X[j]||² - 2*Q[i]·X[j]
    #        = Q_sq[i] + X_sq[j].T - 2*(Q @ X.T)[i,j]
    # This is a (m, n) matrix of squared distances
    D_sq = Q_sq + X_sq.T - 2 * (Q @ X.T)

    # Numerical stability: clamp small negative values from floating point error
    D_sq = np.maximum(D_sq, 0)

    # Find k nearest for each query
    # argpartition is O(mn + mk log k) vs O(mn log n) for full sort
    indices = np.argpartition(D_sq, k, axis=1)[:, :k]

    # Sort the k nearest by distance
    batch_indices = np.arange(Q.shape[0])[:, None]
    sorted_order = np.argsort(D_sq[batch_indices, indices], axis=1)
    indices = indices[batch_indices, sorted_order]

    return indices


# Performance comparison: batch vs loop
# Dataset: n=100,000, d=128, m=1000 queries
#
# Loop (per query): ~500ms total
# Batched:           ~80ms total
# Speedup: 6.25x (and scales better with GPU)
```

The batched approach is perfectly suited for GPUs. Libraries like FAISS (Facebook AI Similarity Search) use this pattern to achieve >1 billion distance computations per second on modern GPUs, making brute force viable for surprisingly large datasets when batch sizes are large enough.
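For comparison, here is a minimal sketch of the same exact search with FAISS, assuming the `faiss-cpu` (or `faiss-gpu`) package is installed; `IndexFlatL2` is its brute-force index and expects contiguous float32 arrays:

```python
import numpy as np
import faiss  # assumption: faiss-cpu or faiss-gpu is installed

n, d, m, k = 100_000, 128, 1_000, 10
X = np.random.randn(n, d).astype(np.float32)   # training vectors
Q = np.random.randn(m, d).astype(np.float32)   # query vectors

index = faiss.IndexFlatL2(d)   # exact (brute-force) L2 search
index.add(X)                   # store all training vectors
D, I = index.search(Q, k)      # D: squared L2 distances, I: neighbor indices, both (m, k)
```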
To appreciate why efficient algorithms matter, let's quantify the economic and environmental cost of brute-force search at production scale.
Case Study: Image Similarity Service
Imagine building a reverse image search for an e-commerce platform with a catalog of 50 million product images, each represented by a 512-dimensional embedding, serving an average of about 12 queries per second with peaks around 100 queries per second:
| Metric | Calculation | Result |
|---|---|---|
| Distance computations/query | 50M × 512 | 25.6 billion FLOPs |
| Time at 10 GFLOPS (CPU) | 25.6B / 10B | 2.56 seconds/query |
| CPUs needed for 12 q/s | 2.56 × 12 | ~31 CPU cores |
| CPUs for peak (100 q/s) | 2.56 × 100 | ~256 CPU cores |
| Cloud cost (c5.18xlarge) | 4 instances × 24/7 | ~$8,000/month |
| Memory required | 50M × 512 × 4B | ~100 GB RAM |
| Power consumption | ~500W × 4 instances | ~2 kW continuous |
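The table's arithmetic as a sketch (the 10 GFLOPS-per-core figure is the same rough assumption used earlier; real hardware varies):

```python
n, d = 50_000_000, 512                       # 50M images, 512-dim embeddings
flops_per_query = n * d                      # 2.56e10, i.e. 25.6 billion
seconds_per_query = flops_per_query / 10e9   # at ~10 GFLOPS per core -> ~2.56 s

def cores_needed(qps: float) -> float:
    """Cores required to sustain a given query rate with brute force."""
    return seconds_per_query * qps

print(cores_needed(12))        # ~31 cores for the average load
print(cores_needed(100))       # ~256 cores for peak load
print(n * d * 4 / 1e9)         # ~102 GB of float32 embeddings in RAM
```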
With Efficient Indexing (Preview):
Using an HNSW index (covered in later modules), the same system might achieve roughly a 40× cost reduction and a 10× latency improvement. That is not a mere optimization; it is the difference between a viable product and an impossible one.
Many startups discover too late that their brute-force prototype can't scale without a complete architectural overhaul. Understanding these limits from the start lets you make informed decisions about when to invest in efficient indexing.
A natural question arises: can we do better than $O(nd)$ for exact nearest neighbor search? The answer involves subtle complexity theory.
Output-Sensitive Lower Bound:
Any algorithm must at minimum:
- Read the query point: $\Omega(d)$
- Write out the $K$ results: $\Omega(K)$

So the theoretical minimum is $\Omega(d + K)$.
The Gap:
Between $\Omega(d + K)$ and $O(nd)$ lies a vast space. Can we achieve, say, $O(\log n \cdot d)$?
Answer: It Depends on Dimension
For low dimensions ($d \leq 3$), Voronoi diagrams enable $O(\log n)$ query time with $O(n)$ space.
For higher dimensions, we have a fundamental tradeoff:
$$\text{Query Time} \times \text{Space}^{1/d} = \Omega(n)$$
This means to achieve $O(\log n)$ query time in high dimensions, we'd need:
$$\text{Space} = \Omega(n^d)$$
Exponential in dimension—completely impractical.
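Taking the stated tradeoff at face value, the space blow-up follows directly: substituting a target query time of $O(\log n)$ gives

$$\text{Space}^{1/d} = \Omega\!\left(\frac{n}{\log n}\right) \quad\Longrightarrow\quad \text{Space} = \Omega\!\left(\left(\frac{n}{\log n}\right)^{d}\right),$$

which is $\Omega(n^d)$ up to logarithmic factors.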
| Dimension | Query Time | Space | Method |
|---|---|---|---|
| d = 1 | $O(\log n)$ | $O(n)$ | Binary search |
| d = 2 | $O(\log n)$ | $O(n)$ | Voronoi diagram |
| d = 3 | $O(\log n)$ | $O(n)$ | Voronoi diagram |
| d = 4..10 | $O(n^{1-1/d} \log n)$ | $O(n)$ | KD-tree variants |
| d > 10 | $O(n^{1-\epsilon})$ | $O(n^{1+\epsilon})$ | Cover trees, BBD trees |
| d >> log n | $O(nd)$ (effectively) | $O(nd)$ | Brute force (unavoidable) |
This theoretical barrier is one manifestation of the 'curse of dimensionality' that we explore in Module 3. In high dimensions, the very concept of 'nearest' becomes problematic as all points tend toward equidistance. Efficient exact search becomes impossible not due to algorithmic limitations but due to the geometry of high-dimensional spaces.
Approximate Methods Break the Barrier:
The key insight is that for most applications, we don't need exact nearest neighbors—we need good enough neighbors. By allowing a small error tolerance:
$$d(\mathbf{x}_{\text{returned}}, \mathbf{q}) \leq (1 + \epsilon) \cdot d(\mathbf{x}_{\text{true nearest}}, \mathbf{q})$$
We can achieve sublinear query time with practical memory overhead, at the cost of occasionally returning a neighbor that is slightly farther than the true nearest.
This is the foundation of Locality-Sensitive Hashing (LSH) and graph-based methods like HNSW, which we'll study in later pages.
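A tiny sketch of what the $(1+\epsilon)$ guarantee means operationally (a hypothetical helper for illustration, not part of any library):

```python
def within_epsilon(returned_dist: float, true_nearest_dist: float, epsilon: float) -> bool:
    """True if the returned neighbor's distance is within a (1 + epsilon)
    factor of the true nearest neighbor's distance."""
    return returned_dist <= (1.0 + epsilon) * true_nearest_dist

print(within_epsilon(1.05, 1.00, 0.1))   # True:  only 5% farther than optimal
print(within_epsilon(1.25, 1.00, 0.1))   # False: 25% farther exceeds the tolerance
```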
We've established that brute-force KNN, while conceptually simple, hits fundamental scalability walls. The key insight is that we don't need to compute distances to all points—we need to quickly identify a subset of candidates likely to contain the nearest neighbors.
This is the core idea behind every efficient nearest neighbor method:
Partition → Prune → Search
Different methods realize this pattern differently: KD-trees and Ball trees partition space geometrically and prune entire regions, hashing methods such as LSH group similar points into buckets, and graph-based methods like HNSW navigate a neighborhood graph toward the query.
You now understand why brute-force KNN is O(nd), where it's acceptable, and why we need better methods. This foundation is essential for appreciating the elegance of KD-trees, Ball trees, and hashing methods that transform nearest neighbor search from linear to logarithmic complexity. Next, we explore how KD-trees achieve this through clever spatial partitioning.