We've explored the philosophical foundations, computational properties, hyperparameter selection, and aggregation schemes of K-Nearest Neighbors. Now it's time to synthesize everything into complete, implementable algorithms.
This page provides production-quality pseudocode and implementations for KNN classification and KNN regression, covering distance computation, neighbor selection, weighting, and aggregation.
These implementations are designed to be correct, clear, and complete—suitable for understanding the algorithm deeply and as a foundation for production systems.
By the end of this page, you will have complete pseudocode for KNN classification and regression, understand the step-by-step algorithm flow, have working Python implementations ready to use or adapt, know the computational complexity of each step, and understand the extension points for advanced features.
Every KNN implementation follows the same high-level structure, whether for classification or regression:
ALGORITHM KNN:

1. TRAINING (Store):
   - Store training features X_train
   - Store training labels/targets y_train
   - Optionally: build acceleration structure (KD-tree, ball tree)
   - Optionally: precompute and cache distance information

2. PREDICTION (for each query point x_q):
   a. DISTANCE COMPUTATION:
      - Compute distance from x_q to all training points
      - d_i = distance(x_q, x_i) for i = 1, ..., n
   b. NEIGHBOR SELECTION:
      - Find indices of k smallest distances
      - neighbors = argsort(distances)[:k]   // or use partial sort
   c. AGGREGATION:
      - Gather labels/values of selected neighbors
      - Apply weighting scheme
      - Compute final prediction (vote for classification, mean for regression)
   d. RETURN:
      - Return prediction and optionally confidence/probabilities
Let's now detail each component with precise pseudocode.
Notice that Step 1 (Training) does no computation beyond storage. All the work happens in Step 2 (Prediction). This is the defining characteristic of lazy learning — training is O(n·d) storage, prediction is O(nd + n log k) computation.
The distance computation phase calculates distances from the query point to all training points. This is typically the most expensive step.
Pseudocode: Euclidean Distance
FUNCTION euclidean_distance(x1, x2):
    // x1, x2 are d-dimensional vectors
    sum_sq = 0
    FOR i = 1 TO d:
        diff = x1[i] - x2[i]
        sum_sq = sum_sq + diff * diff
    RETURN sqrt(sum_sq)
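As a reference point, here is a direct Python translation of this pseudocode (a minimal sketch; the vectorized version below is what you would use in practice):

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two equal-length sequences of numbers."""
    sum_sq = 0.0
    for a, b in zip(x1, x2):
        diff = a - b
        sum_sq += diff * diff
    return math.sqrt(sum_sq)

# Example: distance between two 3-dimensional points
print(euclidean_distance([0.0, 0.0, 0.0], [1.0, 2.0, 2.0]))  # 3.0
```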
Vectorized Distance Matrix
For efficiency, compute distances to all training points at once:
FUNCTION compute_all_distances(x_q, X_train):
    // x_q: query point (d dimensions)
    // X_train: n x d matrix of training points
    // Returns: n-dimensional vector of distances
    differences = X_train - x_q           // Broadcasting: n x d matrix
    squared = differences * differences   // Element-wise square
    sum_squared = rowsum(squared)         // n-dimensional vector
    distances = sqrt(sum_squared)         // Element-wise sqrt
    RETURN distances
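In NumPy, the vectorized computation above maps almost line for line onto array operations (a sketch; the function name simply mirrors the pseudocode):

```python
import numpy as np

def compute_all_distances(x_q, X_train):
    """Euclidean distances from query x_q (shape (d,)) to every row of X_train (shape (n, d))."""
    differences = X_train - x_q                      # broadcasting: (n, d)
    sum_squared = np.sum(differences ** 2, axis=1)   # (n,)
    return np.sqrt(sum_squared)                      # (n,)

# Example usage
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(compute_all_distances(np.array([0.0, 0.0]), X_train))  # [0.  5.  1.41421356]
```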
Given n distances, we need to find the k smallest. There are several approaches with different trade-offs:
Approach 1: Full Sort
FUNCTION find_k_nearest_fullsort(distances, k):
    // Sort all distances, return first k indices
    sorted_indices = argsort(distances)   // O(n log n)
    RETURN sorted_indices[0:k]
Complexity: O(n log n). Simple but wasteful when k << n.
Approach 2: Partial Sort (Partition)
FUNCTION find_k_nearest_partition(distances, k):
    // Partition around k-th element, return first k indices
    indices = argpartition(distances, k)   // O(n) average
    k_indices = indices[0:k]
    // Optional: sort the k indices by distance
    k_distances = distances[k_indices]
    sorted_within_k = argsort(k_distances)
    RETURN k_indices[sorted_within_k]
Complexity: O(n) average for partition + O(k log k) for optional sorting.
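In NumPy this approach is essentially a one-liner built on np.argpartition (a sketch under the same assumptions as the pseudocode, with k < n):

```python
import numpy as np

def find_k_nearest_partition(distances, k):
    """Indices of the k smallest distances, sorted by distance: O(n) average + O(k log k)."""
    k_indices = np.argpartition(distances, k)[:k]        # unordered k smallest
    return k_indices[np.argsort(distances[k_indices])]   # sort those k by distance

distances = np.array([4.2, 0.1, 3.3, 0.9, 2.5])
print(find_k_nearest_partition(distances, 2))  # [1 3]
```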
Approach 3: Heap-based Selection
FUNCTION find_k_nearest_heap(distances, k):
    // Maintain a max-heap of size k
    heap = MaxHeap()
    FOR i = 0 TO n-1:
        IF heap.size() < k:
            heap.push((distances[i], i))
        ELIF distances[i] < heap.peek().distance:
            heap.pop()
            heap.push((distances[i], i))
    RETURN heap.items()
Complexity: O(n log k). Best when k << n.
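Python's heapq module provides only a min-heap, so one common way to realize this pseudocode is to store negated distances, which puts the largest distance at the top of the heap (a sketch):

```python
import heapq

def find_k_nearest_heap(distances, k):
    """Indices of the k smallest distances using a bounded heap: O(n log k)."""
    heap = []  # stores (-distance, index); the largest distance sits at heap[0]
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:                # closer than the current worst neighbor
            heapq.heapreplace(heap, (-d, i))
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first

print(find_k_nearest_heap([4.2, 0.1, 3.3, 0.9, 2.5], 2))  # [1, 3]
```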
| Algorithm | Time Complexity | When to Use |
|---|---|---|
| Full Sort | O(n log n) | When k ≈ n, or when you need all neighbors sorted |
| Partition (Quickselect) | O(n) average | General case; default choice in numpy |
| Heap Selection | O(n log k) | When k << n; streaming scenarios |
| Using KD-tree | O(d log n) per query in low dimensions | When making many queries on the same data |
Here is the complete, production-ready KNN classification algorithm integrating all components:
ALGORITHM KNN_Classification:

INPUT:
    X_train: n × d training feature matrix
    y_train: n-dimensional label vector (integer classes 0, 1, ..., C-1)
    k: number of neighbors
    weights: 'uniform' or 'distance'
    metric: distance function

TRAINING (fit):
    STORE X_train, y_train
    n_classes = max(y_train) + 1

PREDICTION (predict for query x_q):
    // Step 1: Compute distances
    FOR i = 1 TO n:
        distances[i] = metric(x_q, X_train[i])

    // Step 2: Find k nearest
    neighbor_indices = argpartition(distances, k)[:k]
    neighbor_distances = distances[neighbor_indices]
    neighbor_labels = y_train[neighbor_indices]

    // Step 3: Compute weights
    IF weights == 'uniform':
        w = ones(k)
    ELSE IF weights == 'distance':
        w = 1 / (neighbor_distances + epsilon)

    // Step 4: Aggregate votes
    class_weights = zeros(n_classes)
    FOR i = 1 TO k:
        class_weights[neighbor_labels[i]] += w[i]

    // Step 5: Return prediction
    RETURN argmax(class_weights)

PROBABILITY (predict_proba for query x_q):
    // Same as prediction steps 1-4, then:
    probabilities = class_weights / sum(class_weights)
    RETURN probabilities
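One way to translate this pseudocode into working NumPy code is sketched below; the function names and the epsilon default are illustrative choices, not part of any particular library:

```python
import numpy as np

def knn_predict_proba(x_q, X_train, y_train, k=5, weights='uniform', epsilon=1e-8):
    """Class-membership probabilities for a single query point x_q."""
    # Step 1: distances to all training points
    distances = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))
    # Step 2: k nearest neighbors
    neighbor_idx = np.argpartition(distances, k)[:k]
    neighbor_dist = distances[neighbor_idx]
    neighbor_labels = y_train[neighbor_idx]
    # Step 3: weights
    w = np.ones(k) if weights == 'uniform' else 1.0 / (neighbor_dist + epsilon)
    # Step 4: weighted votes per class, normalized to probabilities
    n_classes = int(y_train.max()) + 1
    class_weights = np.bincount(neighbor_labels, weights=w, minlength=n_classes)
    return class_weights / class_weights.sum()

def knn_predict(x_q, X_train, y_train, k=5, weights='uniform'):
    """Predicted class label for a single query point x_q."""
    return int(np.argmax(knn_predict_proba(x_q, X_train, y_train, k, weights)))

# Tiny example: two clusters, labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(np.array([0.9, 1.0]), X, y, k=3))  # 1
```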
KNN regression follows the same structure, with aggregation by mean instead of voting:
ALGORITHM KNN_Regression:

INPUT:
    X_train: n × d training feature matrix
    y_train: n-dimensional target vector (real values)
    k: number of neighbors
    weights: 'uniform' or 'distance'
    aggregation: 'mean' or 'median'

PREDICTION (predict for query x_q):
    // Steps 1-3: Same as classification
    // Step 4: Aggregate values
    IF aggregation == 'mean':
        IF weights == 'uniform':
            prediction = mean(neighbor_values)
        ELSE:
            prediction = sum(w * neighbor_values) / sum(w)
    ELSE IF aggregation == 'median':
        prediction = weighted_median(neighbor_values, w)
    RETURN prediction
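The weighted_median used above is not a built-in; one reasonable definition (a sketch) sorts the values and returns the first one whose cumulative weight reaches half of the total weight:

```python
import numpy as np

def weighted_median(values, w):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    values, w = np.asarray(values)[order], np.asarray(w)[order]
    cumulative = np.cumsum(w)
    return values[np.searchsorted(cumulative, 0.5 * cumulative[-1])]

print(weighted_median([10.0, 20.0, 30.0], [1.0, 1.0, 5.0]))  # 30.0: the heavy point dominates
```

With uniform weights this reduces to the ordinary median of the neighbor values.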
For production systems, a unified implementation that handles both classification and regression with all options is often desirable:
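A minimal sketch of such a unified estimator is shown below; the class name, parameter names, and fit/predict interface are illustrative choices rather than a specific library's API, and it reuses the weighted_median helper defined above:

```python
import numpy as np

class KNN:
    """Brute-force KNN for classification or regression (a teaching sketch, not optimized)."""

    def __init__(self, k=5, task='classification', weights='uniform',
                 aggregation='mean', epsilon=1e-8):
        self.k, self.task, self.weights = k, task, weights
        self.aggregation, self.epsilon = aggregation, epsilon

    def fit(self, X, y):
        # Lazy learning: training is just storage
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def _neighbors(self, x_q):
        # Distances, k nearest indices, and per-neighbor weights
        distances = np.sqrt(np.sum((self.X_train - x_q) ** 2, axis=1))
        idx = np.argpartition(distances, self.k)[:self.k]
        d = distances[idx]
        w = np.ones(self.k) if self.weights == 'uniform' else 1.0 / (d + self.epsilon)
        return self.y_train[idx], w

    def predict_one(self, x_q):
        labels, w = self._neighbors(x_q)
        if self.task == 'classification':
            votes = np.bincount(labels.astype(int), weights=w)
            return int(np.argmax(votes))
        if self.aggregation == 'mean':
            return float(np.sum(w * labels) / np.sum(w))
        return float(weighted_median(labels, w))   # helper defined above

    def predict(self, X_query):
        return np.array([self.predict_one(x_q) for x_q in np.asarray(X_query, dtype=float)])
```

For example, KNN(k=3, task='regression', weights='distance').fit(X, y).predict(X_new) produces distance-weighted regression predictions for a batch of query points.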
Understanding the computational complexity of KNN is essential for predicting performance and making deployment decisions.
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Training (fit) | O(n·d) storage | O(n·d) | Just stores data; no computation |
| Distance computation | O(n·d) per query | O(n) | Dominates prediction time |
| Neighbor selection | O(n) average, O(n²) worst for plain quickselect | O(k) | Using partition-based selection; introselect avoids the quadratic worst case |
| Aggregation | O(k) | O(C) for C classes | Voting or averaging |
| Total prediction | O(n·d + n log k) | O(n + k) | Per query |
| Batch prediction (m queries) | O(m·n·d) | O(m·n) or O(n) | Can be parallelized |
For large datasets, the O(n·d) distance computation dominates. With n=1M and d=100, each prediction requires ~100 million floating-point operations. This is why acceleration structures (KD-trees, LSH, HNSW) are essential for production, reducing query time from O(n·d) to O(log n · d) or better.
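You can get a feel for this cost on your own hardware with a quick brute-force timing experiment (the sizes below are illustrative, and absolute timings will vary widely by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 100, 10
X_train = rng.standard_normal((n, d)).astype(np.float32)
x_q = rng.standard_normal(d).astype(np.float32)

start = time.perf_counter()
distances = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))   # O(n*d): the dominant cost
neighbors = np.argpartition(distances, k)[:k]                # O(n) average
elapsed = time.perf_counter() - start
print(f"Brute-force query over n={n}, d={d}: {elapsed * 1e3:.1f} ms")
```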
Practical Performance Guidelines
| Dataset Size | Dimensionality | Typical Latency | Recommended Approach |
|---|---|---|---|
| n < 1,000 | d < 100 | <1 ms | Brute force is fine |
| n < 10,000 | d < 30 | 1-10 ms | Brute force or KD-tree |
| n < 100,000 | d < 20 | 10-100 ms | KD-tree or Ball tree |
| n < 1,000,000 | any d | 100ms-1s | Ball tree or ANN (FAISS, Annoy) |
| n > 1,000,000 | any d | varies | Must use ANN; brute force impractical |
Memory Requirements
For 32-bit floats, storing the training data itself requires n·d·4 bytes: roughly 4 MB for n = 10,000 with d = 100, and roughly 400 MB for n = 1,000,000 with d = 100.
For distance caching (rarely practical), a full n × n pairwise distance matrix requires n²·4 bytes: about 400 MB at n = 10,000 and roughly 4 TB at n = 1,000,000, which is why precomputing all pairwise distances is almost never worthwhile.
We've now covered the complete KNN algorithm from pseudocode to production-ready implementation.
Module Complete!
You've now mastered the KNN Algorithm module, covering the foundations of instance-based (lazy) learning, the computational properties of the algorithm, hyperparameter selection, weighting and aggregation schemes, and complete pseudocode and implementations for both classification and regression.
In the next module, we'll explore Distance Metrics—the choice of which fundamentally shapes what 'nearest' means in your feature space.
Congratulations! You now have a deep understanding of the K-Nearest Neighbors algorithm. You can implement KNN from scratch for both classification and regression, understand all the design choices and their tradeoffs, and know when KNN is appropriate and what its computational limits are. You're ready to explore distance metrics, the curse of dimensionality, and acceleration structures in upcoming modules.