We've explored the philosophical foundations, computational properties, hyperparameter selection, and aggregation schemes of K-Nearest Neighbors. Now it's time to synthesize everything into complete, implementable algorithms.
This page provides production-quality pseudocode and implementations for KNN classification and KNN regression, covering distance computation, neighbor selection, weighting, and aggregation.
These implementations are designed to be correct, clear, and complete—suitable for understanding the algorithm deeply and as a foundation for production systems.
By the end of this page, you will have complete pseudocode for KNN classification and regression, understand the step-by-step algorithm flow, have working Python implementations ready to use or adapt, know the computational complexity of each step, and understand the extension points for advanced features.
Every KNN implementation follows the same high-level structure, whether for classification or regression:
ALGORITHM KNN:

1. TRAINING (Store):
   - Store training features X_train
   - Store training labels/targets y_train
   - Optionally: build acceleration structure (KD-tree, ball tree)
   - Optionally: precompute and cache distance information

2. PREDICTION (for each query point x_q):
   a. DISTANCE COMPUTATION:
      - Compute distance from x_q to all training points
      - d_i = distance(x_q, x_i) for i = 1, ..., n
   b. NEIGHBOR SELECTION:
      - Find indices of k smallest distances
      - neighbors = argsort(distances)[:k]   // or use partial sort
   c. AGGREGATION:
      - Gather labels/values of selected neighbors
      - Apply weighting scheme
      - Compute final prediction (vote for classification, mean for regression)
   d. RETURN:
      - Return prediction and optionally confidence/probabilities
Let's now detail each component with precise pseudocode.
Notice that Step 1 (Training) does no computation beyond storage. All the work happens in Step 2 (Prediction). This is the defining characteristic of lazy learning — training is O(n·d) storage, prediction is O(nd + n log k) computation.
The distance computation phase calculates distances from the query point to all training points. This is typically the most expensive step.
Pseudocode: Euclidean Distance
FUNCTION euclidean_distance(x1, x2):
    // x1, x2 are d-dimensional vectors
    sum_sq = 0
    FOR i = 1 TO d:
        diff = x1[i] - x2[i]
        sum_sq = sum_sq + diff * diff
    RETURN sqrt(sum_sq)
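As a reference point, here is a direct Python translation of this pseudocode (a minimal sketch; the vectorized version below is what you would use in practice):

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two equal-length sequences of numbers."""
    sum_sq = 0.0
    for a, b in zip(x1, x2):
        diff = a - b
        sum_sq += diff * diff
    return math.sqrt(sum_sq)

# Example: distance between two 3-dimensional points
print(euclidean_distance([0.0, 0.0, 0.0], [1.0, 2.0, 2.0]))  # 3.0
```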
Vectorized Distance Matrix
For efficiency, compute distances to all training points at once:
FUNCTION compute_all_distances(x_q, X_train):
    // x_q: query point (d dimensions)
    // X_train: n x d matrix of training points
    // Returns: n-dimensional vector of distances
    differences = X_train - x_q           // Broadcasting: n x d matrix
    squared = differences * differences   // Element-wise square
    sum_squared = rowsum(squared)         // n-dimensional vector
    distances = sqrt(sum_squared)         // Element-wise sqrt
    RETURN distances
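In NumPy, the vectorized computation above maps almost line for line onto array operations (a sketch; the function name simply mirrors the pseudocode):

```python
import numpy as np

def compute_all_distances(x_q, X_train):
    """Euclidean distances from query x_q (shape (d,)) to every row of X_train (shape (n, d))."""
    differences = X_train - x_q                      # broadcasting: (n, d)
    sum_squared = np.sum(differences ** 2, axis=1)   # (n,)
    return np.sqrt(sum_squared)                      # (n,)

# Example usage
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(compute_all_distances(np.array([0.0, 0.0]), X_train))  # [0.  5.  1.41421356]
```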
Given n distances, we need to find the k smallest. There are several approaches with different trade-offs:
Approach 1: Full Sort
FUNCTION find_k_nearest_fullsort(distances, k):
    // Sort all distances, return first k indices
    sorted_indices = argsort(distances)   // O(n log n)
    RETURN sorted_indices[0:k]
Complexity: O(n log n). Simple but wasteful when k << n.
Approach 2: Partial Sort (Partition)
FUNCTION find_k_nearest_partition(distances, k):
    // Partition around k-th element, return first k indices
    indices = argpartition(distances, k)   // O(n) average
    k_indices = indices[0:k]
    // Optional: sort the k indices by distance
    k_distances = distances[k_indices]
    sorted_within_k = argsort(k_distances)
    RETURN k_indices[sorted_within_k]
Complexity: O(n) average for partition + O(k log k) for optional sorting.
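In NumPy this approach is essentially a one-liner built on np.argpartition (a sketch under the same assumptions as the pseudocode, with k < n):

```python
import numpy as np

def find_k_nearest_partition(distances, k):
    """Indices of the k smallest distances, sorted by distance: O(n) average + O(k log k)."""
    k_indices = np.argpartition(distances, k)[:k]        # unordered k smallest
    return k_indices[np.argsort(distances[k_indices])]   # sort those k by distance

distances = np.array([4.2, 0.1, 3.3, 0.9, 2.5])
print(find_k_nearest_partition(distances, 2))  # [1 3]
```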
Approach 3: Heap-based Selection
FUNCTION find_k_nearest_heap(distances, k):
    // Maintain a max-heap of size k
    heap = MaxHeap()
    FOR i = 0 TO n-1:
        IF heap.size() < k:
            heap.push((distances[i], i))
        ELIF distances[i] < heap.peek().distance:
            heap.pop()
            heap.push((distances[i], i))
    RETURN heap.items()
Complexity: O(n log k). Best when k << n.
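Python's heapq module provides only a min-heap, so one common way to realize this pseudocode is to store negated distances, which puts the largest distance at the top of the heap (a sketch):

```python
import heapq

def find_k_nearest_heap(distances, k):
    """Indices of the k smallest distances using a bounded heap: O(n log k)."""
    heap = []  # stores (-distance, index); the largest distance sits at heap[0]
    for i, d in enumerate(distances):
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:                # closer than the current worst neighbor
            heapq.heapreplace(heap, (-d, i))
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first

print(find_k_nearest_heap([4.2, 0.1, 3.3, 0.9, 2.5], 2))  # [1, 3]
```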
| Algorithm | Time Complexity | When to Use |
|---|---|---|
| Full Sort | O(n log n) | When k ≈ n, or when you need all neighbors sorted |
| Partition (Quickselect) | O(n) average | General case; default choice in numpy |
| Heap Selection | O(n log k) | When k << n; streaming scenarios |
| Using KD-tree | O(d log n) per query in low dimensions | When making many queries on the same data |
Here is the complete, production-ready KNN classification algorithm integrating all components:
ALGORITHM KNN_Classification:

INPUT:
    X_train: n × d training feature matrix
    y_train: n-dimensional label vector (integer classes 0, 1, ..., C-1)
    k: number of neighbors
    weights: 'uniform' or 'distance'
    metric: distance function

TRAINING (fit):
    STORE X_train, y_train
    n_classes = max(y_train) + 1

PREDICTION (predict for query x_q):
    // Step 1: Compute distances
    FOR i = 1 TO n:
        distances[i] = metric(x_q, X_train[i])

    // Step 2: Find k nearest
    neighbor_indices = argpartition(distances, k)[:k]
    neighbor_distances = distances[neighbor_indices]
    neighbor_labels = y_train[neighbor_indices]

    // Step 3: Compute weights
    IF weights == 'uniform':
        w = ones(k)
    ELSE IF weights == 'distance':
        w = 1 / (neighbor_distances + epsilon)

    // Step 4: Aggregate votes
    class_weights = zeros(n_classes)
    FOR i = 1 TO k:
        class_weights[neighbor_labels[i]] += w[i]

    // Step 5: Return prediction
    RETURN argmax(class_weights)

PROBABILITY (predict_proba for query x_q):
    // Same as prediction steps 1-4, then:
    probabilities = class_weights / sum(class_weights)
    RETURN probabilities
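One way to translate this pseudocode into working NumPy code is sketched below; the function names and the epsilon default are illustrative choices, not part of any particular library:

```python
import numpy as np

def knn_predict_proba(x_q, X_train, y_train, k=5, weights='uniform', epsilon=1e-8):
    """Class-membership probabilities for a single query point x_q."""
    # Step 1: distances to all training points
    distances = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))
    # Step 2: k nearest neighbors
    neighbor_idx = np.argpartition(distances, k)[:k]
    neighbor_dist = distances[neighbor_idx]
    neighbor_labels = y_train[neighbor_idx]
    # Step 3: weights
    w = np.ones(k) if weights == 'uniform' else 1.0 / (neighbor_dist + epsilon)
    # Step 4: weighted votes per class, normalized to probabilities
    n_classes = int(y_train.max()) + 1
    class_weights = np.bincount(neighbor_labels, weights=w, minlength=n_classes)
    return class_weights / class_weights.sum()

def knn_predict(x_q, X_train, y_train, k=5, weights='uniform'):
    """Predicted class label for a single query point x_q."""
    return int(np.argmax(knn_predict_proba(x_q, X_train, y_train, k, weights)))

# Tiny example: two clusters, labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(np.array([0.9, 1.0]), X, y, k=3))  # 1
```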
KNN regression follows the same structure, with aggregation by mean instead of voting:
ALGORITHM KNN_Regression:

INPUT:
    X_train: n × d training feature matrix
    y_train: n-dimensional target vector (real values)
    k: number of neighbors
    weights: 'uniform' or 'distance'
    aggregation: 'mean' or 'median'

PREDICTION (predict for query x_q):
    // Steps 1-3: Same as classification
    // Step 4: Aggregate values
    IF aggregation == 'mean':
        IF weights == 'uniform':
            prediction = mean(neighbor_values)
        ELSE:
            prediction = sum(w * neighbor_values) / sum(w)
    ELSE IF aggregation == 'median':
        prediction = weighted_median(neighbor_values, w)
    RETURN prediction
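The weighted_median used above is not a built-in; one reasonable definition (a sketch) sorts the values and returns the first one whose cumulative weight reaches half of the total weight:

```python
import numpy as np

def weighted_median(values, w):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    values, w = np.asarray(values)[order], np.asarray(w)[order]
    cumulative = np.cumsum(w)
    return values[np.searchsorted(cumulative, 0.5 * cumulative[-1])]

print(weighted_median([10.0, 20.0, 30.0], [1.0, 1.0, 5.0]))  # 30.0: the heavy point dominates
```

With uniform weights this reduces to the ordinary median of the neighbor values.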
For production systems, a unified implementation that handles both classification and regression with all options is often desirable:
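A minimal sketch of such a unified estimator is shown below; the class name, parameter names, and fit/predict interface are illustrative choices rather than a specific library's API, and it reuses the weighted_median helper defined above:

```python
import numpy as np

class KNN:
    """Brute-force KNN for classification or regression (a teaching sketch, not optimized)."""

    def __init__(self, k=5, task='classification', weights='uniform',
                 aggregation='mean', epsilon=1e-8):
        self.k, self.task, self.weights = k, task, weights
        self.aggregation, self.epsilon = aggregation, epsilon

    def fit(self, X, y):
        # Lazy learning: training is just storage
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def _neighbors(self, x_q):
        # Distances, k nearest indices, and per-neighbor weights
        distances = np.sqrt(np.sum((self.X_train - x_q) ** 2, axis=1))
        idx = np.argpartition(distances, self.k)[:self.k]
        d = distances[idx]
        w = np.ones(self.k) if self.weights == 'uniform' else 1.0 / (d + self.epsilon)
        return self.y_train[idx], w

    def predict_one(self, x_q):
        labels, w = self._neighbors(x_q)
        if self.task == 'classification':
            votes = np.bincount(labels.astype(int), weights=w)
            return int(np.argmax(votes))
        if self.aggregation == 'mean':
            return float(np.sum(w * labels) / np.sum(w))
        return float(weighted_median(labels, w))   # helper defined above

    def predict(self, X_query):
        return np.array([self.predict_one(x_q) for x_q in np.asarray(X_query, dtype=float)])
```

For example, KNN(k=3, task='regression', weights='distance').fit(X, y).predict(X_new) produces distance-weighted regression predictions for a batch of query points.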
Understanding the computational complexity of KNN is essential for predicting performance and making deployment decisions.
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Training (fit) | O(n·d) storage | O(n·d) | Just stores data; no computation |
| Distance computation | O(n·d) per query | O(n) | Dominates prediction time |
| Neighbor selection | O(n) average, O(n²) worst for plain quickselect | O(k) | Using partition-based selection; introselect avoids the quadratic worst case |
| Aggregation | O(k) | O(C) for C classes | Voting or averaging |
| Total prediction | O(n·d + n log k) | O(n + k) | Per query |
| Batch prediction (m queries) | O(m·n·d) | O(m·n) or O(n) | Can be parallelized |
For large datasets, the O(n·d) distance computation dominates. With n=1M and d=100, each prediction requires ~100 million floating-point operations. This is why acceleration structures (KD-trees, LSH, HNSW) are essential for production, reducing query time from O(n·d) to O(log n · d) or better.
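You can get a feel for this cost on your own hardware with a quick brute-force timing experiment (the sizes below are illustrative, and absolute timings will vary widely by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 100, 10
X_train = rng.standard_normal((n, d)).astype(np.float32)
x_q = rng.standard_normal(d).astype(np.float32)

start = time.perf_counter()
distances = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))   # O(n*d): the dominant cost
neighbors = np.argpartition(distances, k)[:k]                # O(n) average
elapsed = time.perf_counter() - start
print(f"Brute-force query over n={n}, d={d}: {elapsed * 1e3:.1f} ms")
```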
Practical Performance Guidelines
| Dataset Size | Dimensionality | Typical Latency | Recommended Approach |
|---|---|---|---|
| n < 1,000 | d < 100 | <1 ms | Brute force is fine |
| n < 10,000 | d < 30 | 1-10 ms | Brute force or KD-tree |
| n < 100,000 | d < 20 | 10-100 ms | KD-tree or Ball tree |
| n < 1,000,000 | any d | 100ms-1s | Ball tree or ANN (FAISS, Annoy) |
| n > 1,000,000 | any d | varies | Must use ANN; brute force impractical |
Memory Requirements
For 32-bit floats, storing the training data itself requires n·d·4 bytes: roughly 4 MB for n = 10,000 with d = 100, and roughly 400 MB for n = 1,000,000 with d = 100.
For distance caching (rarely practical), a full n × n pairwise distance matrix requires n²·4 bytes: about 400 MB at n = 10,000 and roughly 4 TB at n = 1,000,000, which is why precomputing all pairwise distances is almost never worthwhile.
We've now covered the complete KNN algorithm from pseudocode to production-ready implementation.
Module Complete!
You've now mastered the KNN Algorithm module, covering the foundations of instance-based (lazy) learning, the computational properties of the algorithm, hyperparameter selection, weighting and aggregation schemes, and complete pseudocode and implementations for both classification and regression.
In the next module, we'll explore Distance Metrics—the choice of which fundamentally shapes what 'nearest' means in your feature space.
Congratulations! You now have a deep understanding of the K-Nearest Neighbors algorithm. You can implement KNN from scratch for both classification and regression, understand all the design choices and their tradeoffs, and know when KNN is appropriate and what its computational limits are. You're ready to explore distance metrics, the curse of dimensionality, and acceleration structures in upcoming modules.