K-Nearest Neighbors belongs to the family of instance-based learning algorithms—methods that store the entire training dataset and use it directly at prediction time. While this approach offers remarkable simplicity and the ability to model arbitrarily complex decision boundaries, it carries a fundamental computational burden: every prediction requires distance computations to all stored training instances.
Consider the implications at scale. A training set of 1 million instances with 100 features means every prediction involves 1 million distance calculations in 100-dimensional space. Even with efficient data structures like KD-trees or ball trees (which degrade to brute-force performance in high dimensions), this computational overhead becomes prohibitive for real-time applications.
But here's the critical insight: not all training instances contribute equally to the decision boundary. Points deep within the interior of a class region provide no discriminative value—removing them would leave the decision boundary unchanged. It's only the points near class boundaries that truly matter for classification.
Condensed Nearest Neighbors (CNN) exploits this observation to dramatically reduce the training set while preserving classification accuracy. It answers a deceptively simple question: What is the smallest subset of training data that correctly classifies all original training instances using 1-NN?
By completing this page, you will understand the theoretical foundation of prototype selection, master the CNN algorithm with its variants, analyze its computational complexity and convergence properties, recognize its limitations, and know when CNN is the right choice versus alternative data reduction techniques.
The mathematical foundation of CNN rests on the concept of a consistent subset. Before diving into the algorithm, we must precisely define what we're seeking.
Let $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\}$ be our original training set, where $\mathbf{x}_i \in \mathbb{R}^d$ are feature vectors and $y_i \in \{1, 2, \ldots, K\}$ are class labels.
Definition (Consistent Subset): A subset $T \subseteq S$ is said to be consistent with respect to $S$ if every point in $S$ is correctly classified by the 1-nearest-neighbor rule using only the points in $T$.
Formally, $T$ is consistent if:
$$\forall (\mathbf{x}_i, y_i) \in S: \text{1-NN}(\mathbf{x}_i, T) = y_i$$
where $\text{1-NN}(\mathbf{x}, T)$ returns the label of the nearest neighbor to $\mathbf{x}$ within $T$.
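To make the definition concrete, here is a minimal sketch of a consistency check. It assumes scikit-learn's `KNeighborsClassifier` is available; the function name `is_consistent` is introduced here for illustration only:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def is_consistent(X, y, subset_indices):
    """Return True if the subset classifies every point of (X, y)
    correctly under the 1-NN rule."""
    subset_indices = np.asarray(subset_indices)
    nn = KNeighborsClassifier(n_neighbors=1)
    nn.fit(X[subset_indices], y[subset_indices])
    # Points that are themselves in the subset are their own nearest
    # neighbor (distance 0), so they are trivially classified correctly.
    return bool(np.all(nn.predict(X) == y))
```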
The ideal goal would be to find the minimum consistent subset—the smallest $T$ that correctly classifies all of $S$. Unfortunately, the minimum consistent subset problem is NP-hard: hardness can be shown by a reduction from minimum set cover, whose decision version is a classic NP-complete problem.
Finding the absolute minimum consistent subset is computationally intractable for large datasets. The Condensed Nearest Neighbors algorithm provides a greedy heuristic that produces a consistent subset, but not necessarily the smallest one. The quality of the reduction depends heavily on the order of processing and the geometry of the data.
A consistent subset guarantees correct classification on the training data, but says nothing about generalization. In fact, there's a subtle but important relationship between subset size and generalization: because the condensed set reproduces the training-set decision boundary exactly, generalization is largely preserved when the data is clean, but aggressive condensation keeps precisely the points closest to the apparent boundary, which is where label noise concentrates, so on noisy data a small consistent subset can encode that noise rather than smooth it away.
The key insight is that points near class boundaries are geometrically necessary for maintaining the decision surface, while interior points are redundant because their contribution is dominated by closer boundary points.
Consider a two-class problem in 2D. The training data forms two clusters. The decision boundary of 1-NN is the Voronoi tessellation where each cell is assigned the label of its generating point.
Now observe: the 1-NN decision boundary consists only of the Voronoi edges that separate cells with different class labels. Removing an interior point, whose cell borders only same-class cells, simply redistributes its cell among same-class neighbors without moving that inter-class boundary; removing a border point does change it.
This geometric understanding motivates the CNN algorithm: iteratively identify and retain only those points that are necessary for correct classification.
| Point Type | Definition | Contribution to Decision Boundary | CNN Treatment |
|---|---|---|---|
| Border Point | Nearest to a point of different class | Directly shapes the decision boundary | Always retained |
| Interior Point | Surrounded only by same-class points | No contribution to inter-class boundary | Removed (redundant) |
| Outlier/Noise | Isolated point far from class bulk | Creates spurious local boundary | May be retained (problematic) |
| Bridge Point | Connects two clusters of same class | Maintains cluster connectivity | Situationally retained |
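To build intuition for these categories, a rough heuristic (a sketch for exploration, not part of the CNN algorithm itself) is to compare each point's distance to its nearest same-class neighbor against its distance to its nearest enemy, the closest point of any other class. Points whose nearest enemy is about as close as, or closer than, their nearest same-class neighbor tend to be border points or noise; the rest are interior. The helper name `nearest_enemy_ratio` is an assumption of this example:

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_enemy_ratio(X, y):
    """For each point, return d(nearest same-class neighbor) / d(nearest enemy).
    Small ratios suggest interior points; ratios near or above 1 suggest
    border points or potential noise."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # ignore self-distances
    same = y[:, None] == y[None, :]      # same-class mask
    friend = np.where(same, D, np.inf).min(axis=1)   # nearest same-class point
    enemy = np.where(~same, D, np.inf).min(axis=1)   # nearest other-class point
    return friend / enemy
```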
The original CNN algorithm was introduced by Peter Hart in 1968. It operates through an incremental, greedy approach: start with a minimal subset and iteratively add points that are misclassified by the current subset.
The algorithm maintains two sets: STORE, the growing condensed subset used to classify, and GRABBAG, the pool of remaining training points that have not (yet) been found necessary.
Algorithm (CNN - Hart, 1968):
```python
def condensed_nearest_neighbors(X, y, distance_metric='euclidean'):
    """
    Condensed Nearest Neighbors Algorithm (Hart, 1968)

    Computes a consistent subset of training data that correctly
    classifies all original training instances using 1-NN.

    Parameters:
    -----------
    X : ndarray of shape (n_samples, n_features)
        Training feature vectors
    y : ndarray of shape (n_samples,)
        Training class labels
    distance_metric : str or callable
        Distance metric for nearest neighbor computation

    Returns:
    --------
    store_indices : list
        Indices of points retained in the condensed set
    """
    import numpy as np
    from scipy.spatial.distance import cdist

    n_samples = len(X)
    classes = np.unique(y)

    # Step 1: Initialize STORE with one point from each class
    # This ensures STORE can represent every class
    store_indices = []
    for c in classes:
        class_indices = np.where(y == c)[0]
        store_indices.append(class_indices[0])

    store_indices = set(store_indices)
    grabbag_indices = set(range(n_samples)) - store_indices

    # Step 2: Iterate until no changes occur (convergence)
    changed = True
    iteration = 0

    while changed:
        changed = False
        iteration += 1

        # Convert to lists for indexing
        store_list = list(store_indices)
        grabbag_list = list(grabbag_indices)

        if len(grabbag_list) == 0:
            break

        # Compute distances from grabbag points to store points
        X_store = X[store_list]
        X_grabbag = X[grabbag_list]
        distances = cdist(X_grabbag, X_store, metric=distance_metric)

        # For each grabbag point, find nearest store neighbor
        nearest_store_idx = np.argmin(distances, axis=1)
        predicted_labels = y[np.array(store_list)[nearest_store_idx]]
        true_labels = y[grabbag_list]

        # Find misclassified points
        misclassified_mask = predicted_labels != true_labels

        if np.any(misclassified_mask):
            changed = True
            # Add misclassified points to STORE
            misclassified_grabbag = np.array(grabbag_list)[misclassified_mask]
            for idx in misclassified_grabbag:
                store_indices.add(idx)
                grabbag_indices.remove(idx)

    print(f"CNN completed in {iteration} iterations")
    print(f"Reduced from {n_samples} to {len(store_indices)} points")
    print(f"Reduction ratio: {100 * (1 - len(store_indices)/n_samples):.1f}%")

    return list(store_indices)
```

Let's trace through a simple example to build intuition. Consider 8 training points in 2D from two classes (+ and -):
| Point | Coordinates | Class |
|---|---|---|
| A | (1, 1) | + |
| B | (2, 1) | + |
| C | (1.5, 2) | + |
| D | (5, 5) | + |
| E | (7, 1) | - |
| F | (8, 2) | - |
| G | (7.5, 1.5) | - |
| H | (9, 3) | - |
Iteration 0 (Initialization): STORE = {A, E} (the first point of each class), GRABBAG = {B, C, D, F, G, H}.
Iteration 1: Each GRABBAG point is classified by 1-NN against STORE. B and C are closest to A, and F, G, H are closest to E, so all five are correctly classified. D = (5, 5), however, is closer to E (distance ≈ 4.47) than to A (distance ≈ 5.66), so it is misclassified as - and moved to STORE.
Iteration 2: With STORE = {A, D, E}, every remaining GRABBAG point is correctly classified, so the algorithm terminates with STORE = {A, D, E}.
This example illustrates why CNN achieves dramatic reduction: the two tight, well-separated clusters each need only one representative, and only the stray point D between them must also be kept; the training set shrinks from 8 points to 3.
The CNN algorithm is highly sensitive to initialization and processing order. Different initializations can lead to different final subsets, all of which are consistent but have different sizes. Randomizing the order and running multiple trials can help find smaller consistent subsets.
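One practical way to exploit this is to shuffle the data before each run and keep the smallest consistent subset found. A minimal sketch, assuming the `condensed_nearest_neighbors` function defined above; the helper name `cnn_best_of_k` is an assumption of this example:

```python
import numpy as np

def cnn_best_of_k(X, y, n_trials=10, random_state=0):
    """Run CNN on several random orderings and keep the smallest result."""
    rng = np.random.default_rng(random_state)
    best = None
    for _ in range(n_trials):
        perm = rng.permutation(len(X))
        # Indices returned refer to the permuted arrays,
        # so map them back to the original indexing.
        kept = condensed_nearest_neighbors(X[perm], y[perm])
        kept = perm[np.asarray(kept, dtype=int)]
        if best is None or len(kept) < len(best):
            best = kept
    return list(best)
```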
Understanding CNN's computational behavior is critical for determining its applicability to large-scale problems. Let's analyze both time and space complexity rigorously.
Let $n$ be the number of training instances, $d$ the dimensionality, and $k$ the number of iterations until convergence.
Per-Iteration Cost: In each iteration, we compute distances between all GRABBAG points and all STORE points. If GRABBAG has $g$ points and STORE has $s$ points:
$$\text{Distance computation: } O(g \cdot s \cdot d)$$
Number of Iterations: The algorithm terminates when no point moves from GRABBAG to STORE. In the worst case, exactly one point is added per iteration, requiring $O(n)$ iterations.
Worst-Case Total Complexity:
$$O(n^3 \cdot d)$$
This occurs when the classes are heavily interleaved or the labels are noisy, so that only one point is added per iteration and STORE eventually grows to nearly all of $S$; each of the $O(n)$ iterations then costs up to $O(n^2 \cdot d)$ in distance computations.
Average-Case Behavior: In practice, many points are often added in early iterations, and the algorithm converges much faster. Empirically, for well-separated clusters, the number of iterations is typically $O(\log n)$ to $O(\sqrt{n})$.
| Operation | Best Case | Average Case | Worst Case |
|---|---|---|---|
| Iterations | O(1) | O(log n) to O(√n) | O(n) |
| Per-iteration distance computation | O(n · d) | O(n · \|STORE\| · d) | O(n² · d) |
| Total time complexity | O(n · d) | O(n · √n · d) | O(n³ · d) |
| Space complexity | O(n · d) | O(n · d) | O(n · d) |
CNN requires: $O(n \cdot d)$ to hold the training data, $O(n)$ for the STORE and GRABBAG index sets, and $O(g \cdot s)$ for the per-iteration distance matrix between the $g$ GRABBAG points and $s$ STORE points, which approaches $O(n^2)$ if fully materialized.
The distance matrix is the dominant factor. A memory-efficient implementation computes distances row-by-row rather than materializing the full matrix.
The reduction ratio is the key practical metric:
$$\text{Reduction Ratio} = 1 - \frac{|\text{STORE}|}{n}$$
Factors Affecting Reduction: class separability (well-separated clusters condense dramatically), label noise (noisy points masquerade as boundary points and are retained), dimensionality (in high dimensions proportionally more points lie near the boundary), and the intrinsic complexity of the decision boundary.
The real benefit of CNN comes at prediction time, not training time. If CNN reduces the training set by 90% (from 100,000 to 10,000 points), every prediction becomes 10× faster. For applications requiring millions of predictions, this speedup is transformative—even if the initial CNN computation takes hours.
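A quick way to verify this payoff on your own data is to time 1-NN prediction against the full set versus the condensed set. The sketch below uses scikit-learn's `KNeighborsClassifier` with brute-force search so the timing reflects the number of stored instances; the variables `X`, `y`, `X_test`, and `store_idx` in the usage comment are assumptions:

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def time_1nn(X_train, y_train, X_query):
    """Time brute-force 1-NN prediction over X_query."""
    clf = KNeighborsClassifier(n_neighbors=1, algorithm='brute')
    clf.fit(X_train, y_train)
    start = time.perf_counter()
    clf.predict(X_query)
    return time.perf_counter() - start

# Illustrative usage, assuming X, y, X_test, and CNN output store_idx exist:
# t_full = time_1nn(X, y, X_test)
# t_cnn  = time_1nn(X[store_idx], y[store_idx], X_test)
# print(f"Speedup: {t_full / t_cnn:.1f}x")
```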
The CNN algorithm enjoys important theoretical guarantees, though with notable limitations. Understanding these properties is essential for principled application.
Theorem (CNN Convergence): The CNN algorithm terminates in finite time.
Proof Sketch: STORE only grows and is bounded above by the full training set of size $n$. Each iteration either moves at least one point from GRABBAG to STORE or makes no change, and the algorithm stops at the first iteration with no change. Hence there can be at most $n$ point-adding iterations, so the algorithm terminates. ∎
Theorem (CNN Consistency): Upon termination, STORE is a consistent subset of the original training set.
Proof: By the termination condition, the algorithm stops only when every point in GRABBAG (and trivially, every point in STORE) is correctly classified by 1-NN using STORE.
Combined with the fact that STORE ∪ GRABBAG always equals the original training set, this means every original training point is correctly classified by the final STORE. Thus STORE is consistent. ∎
CNN can be understood through covering theory. A covering of a set is a collection of subsets whose union contains the original set. CNN finds a covering such that each training point is "covered" by a consistent neighbor in STORE.
The fundamental combinatorial object is the consistent covering: a collection of STORE points such that every training point's nearest neighbor within STORE carries its own label. In other words, each STORE point "covers" the training points it correctly claims under the 1-NN rule.
This perspective connects CNN to other prototype selection methods in computational geometry, particularly the minimum dominating set problem on the mutual k-nearest-neighbor graph.
While worst-case convergence is $O(n)$ iterations, the actual convergence rate depends on data geometry:
Fast Convergence (Few Iterations): well-separated classes, clean labels, and simple, smooth boundaries; most of the necessary points are discovered in the first pass or two.
Slow Convergence (Many Iterations): overlapping classes, label noise, or high-dimensional data in which nearly every point lies near the apparent boundary; points trickle into STORE a few at a time.
The original CNN algorithm has inspired numerous improvements addressing its limitations. These variants offer different tradeoffs between reduction rate, accuracy, and computational cost.
RNN is a post-processing step applied after CNN. It attempts to further shrink the condensed set by removing points that are no longer necessary for consistency.
RNN Algorithm:
```python
def reduced_nearest_neighbors(X, y, store_indices, distance_metric='euclidean'):
    """
    Reduced Nearest Neighbors - Post-processing to further shrink CNN output

    Parameters:
    -----------
    X : ndarray of shape (n_samples, n_features)
        Original training feature vectors
    y : ndarray of shape (n_samples,)
        Training class labels
    store_indices : list
        Indices from CNN output

    Returns:
    --------
    reduced_indices : list
        Further reduced set of indices
    """
    import numpy as np
    from scipy.spatial.distance import cdist

    store_indices = list(store_indices)
    initial_size = len(store_indices)

    # Process in reverse order (points added later are more likely dispensable)
    for idx in reversed(store_indices.copy()):
        # Temporarily remove this point
        temp_store = [i for i in store_indices if i != idx]

        if len(temp_store) == 0:
            continue

        # Check if all original points are still correctly classified
        X_store = X[temp_store]
        distances = cdist(X, X_store, metric=distance_metric)
        nearest_indices = np.argmin(distances, axis=1)
        predicted = y[np.array(temp_store)[nearest_indices]]

        if np.all(predicted == y):
            # This point is dispensable
            store_indices.remove(idx)

    print(f"RNN reduced from {initial_size} to {len(store_indices)} points")
    return store_indices
```

GCNN extends CNN to work with k-NN for k > 1 rather than just 1-NN. The consistency criterion becomes: all original points must be correctly classified by k-NN using STORE.
This is more challenging because a single correct nearest neighbor no longer suffices: a majority of the k nearest STORE neighbors must carry the correct label, so each training point needs several same-class representatives nearby and the consistent subsets are correspondingly larger. A sketch of the stricter consistency test follows.
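The sketch below illustrates only the consistency criterion under k-NN majority voting, not a full GCNN implementation; the helper name `is_knn_consistent` is an assumption of this example:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def is_knn_consistent(X, y, subset_indices, k=3):
    """True if k-NN majority voting over the subset classifies
    every original training point correctly."""
    subset_indices = np.asarray(subset_indices)
    if len(subset_indices) < k:
        return False   # not enough prototypes to take k votes
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X[subset_indices], y[subset_indices])
    return bool(np.all(clf.predict(X) == y))
```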
MCNN modifies the iteration strategy to process points in an order based on their estimated distance to the decision boundary, so that boundary points enter STORE early and the subset stabilizes sooner. The variants are compared below:
| Variant | Key Modification | Advantage | Disadvantage |
|---|---|---|---|
| CNN (Original) | Greedy addition of misclassified points | Simple, guaranteed consistent | Not minimal, order-sensitive |
| RNN (Reduced) | Post-processing removal step | Smaller subsets than CNN alone | Additional O(n²) overhead |
| GCNN (Generalized) | Works with k-NN for k > 1 | Consistent for k-NN classification | Larger subsets required |
| MCNN (Modified) | Boundary-prioritized ordering | Often finds smaller subsets | Requires boundary estimation |
| Fast CNN | Random sampling + incremental | Scales to larger datasets | Approximate consistency |
For most applications, apply CNN followed by RNN. The combination typically achieves 80-95% reduction on datasets with well-defined clusters, while the computational overhead of RNN is modest compared to the improvement in final subset size.
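A minimal sketch of that recommendation, assuming the `condensed_nearest_neighbors` and `reduced_nearest_neighbors` functions defined earlier on this page; the wrapper name `cnn_rnn_pipeline` is an assumption of this example:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def cnn_rnn_pipeline(X_train, y_train):
    """Condense with CNN, prune the result with RNN, and return a
    1-NN classifier fit on the final prototypes."""
    cnn_idx = condensed_nearest_neighbors(X_train, y_train)
    final_idx = reduced_nearest_neighbors(X_train, y_train, cnn_idx)
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X_train[final_idx], y_train[final_idx])
    return clf, final_idx
```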
Implementing CNN effectively in production systems requires attention to several practical details that aren't evident from the theoretical algorithm.
The choice of which points to initially add to STORE significantly affects the final result. Several strategies exist:
Random per Class: Select one random point from each class. Simple but may require multiple runs.
Centroid Seeding: Select the point closest to each class centroid. Tends to start with "typical" representatives.
Boundary Seeding: Use a quick heuristic to estimate boundary points and start with those. Often produces smaller final sets.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def initialize_store_centroid(X, y):
    """Initialize STORE with points nearest to class centroids"""
    classes = np.unique(y)
    initial_indices = []

    for c in classes:
        class_mask = y == c
        class_points = X[class_mask]
        class_indices = np.where(class_mask)[0]

        # Compute class centroid
        centroid = class_points.mean(axis=0)

        # Find point nearest to centroid
        distances = np.linalg.norm(class_points - centroid, axis=1)
        nearest_idx = class_indices[np.argmin(distances)]
        initial_indices.append(nearest_idx)

    return set(initial_indices)


def initialize_store_boundary(X, y, n_neighbors=5):
    """Initialize STORE with estimated boundary points"""
    n_samples = len(X)
    initial_indices = set()

    # Fit nearest neighbors on full dataset
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1)
    nn.fit(X)

    # For each point, check if neighbors include different classes
    for i in range(n_samples):
        _, indices = nn.kneighbors(X[i].reshape(1, -1))
        neighbor_labels = y[indices[0][1:]]  # Exclude self

        # If neighbors contain different classes, this is a boundary point
        if len(np.unique(neighbor_labels)) > 1 or y[i] not in neighbor_labels:
            initial_indices.add(i)

    # Ensure at least one point per class
    classes = np.unique(y)
    for c in classes:
        if not any(y[i] == c for i in initial_indices):
            class_indices = np.where(y == c)[0]
            initial_indices.add(class_indices[0])

    return initial_indices
```

The original CNN processes the entire GRABBAG in each iteration. For very large datasets, this becomes inefficient. Incremental CNN processes points one at a time:
```python
def incremental_cnn(X, y, distance_metric='euclidean'):
    """
    Incremental CNN - Processes one point at a time
    More memory efficient for large datasets
    """
    import numpy as np
    from scipy.spatial.distance import cdist

    n_samples = len(X)
    classes = np.unique(y)

    # Initialize with one point per class
    store_indices = []
    for c in classes:
        class_idx = np.where(y == c)[0][0]
        store_indices.append(class_idx)

    store_set = set(store_indices)
    remaining = set(range(n_samples)) - store_set

    # Single pass through remaining points
    for idx in list(remaining):
        store_list = list(store_set)
        X_store = X[store_list]
        x_query = X[idx].reshape(1, -1)

        # Find nearest neighbor in current STORE
        distances = cdist(x_query, X_store, metric=distance_metric)
        nearest_store_idx = store_list[np.argmin(distances)]

        # If misclassified, add to STORE
        if y[nearest_store_idx] != y[idx]:
            store_set.add(idx)

    return list(store_set)
```

Incremental CNN is faster (single pass) but may produce larger subsets than iterative CNN. Points that would be correctly classified after later additions to STORE are added prematurely because they were processed before those later additions. Use iterative CNN when subset size is critical; use incremental CNN when training time is the bottleneck.
Like all distance-based methods, CNN is sensitive to feature scales. Features with larger numeric ranges dominate the distance computation.
Best Practice: Standardize or normalize features before applying CNN, for example z-score standardization (zero mean, unit variance per feature) or min-max scaling to [0, 1]. The choice should match whatever preprocessing you'll use at prediction time; the same fitted transformation must be applied to test data.
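A minimal sketch of that workflow with scikit-learn's `StandardScaler`, assuming `X_train`, `y_train`, and `X_test` are already loaded and `condensed_nearest_neighbors` refers to the implementation above:

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

# Condense in the scaled feature space
store_idx = condensed_nearest_neighbors(X_scaled, y_train)

# At prediction time, apply the SAME fitted transformation
X_test_scaled = scaler.transform(X_test)
```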
For large datasets, materializing an $n \times n$ distance matrix is impractical. Use chunked distance computation:
```python
def chunked_nearest_neighbor(X_query, X_store, y_store, chunk_size=1000):
    """
    Memory-efficient nearest neighbor search
    Processes query points in chunks to avoid large distance matrices
    """
    import numpy as np
    from scipy.spatial.distance import cdist

    n_query = len(X_query)
    predictions = np.zeros(n_query, dtype=y_store.dtype)

    for start in range(0, n_query, chunk_size):
        end = min(start + chunk_size, n_query)
        chunk = X_query[start:end]

        # Only compute distances for this chunk
        distances = cdist(chunk, X_store)
        nearest_indices = np.argmin(distances, axis=1)
        predictions[start:end] = y_store[nearest_indices]

    return predictions
```

CNN is a powerful tool, but it's not universally applicable. Understanding its ideal use cases—and its limitations—helps you make informed decisions.
Spam detection for email client: Training set of 500K emails (90% ham, 10% spam). After CNN, reduced to 50K emails. Client-side classification is now 10× faster, works offline, and fits in device memory.
Medical diagnosis with noisy labels: Each misdiagnosed training case becomes a 'critical boundary point.' CNN retains most of the noisy data while providing false confidence in the reduction. Better to use noise-tolerant methods first.
Use this framework to decide whether CNN fits your problem:
Estimate class separability: Plot or compute inter-class vs. intra-class distance ratios. High separation → CNN effective.
Assess label quality: Quantify noise rate if possible. > 5% noise → Consider Edited Nearest Neighbors first (covered in next page).
Measure dimensionality impact: For $d > 20$, test CNN on a sample first. If reduction < 50%, alternative methods may be better.
Compute ROI: Estimate $\text{(Prediction volume)} \times \text{(Speedup per prediction)} - \text{(CNN training time)}$. Positive ROI = use CNN (a worked example follows this list).
Validate rigorously: Always evaluate CNN performance on held-out test data, not just training set consistency.
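As a worked example with purely illustrative numbers: suppose the one-time CNN run takes 2 hours, each prediction becomes 9 ms faster, and the system serves 10 million predictions per month. The monthly saving then dwarfs the one-time cost:

$$10^7 \times 0.009\,\text{s} = 90{,}000\,\text{s} \approx 25\,\text{hours} \gg 2\,\text{hours of one-time CNN training}$$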
We've conducted a deep exploration of Condensed Nearest Neighbors—from its theoretical foundations to practical implementation. Let's consolidate the essential knowledge:
- A consistent subset correctly classifies every original training point under the 1-NN rule; finding the minimum consistent subset is NP-hard.
- Hart's CNN is a greedy STORE/GRABBAG procedure that always converges to a consistent, though not necessarily minimal, subset, and its result depends on initialization and processing order.
- Worst-case cost is $O(n^3 \cdot d)$, but convergence is far faster on well-separated data, and the payoff is a permanently cheaper prediction step.
- Variants (RNN, GCNN, MCNN, incremental and fast CNN) trade off subset size, consistency guarantees, and runtime; CNN followed by RNN is a strong default.
- CNN works best with well-separated classes, clean labels, and moderate dimensionality; noisy or heavily overlapping data calls for editing methods first.
CNN focuses on retaining necessary instances—but what about removing harmful instances? Noisy labels and outliers can degrade KNN performance even after CNN condensation.
The next page explores Edited Nearest Neighbors (ENN), which takes the opposite approach: instead of selecting points to keep, it identifies points to remove based on neighborhood consistency. Together, CNN and ENN form a powerful data cleaning pipeline for instance-based learning.
You now possess a thorough understanding of Condensed Nearest Neighbors—its theoretical basis, algorithm, complexity, variants, and practical applications. You can identify when CNN is appropriate and implement it effectively. Next, we examine Edited Nearest Neighbors for noise reduction and outlier removal.