KNN regression is one of the most universally applicable nonparametric methods in machine learning. Given any regression dataset—regardless of the true relationship between features and target—KNN can provide reasonable predictions without assuming parametric forms.
This universality comes from a deep theoretical result: Stone's Consistency Theorem (1977) proves that KNN regression is consistent under mild conditions. As the number of training points $n \to \infty$ and the number of neighbors $k \to \infty$ with $k/n \to 0$, the KNN estimate converges to the true regression function.
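In one standard formulation (for the k-nearest-neighbor estimate with uniform $1/k$ weights, ignoring tie-breaking technicalities), the statement reads:

$$k_n \to \infty \quad\text{and}\quad \frac{k_n}{n} \to 0 \;\;\Longrightarrow\;\; \mathbb{E}\!\left[\big(\hat{f}_n(X) - f(X)\big)^2\right] \to 0 \quad \text{for every distribution of } (X, Y) \text{ with } \mathbb{E}[Y^2] < \infty,$$

where $f(x) = \mathbb{E}[Y \mid X = x]$ is the true regression function.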
But consistency doesn't mean optimality. In practice, KNN regression involves critical decisions about weighting, neighborhood size, local model complexity, and feature preparation. This page synthesizes everything we've learned into a complete framework for regression with KNN.
By the end of this page, you will understand the complete KNN regression pipeline from data preparation to deployment, master hyperparameter selection strategies, know when KNN regression excels and when to prefer alternatives, and be able to implement production-grade KNN regression systems.
Let's formalize the complete KNN regression framework, incorporating all the variants we've studied.
The General KNN Regression Estimator:
For a query point $\mathbf{x}$:
$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{n} K^*(\mathbf{x}, \mathbf{x}_i) \cdot y_i$$
where $K^*$ is the effective weight given to training point $\mathbf{x}_i$, normalized so the weights sum to one. Depending on the variant, $K^*$ may incorporate hard truncation to the $k$ nearest neighbors, distance-based or kernel weighting, and an adaptive bandwidth tied to the local neighborhood radius.
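For example, two familiar special cases fit this template (shown here as an illustration; the Gaussian version uses an adaptive bandwidth $h(\mathbf{x})$ equal to the distance to the $k$-th neighbor, matching the implementation later on this page):

$$K^*_{\text{uniform}}(\mathbf{x}, \mathbf{x}_i) = \frac{1}{k}\,\mathbb{1}\{\mathbf{x}_i \in N_k(\mathbf{x})\}, \qquad K^*_{\text{gauss}}(\mathbf{x}, \mathbf{x}_i) = \frac{\exp\!\big(-\|\mathbf{x}-\mathbf{x}_i\|^2 / 2h(\mathbf{x})^2\big)\,\mathbb{1}\{\mathbf{x}_i \in N_k(\mathbf{x})\}}{\sum_{j:\, \mathbf{x}_j \in N_k(\mathbf{x})} \exp\!\big(-\|\mathbf{x}-\mathbf{x}_j\|^2 / 2h(\mathbf{x})^2\big)}$$

where $N_k(\mathbf{x})$ denotes the set of the $k$ training points nearest to $\mathbf{x}$.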
Hierarchy of Methods (Increasing Sophistication):
| Method | Local Model | Weighting | Complexity |
|---|---|---|---|
| k-NN Uniform | Constant | Equal | Lowest |
| k-NN Weighted | Constant | Distance-based | Low |
| Kernel Regression | Constant | Kernel-based | Low-Medium |
| Local Linear | Linear | Kernel-based | Medium |
| LOESS | Linear/Quadratic | Tricube + Robust | Medium-High |
| Local Polynomial | Arbitrary degree | Kernel-based | High |
Theoretical Properties:
1. Bias-Variance Decomposition:
For the local constant fit (weighted KNN / kernel regression), the leading-order pointwise error at an interior point decomposes as
$$\text{MSE}(\hat{f}(\mathbf{x})) \approx \underbrace{\left(\frac{h^{2}\,\mu_2(K)}{2}\, f''(\mathbf{x})\right)^{2}}_{\text{Bias}^2} + \underbrace{\frac{\sigma^{2}\, R(K)}{n\, h^{d}\, p(\mathbf{x})}}_{\text{Variance}}$$
where $h$ is the (local) bandwidth, $\mu_2(K)$ and $R(K)$ are constants of the kernel, $\sigma^2$ is the noise variance, and $p(\mathbf{x})$ is the density of the inputs at $\mathbf{x}$.
2. Optimal Rate:
For interior points with smooth $f$, optimal bandwidth gives: $$\text{MSE} = O(n^{-4/(d+4)})$$
This deteriorates rapidly with dimension $d$ (curse of dimensionality).
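The rate above follows from balancing the two terms of the decomposition: squared bias grows as $h^4$ while variance shrinks as $1/(n h^d)$, so

$$\frac{d}{dh}\left[C_1 h^{4} + \frac{C_2}{n h^{d}}\right] = 0 \;\Rightarrow\; h^{*} \propto n^{-1/(d+4)} \;\Rightarrow\; \text{MSE}(h^{*}) = O\!\left(n^{-4/(d+4)}\right),$$

with $C_1 = \left(\mu_2(K)\, f''(\mathbf{x})/2\right)^2$ and $C_2 = \sigma^2 R(K)/p(\mathbf{x})$ taken from the decomposition above.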
3. Consistency:
KNN regression is consistent if $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. A practical choice: $k \approx n^{4/(d+4)}$.
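To see what this heuristic implies in practice, here is a minimal sketch (the heuristic only fixes the order of growth; the constant in front is arbitrary, and $k$ should ultimately be tuned by cross-validation as shown later on this page):

```python
import numpy as np

def heuristic_k(n: int, d: int) -> int:
    """Order of k suggested by the consistency heuristic: k ~ n^(4/(d+4))."""
    return max(1, int(round(n ** (4 / (d + 4)))))

for n in (100, 1_000, 10_000, 100_000):
    ks = {d: heuristic_k(n, d) for d in (1, 2, 5, 10)}
    print(f"n={n:>7}  sqrt(n)~{int(np.sqrt(n)):>4}  heuristic k by dimension: {ks}")
```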
The optimal rate $O(n^{-4/(d+4)})$ means: in 1D, $\text{MSE} \sim n^{-4/5}$; in 10D, $\text{MSE} \sim n^{-4/14} \approx n^{-0.29}$; in 100D, $\text{MSE} \sim n^{-4/104} \approx n^{-0.04}$. High-dimensional KNN regression requires exponentially more data for the same accuracy.
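To make "exponentially more data" concrete, a quick back-of-the-envelope sketch that inverts $\text{MSE} \sim n^{-4/(d+4)}$ for the sample size needed to reach a fixed error level (constants are ignored, so only the relative growth across dimensions is meaningful):

```python
def samples_needed(target_mse: float, d: int) -> float:
    """Invert MSE ~ n^(-4/(d+4)) (constants ignored) to get the required n."""
    return target_mse ** (-(d + 4) / 4)

for d in (1, 2, 5, 10, 20):
    print(f"d={d:>2}: n for MSE ~ 0.01 is roughly {samples_needed(0.01, d):,.0f}")
```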
Let's build a production-quality KNN regressor incorporating best practices from everything we've learned.
```python
import numpy as np
from scipy.spatial import KDTree
from typing import Tuple, Optional, Literal


class KNNRegressor:
    """
    Production-quality K-Nearest Neighbors Regressor.

    Features:
    - Multiple weighting schemes
    - Adaptive bandwidth
    - Local linear option
    - Efficient KD-tree search
    - Feature scaling
    - Confidence estimates
    """

    def __init__(self,
                 k: int = 5,
                 weights: Literal['uniform', 'distance', 'gaussian'] = 'distance',
                 algorithm: Literal['constant', 'linear'] = 'constant',
                 power: float = 2.0,
                 leaf_size: int = 30):
        """
        Parameters
        ----------
        k : number of neighbors
        weights : 'uniform', 'distance' (inverse distance), 'gaussian'
        algorithm : 'constant' (weighted average) or 'linear' (local linear)
        power : power for inverse distance weighting
        leaf_size : leaf size for KD-tree
        """
        self.k = k
        self.weights = weights
        self.algorithm = algorithm
        self.power = power
        self.leaf_size = leaf_size

        # Fitted attributes
        self.X_train_ = None
        self.y_train_ = None
        self.tree_ = None
        self.feature_means_ = None
        self.feature_stds_ = None
        self.y_mean_ = None
        self.y_std_ = None

    def fit(self, X: np.ndarray, y: np.ndarray,
            scale_features: bool = True,
            scale_target: bool = False):
        """
        Fit KNN regressor.

        Parameters
        ----------
        X : training features, shape (n_samples, n_features)
        y : training targets, shape (n_samples,)
        scale_features : whether to standardize features
        scale_target : whether to standardize target (for stability)
        """
        X = np.atleast_2d(X)
        y = np.asarray(y)

        # Feature scaling
        if scale_features:
            self.feature_means_ = X.mean(axis=0)
            self.feature_stds_ = X.std(axis=0)
            self.feature_stds_[self.feature_stds_ < 1e-10] = 1.0
            X_scaled = (X - self.feature_means_) / self.feature_stds_
        else:
            self.feature_means_ = np.zeros(X.shape[1])
            self.feature_stds_ = np.ones(X.shape[1])
            X_scaled = X

        # Target scaling
        if scale_target:
            self.y_mean_ = y.mean()
            self.y_std_ = y.std()
            if self.y_std_ < 1e-10:
                self.y_std_ = 1.0
            y_scaled = (y - self.y_mean_) / self.y_std_
        else:
            self.y_mean_ = 0.0
            self.y_std_ = 1.0
            y_scaled = y

        self.X_train_ = X_scaled
        self.y_train_ = y_scaled
        self.tree_ = KDTree(X_scaled, leafsize=self.leaf_size)

        return self

    def _compute_weights(self, distances: np.ndarray) -> np.ndarray:
        """Compute weights from distances."""
        if self.weights == 'uniform':
            return np.ones_like(distances)
        elif self.weights == 'distance':
            # Inverse distance with handling for zero
            safe_distances = np.maximum(distances, 1e-10)
            return 1.0 / (safe_distances ** self.power)
        elif self.weights == 'gaussian':
            # Adaptive bandwidth = k-th neighbor distance
            h = distances[-1] + 1e-10
            u = distances / h
            return np.exp(-0.5 * u**2)
        else:
            raise ValueError(f"Unknown weights: {self.weights}")

    def _predict_constant(self, x_query: np.ndarray,
                          indices: np.ndarray,
                          distances: np.ndarray) -> Tuple[float, float]:
        """Predict using local constant (weighted average)."""
        weights = self._compute_weights(distances)
        weights = weights / weights.sum()
        y_neighbors = self.y_train_[indices]

        prediction = np.sum(weights * y_neighbors)

        # Variance estimate (for confidence)
        variance = np.sum(weights * (y_neighbors - prediction)**2)
        std_est = np.sqrt(variance + 1e-10)

        return prediction, std_est

    def _predict_linear(self, x_query: np.ndarray,
                        indices: np.ndarray,
                        distances: np.ndarray) -> Tuple[float, float]:
        """Predict using local linear regression."""
        d = x_query.shape[0]
        weights = self._compute_weights(distances)

        X_local = self.X_train_[indices]
        y_local = self.y_train_[indices]

        # Centered design matrix
        X_centered = X_local - x_query
        design = np.column_stack([np.ones(len(indices)), X_centered])

        # Weighted least squares
        W = np.diag(weights)
        XtWX = design.T @ W @ design
        XtWy = design.T @ W @ y_local

        # Regularization for stability
        XtWX += 1e-6 * np.eye(d + 1)

        try:
            beta = np.linalg.solve(XtWX, XtWy)
            prediction = beta[0]

            # Residual variance estimate
            residuals = y_local - design @ beta
            mse = np.sum(weights * residuals**2) / (weights.sum() + 1e-10)
            std_est = np.sqrt(mse)
        except np.linalg.LinAlgError:
            # Fall back to weighted average
            return self._predict_constant(x_query, indices, distances)

        return prediction, std_est

    def predict(self, X: np.ndarray, return_std: bool = False) -> np.ndarray:
        """
        Predict regression target.

        Parameters
        ----------
        X : query points, shape (n_queries, n_features)
        return_std : if True, also return standard deviation estimates

        Returns
        -------
        predictions : shape (n_queries,)
        std_estimates : shape (n_queries,) if return_std=True
        """
        X = np.atleast_2d(X)

        # Scale features
        X_scaled = (X - self.feature_means_) / self.feature_stds_

        # Query KD-tree
        distances, indices = self.tree_.query(X_scaled, k=self.k)

        predictions = []
        std_estimates = []

        for i, (x_query, dists, idx) in enumerate(zip(X_scaled, distances, indices)):
            if self.algorithm == 'constant':
                pred, std = self._predict_constant(x_query, idx, dists)
            else:  # linear
                pred, std = self._predict_linear(x_query, idx, dists)

            predictions.append(pred)
            std_estimates.append(std)

        # Inverse transform predictions
        predictions = np.array(predictions) * self.y_std_ + self.y_mean_
        std_estimates = np.array(std_estimates) * self.y_std_

        if return_std:
            return predictions, std_estimates
        return predictions

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Compute R² score."""
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred)**2)
        ss_tot = np.sum((y - y.mean())**2)
        return 1 - ss_res / (ss_tot + 1e-10)


# Demonstration
np.random.seed(42)

# Complex nonlinear function
def true_function(x):
    return np.sin(2 * x[:, 0]) + 0.5 * x[:, 1]**2 - x[:, 0] * x[:, 1]

# Generate data
n_train, n_test = 500, 100
X_train = np.random.uniform(-2, 2, (n_train, 2))
y_train = true_function(X_train) + 0.3 * np.random.randn(n_train)

X_test = np.random.uniform(-2, 2, (n_test, 2))
y_test = true_function(X_test)

print("KNN Regressor Comparison:")
print("=" * 60)

configs = [
    {'k': 5, 'weights': 'uniform', 'algorithm': 'constant'},
    {'k': 10, 'weights': 'distance', 'algorithm': 'constant'},
    {'k': 20, 'weights': 'gaussian', 'algorithm': 'constant'},
    {'k': 20, 'weights': 'gaussian', 'algorithm': 'linear'},
]

for cfg in configs:
    model = KNNRegressor(**cfg)
    model.fit(X_train, y_train)

    y_pred, y_std = model.predict(X_test, return_std=True)
    r2 = model.score(X_test, y_test)
    rmse = np.sqrt(np.mean((y_pred - y_test)**2))

    print(f"\nk={cfg['k']}, weights={cfg['weights']}, algo={cfg['algorithm']}")
    print(f"  R² = {r2:.4f}")
    print(f"  RMSE = {rmse:.4f}")
    print(f"  Mean uncertainty = {y_std.mean():.4f}")
```

Selecting the right hyperparameters is crucial for KNN regression performance. The main hyperparameters are:
1. Number of Neighbors (k):
The most important hyperparameter. Small $k$ (3-10) tracks fine structure but is noisy; large $k$ smooths more. A common starting point is $k \approx \sqrt{n}$, and the consistency heuristic $k \approx n^{4/(d+4)}$ gives the right order of growth; in practice, choose $k$ by cross-validation over a logarithmic grid, as in the tuning code below.
2. Weighting Scheme: Uniform weights are the simplest; inverse-distance weighting usually helps, especially near neighborhood boundaries; Gaussian weights with an adaptive bandwidth (the distance to the $k$-th neighbor) give the smoothest transitions. Distance weighting is a reasonable default.
3. Algorithm (Constant vs Linear): The local constant (weighted average) is fast and robust; the local linear fit reduces bias at boundaries and in regions with strong trends, at the cost of more computation and a need for enough neighbors to fit the local model (see the weighted least-squares sketch after this list).
4. Distance Metric: Euclidean distance ($p = 2$) on standardized features is the default; Manhattan distance ($p = 1$) is often more robust when there are many features or outlying values. The grid search below includes both via the Minkowski parameter $p$.
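For reference, the local linear option mentioned in item 3 solves a weighted least-squares problem centered at the query point (this is what the `_predict_linear` method in the implementation above computes):

$$(\hat{\beta}_0, \hat{\boldsymbol{\beta}}_1) = \arg\min_{\beta_0,\, \boldsymbol{\beta}_1} \sum_{i \in N_k(\mathbf{x})} w_i \left(y_i - \beta_0 - \boldsymbol{\beta}_1^\top (\mathbf{x}_i - \mathbf{x})\right)^2, \qquad \hat{f}(\mathbf{x}) = \hat{\beta}_0.$$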
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


def tune_knn_regressor(X: np.ndarray, y: np.ndarray,
                       k_range: tuple = (3, 50),
                       cv: int = 5) -> dict:
    """
    Tune KNN regressor using grid search with cross-validation.

    Returns optimal hyperparameters and cross-validation scores.
    """
    # Create pipeline with scaling
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('knn', KNeighborsRegressor())
    ])

    # Determine k candidates based on data size
    n = len(y)
    k_min = max(k_range[0], 1)
    k_max = min(k_range[1], n // 2)

    # Logarithmically spaced k values
    k_candidates = np.unique(np.logspace(
        np.log10(k_min), np.log10(k_max), num=15
    ).astype(int))

    # Parameter grid
    param_grid = {
        'knn__n_neighbors': k_candidates,
        'knn__weights': ['uniform', 'distance'],
        'knn__p': [1, 2],  # Manhattan (1) vs Euclidean (2)
    }

    # Grid search
    grid_search = GridSearchCV(
        pipeline, param_grid,
        cv=cv,
        scoring='neg_mean_squared_error',
        n_jobs=-1,
        verbose=0
    )
    grid_search.fit(X, y)

    # Extract results
    best_params = grid_search.best_params_
    best_score = -grid_search.best_score_  # Negate to get MSE

    # Get CV scores for best estimator
    cv_results = grid_search.cv_results_
    best_idx = grid_search.best_index_

    results = {
        'best_k': best_params['knn__n_neighbors'],
        'best_weights': best_params['knn__weights'],
        'best_p': best_params['knn__p'],
        'best_mse': best_score,
        'best_rmse': np.sqrt(best_score),
        'cv_std': cv_results['std_test_score'][best_idx],
        'all_k_tested': k_candidates.tolist(),
    }

    return results


def quick_k_selection(X: np.ndarray, y: np.ndarray, cv: int = 5) -> int:
    """
    Quick heuristic k selection using cross-validation on a few candidates.
    """
    n = len(y)

    # Candidates based on data size
    candidates = [
        max(3, int(np.sqrt(n) / 2)),
        max(5, int(np.sqrt(n))),
        max(7, int(np.sqrt(n) * 2)),
        max(10, int(n ** 0.4)),
    ]
    candidates = sorted(list(set(candidates)))

    best_k = candidates[0]
    best_score = -np.inf

    for k in candidates:
        knn = Pipeline([
            ('scaler', StandardScaler()),
            ('knn', KNeighborsRegressor(n_neighbors=k, weights='distance'))
        ])
        scores = cross_val_score(knn, X, y, cv=cv,
                                 scoring='neg_mean_squared_error')
        mean_score = scores.mean()

        if mean_score > best_score:
            best_score = mean_score
            best_k = k

    return best_k


# Demonstration
np.random.seed(42)

# Generate dataset
n = 300
X = np.random.uniform(-3, 3, (n, 3))
y = np.sin(X[:, 0]) + X[:, 1]**2 - X[:, 0]*X[:, 2] + 0.5*np.random.randn(n)

print("KNN Hyperparameter Tuning:")
print("=" * 60)

# Quick selection
quick_k = quick_k_selection(X, y)
print(f"\nQuick k selection: k = {quick_k}")

# Full grid search
results = tune_knn_regressor(X, y)
print(f"\nGrid Search Results:")
for key, val in results.items():
    if key != 'all_k_tested':
        print(f"  {key}: {val}")
```

For honest performance estimation, use nested cross-validation: an outer loop for performance estimation and an inner loop for hyperparameter selection. This prevents optimistic bias from hyperparameter tuning on the same data used for evaluation.
KNN regression is highly sensitive to feature preprocessing. Unlike tree-based methods, KNN uses distances directly, so feature scales and transformations profoundly affect results.
Essential Preprocessing Steps:
1. Standardization (Critical):
Always standardize features to zero mean and unit variance: $$x_j' = \frac{x_j - \mu_j}{\sigma_j}$$
Without standardization, features with larger scales dominate the distance calculation.
2. Handling Categorical Features:
KNN with Euclidean distance doesn't naturally handle categoricals. Options include one-hot encoding (used in the pipeline below), which treats each category as its own dimension; ordinal encoding only when the categories have a genuine order; or a mixed-type distance such as Gower distance. After encoding, the dummy columns also participate in the distance, so scaling choices still matter.
3. Missing Value Handling:
Options include dropping incomplete rows (only viable when few values are missing), simple imputation with the median or a constant (as in the pipeline below), or KNN-based imputation, which fills each missing value from the nearest complete neighbors; a sketch of the latter follows.
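A minimal sketch of KNN-based imputation using scikit-learn's `KNNImputer` (the toy array is illustrative only; in a full pipeline the imputer is just another step ahead of the scaler and regressor):

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

# Toy feature matrix with missing entries (illustrative only)
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 6.0, 9.0],
              [7.0, 8.0, 12.0]])

# Each missing entry is filled from the corresponding feature of the
# 2 nearest rows (distances are computed over the observed features only)
imputer = KNNImputer(n_neighbors=2, weights='distance')
print(imputer.fit_transform(X))

# The imputer can also sit at the front of a KNN regression pipeline
pipeline = make_pipeline(KNNImputer(n_neighbors=5),
                         StandardScaler(),
                         KNeighborsRegressor(n_neighbors=10, weights='distance'))
```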
```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsRegressor


def create_knn_pipeline(numeric_features: list,
                        categorical_features: list = None,
                        use_pca: bool = False,
                        pca_components: int = 10,
                        robust_scaling: bool = False,
                        k: int = 10) -> Pipeline:
    """
    Create a complete preprocessing + KNN pipeline.
    """
    from sklearn.preprocessing import OneHotEncoder

    # Choose scaler
    scaler = RobustScaler() if robust_scaling else StandardScaler()

    # Numeric preprocessing
    numeric_transformer = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', scaler),
    ])

    # Categorical preprocessing
    if categorical_features:
        categorical_transformer = Pipeline([
            ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
            ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False)),
        ])
        preprocessor = ColumnTransformer([
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features),
        ])
    else:
        preprocessor = ColumnTransformer([
            ('num', numeric_transformer, numeric_features),
        ])

    # Build pipeline
    steps = [('preprocess', preprocessor)]
    if use_pca:
        steps.append(('pca', PCA(n_components=pca_components)))
    steps.append(('knn', KNeighborsRegressor(n_neighbors=k, weights='distance')))

    return Pipeline(steps)


# Demonstration: Impact of preprocessing
np.random.seed(42)

# Generate data with very different feature scales
X_raw = np.column_stack([
    np.random.uniform(0, 1, 200),          # Feature 1: [0, 1]
    np.random.uniform(0, 1000, 200),       # Feature 2: [0, 1000]
    np.random.uniform(-0.001, 0.001, 200)  # Feature 3: [-0.001, 0.001]
])

# Target depends on all features equally
y = X_raw[:, 0] * 10 + X_raw[:, 1] / 100 + X_raw[:, 2] * 10000 + 0.5*np.random.randn(200)

# Split
X_train, X_test = X_raw[:150], X_raw[150:]
y_train, y_test = y[:150], y[150:]

print("Impact of Feature Scaling on KNN Regression:")
print("=" * 60)

# Without scaling
knn_raw = KNeighborsRegressor(n_neighbors=10, weights='distance')
knn_raw.fit(X_train, y_train)
r2_raw = knn_raw.score(X_test, y_test)
print(f"\nWithout scaling: R² = {r2_raw:.4f}")

# With scaling
from sklearn.pipeline import make_pipeline
knn_scaled = make_pipeline(StandardScaler(),
                           KNeighborsRegressor(n_neighbors=10, weights='distance'))
knn_scaled.fit(X_train, y_train)
r2_scaled = knn_scaled.score(X_test, y_test)
print(f"With StandardScaler: R² = {r2_scaled:.4f}")

# With robust scaling
knn_robust = make_pipeline(RobustScaler(),
                           KNeighborsRegressor(n_neighbors=10, weights='distance'))
knn_robust.fit(X_train, y_train)
r2_robust = knn_robust.score(X_test, y_test)
print(f"With RobustScaler: R² = {r2_robust:.4f}")

print(f"\nImprovement from scaling: {(r2_scaled - r2_raw) / (1 - r2_raw) * 100:.1f}% of remaining error")
```

KNN regression has specific scenarios where it outperforms other methods. Understanding these helps you choose appropriately.
When KNN regression excels:
• Small to medium d (d ≤ 10)
• Large n (n ≥ 1000)
• Complex, unknown relationships
• Spatial or temporal data
• Need for local explanations
• Prototype-based prediction
When to prefer alternatives:
• High d (d > 20)
• Small n
• Simple linear relationships
• Extrapolation required
• Real-time, low-latency needs
• Mixed feature types
Empirical Comparison:
Studies comparing KNN to other regressors (linear models, tree ensembles, neural networks) consistently find that a properly prepared KNN (scaled features, distance weighting, cross-validated k) is a competitive baseline on low-dimensional, densely sampled problems, while tree ensembles tend to win on heterogeneous, higher-dimensional tabular data.
The practical lesson: try KNN as a baseline, but expect tree ensembles to outperform on most tabular benchmarks.
How does KNN regression compare to the main alternatives? Let's analyze systematically.
| Criterion | KNN | Linear Regression | Random Forest | Gradient Boosting | Neural Network |
|---|---|---|---|---|---|
| Dimensionality | Struggles d>10 | Any d | Handles high d | Handles high d | Handles very high d |
| Sample size needed | Moderate to high | Low | Moderate | Moderate | High |
| Nonlinearity | Automatic | Needs features | Automatic | Automatic | Automatic |
| Interpretability | High (examples) | High (coefficients) | Moderate (importance) | Low | Very low |
| Training speed | O(1) or O(n log n) | O(nd²) | O(T·n log n) | O(T·n log n) | O(T·n·d) |
| Prediction speed | O(n) or O(log n) | O(d) | O(T log n) | O(T log n) | O(d) |
| Extrapolation | Poor | Linear extrapolation | Poor | Poor | Variable |
| Missing data | Needs handling | Needs handling | Native support | Native support | Needs handling |
Key Differentiators:
vs. Linear Regression: KNN captures nonlinear relationships automatically, with no feature engineering, but it needs more data, cannot extrapolate beyond the range of the training set, and gives up the compact coefficient-based interpretation.
vs. Random Forests: Both adapt to nonlinearity automatically, but random forests cope better with high dimensionality, irrelevant features, and missing values, and their predictions are cheaper to serve. KNN keeps the advantage of exact neighbor-based explanations and requires essentially no training.
vs. Gradient Boosting (XGBoost, LightGBM): Gradient boosting is usually more accurate on tabular data and is the stronger default; KNN remains valuable as a fast baseline and when predictions must be justified by pointing at similar training examples.
vs. Neural Networks: Neural networks handle very high dimensions and huge datasets and predict in O(d), but demand far more data and tuning. KNN is training-free and easier to reason about, at the cost of prediction time that grows with the training set.
For general tabular regression, gradient boosting (XGBoost, LightGBM, CatBoost) is the current best practice. Use KNN when: (1) you need neighbor-based explanations, (2) dimension is low and you have dense data, (3) you're building a quick baseline, or (4) you're doing spatial/geographic prediction.
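A minimal sketch of the recommended workflow, comparing a scaled KNN baseline against a gradient boosting model by cross-validation (the synthetic data and the fixed k = 15 are placeholders; in practice you would use your own dataset and tune k as described earlier):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 4))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.3 * rng.standard_normal(500)

models = {
    'KNN baseline': make_pipeline(StandardScaler(),
                                  KNeighborsRegressor(n_neighbors=15, weights='distance')),
    'Gradient boosting': HistGradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(f"{name:>18}: R² = {scores.mean():.3f} ± {scores.std():.3f}")
```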
Deploying KNN regression in production involves unique challenges compared to parametric models.
```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
import pickle
import time


class ProductionKNNRegressor:
    """
    Production-ready KNN regressor with serialization,
    monitoring, and fallback capabilities.
    """

    def __init__(self,
                 k: int = 10,
                 min_samples: int = 50,
                 fallback_value: float = None,
                 max_memory_mb: float = 1000):
        self.k = k
        self.min_samples = min_samples
        self.fallback_value = fallback_value
        self.max_memory_mb = max_memory_mb

        self.model = None
        self.scaler = None
        self.n_samples = 0
        self.feature_names = None
        self.training_stats = {}

    def fit(self, X: np.ndarray, y: np.ndarray, feature_names: list = None):
        """Fit with production safeguards."""
        X = np.asarray(X)
        y = np.asarray(y)

        # Check memory constraint
        memory_mb = X.nbytes / 1e6
        if memory_mb > self.max_memory_mb:
            # Subsample to fit memory
            n_keep = int(len(X) * self.max_memory_mb / memory_mb)
            indices = np.random.choice(len(X), n_keep, replace=False)
            X = X[indices]
            y = y[indices]
            print(f"Subsampled to {n_keep} samples to fit memory constraint")

        self.n_samples = len(X)
        self.feature_names = feature_names

        # Store training statistics for monitoring
        self.training_stats = {
            'feature_means': X.mean(axis=0),
            'feature_stds': X.std(axis=0),
            'target_mean': y.mean(),
            'target_std': y.std(),
            'n_samples': len(X),
        }

        # Set fallback if not provided
        if self.fallback_value is None:
            self.fallback_value = y.mean()

        # Fit scaler and model
        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)

        self.model = KNeighborsRegressor(
            n_neighbors=min(self.k, len(X) - 1),
            weights='distance',
            algorithm='kd_tree'
        )
        self.model.fit(X_scaled, y)

        return self

    def predict(self, X: np.ndarray, return_confidence: bool = False):
        """Predict with fallback and confidence."""
        X = np.atleast_2d(X)

        # Check if model is fitted
        if self.model is None or self.n_samples < self.min_samples:
            predictions = np.full(len(X), self.fallback_value)
            confidences = np.zeros(len(X))
            if return_confidence:
                return predictions, confidences
            return predictions

        # Scale features
        try:
            X_scaled = self.scaler.transform(X)
        except Exception as e:
            # Feature mismatch or other error
            predictions = np.full(len(X), self.fallback_value)
            if return_confidence:
                return predictions, np.zeros(len(X))
            return predictions

        # Predict
        predictions = self.model.predict(X_scaled)

        if return_confidence:
            # Estimate confidence from neighbor distance consistency
            distances, indices = self.model.kneighbors(X_scaled)
            # Confidence = inverse of relative distance spread
            mean_dist = distances.mean(axis=1)
            std_dist = distances.std(axis=1)
            confidences = 1 / (1 + std_dist / (mean_dist + 1e-10))
            return predictions, confidences

        return predictions

    def check_feature_drift(self, X: np.ndarray, threshold: float = 2.0) -> dict:
        """Check for feature drift from training distribution."""
        X = np.atleast_2d(X)

        # Compute z-scores relative to training
        z_scores = np.abs(
            (X.mean(axis=0) - self.training_stats['feature_means']) /
            (self.training_stats['feature_stds'] + 1e-10)
        )

        drift_detected = z_scores > threshold

        return {
            'drift_detected': drift_detected.any(),
            'features_with_drift': np.where(drift_detected)[0].tolist(),
            'z_scores': z_scores.tolist(),
        }

    def save(self, filepath: str):
        """Serialize model to disk."""
        state = {
            'model': self.model,
            'scaler': self.scaler,
            'k': self.k,
            'n_samples': self.n_samples,
            'fallback_value': self.fallback_value,
            'training_stats': self.training_stats,
        }
        with open(filepath, 'wb') as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, filepath: str):
        """Load model from disk."""
        with open(filepath, 'rb') as f:
            state = pickle.load(f)

        instance = cls(k=state['k'])
        instance.model = state['model']
        instance.scaler = state['scaler']
        instance.n_samples = state['n_samples']
        instance.fallback_value = state['fallback_value']
        instance.training_stats = state['training_stats']

        return instance


# Usage demonstration
np.random.seed(42)

# Training data
X_train = np.random.randn(1000, 5)
y_train = X_train.sum(axis=1) + 0.5 * np.random.randn(1000)

# Create and fit
model = ProductionKNNRegressor(k=15, min_samples=50)
model.fit(X_train, y_train)

# Normal prediction with confidence
X_test = np.random.randn(10, 5)
preds, confs = model.predict(X_test, return_confidence=True)
print("Production KNN Regression:")
print("=" * 50)
print(f"Predictions: {preds[:3]}")
print(f"Confidences: {confs[:3]}")

# Check for drift
X_drifted = np.random.randn(100, 5) + 3  # Shifted distribution
drift_report = model.check_feature_drift(X_drifted)
print(f"\nDrift check: {drift_report}")
```

Congratulations! You have completed the Weighted KNN module. You now understand distance weighting, kernel methods, adaptive neighborhoods, local models, and KNN regression at a depth suitable for research and production applications. The next module explores KNN Variants: specialized modifications for specific scenarios like condensed NN, edited NN, and metric learning.