Understanding grid search theory is essential, but successfully applying it in real-world projects requires practical wisdom accumulated from thousands of experiments. This page distills that experience into actionable guidelines.
We address the questions that arise in every hyperparameter optimization project: which ranges to search, how many values to use per hyperparameter, which search patterns to apply, and how to debug poor results.
These guidelines apply across algorithms and domains, forming a practical framework for effective hyperparameter optimization.
By the end of this page, you will have a practical toolkit for designing grid searches: algorithm-specific range recommendations, resolution heuristics, common grid patterns, debugging strategies, and a workflow that minimizes wasted computation while maximizing optimization quality.
Selecting appropriate ranges for each hyperparameter is the most critical design decision in grid search. Too narrow ranges miss optimal values; too wide ranges waste budget on implausible regions.
Sources for Range Selection:
Library Defaults: Default values in scikit-learn, XGBoost, etc., are carefully chosen. Build ranges around these defaults.
Published Research: Papers often report hyperparameter ranges that worked for similar problems.
Theoretical Bounds: Some hyperparameters have natural constraints (e.g., probabilities in [0,1]).
Prior Experience: Your own experiments on similar datasets provide valuable guidance.
Domain Knowledge: Understanding the algorithm helps set sensible bounds.
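A common way to combine the first source with the others is to bracket a library default by an order of magnitude in each direction. The helper below is a sketch of that idea (the function name is ours, not a library API); it uses the fact that scikit-learn's `GradientBoostingClassifier` defaults to `learning_rate=0.1`.

```python
import numpy as np

def range_around_default(default, factor=10.0, n=5):
    """Log-spaced search range from default/factor up to default*factor."""
    return list(np.logspace(np.log10(default / factor),
                            np.log10(default * factor), n))

# Bracket scikit-learn's GradientBoostingClassifier default learning_rate=0.1
lr_range = range_around_default(0.1)
# five log-spaced values running from 0.01 up to 1.0, centered on 0.1
```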
| Algorithm | Hyperparameter | Recommended Range | Scale | Notes |
|---|---|---|---|---|
| Gradient Boosting | learning_rate | 0.01 - 0.3 | Log | Inverse relationship with n_estimators |
| Gradient Boosting | n_estimators | 100 - 1000 | Linear | More = better but slower; use early stopping |
| Gradient Boosting | max_depth | 3 - 10 | Linear | Deeper = more complex; 3-7 common |
| Gradient Boosting | min_samples_split | 2 - 20 | Linear | Higher = more regularization |
| Random Forest | n_estimators | 100 - 500 | Linear | Diminishing returns after ~300 |
| Random Forest | max_features | sqrt, log2, 0.3-0.8 | Categorical/Linear | Controls variance-bias trade-off |
| SVM | C | 0.01 - 100 | Log | Regularization strength (inverse) |
| SVM | gamma (RBF) | 0.001 - 10 | Log | Kernel bandwidth (with scale) |
| Neural Network | learning_rate | 0.0001 - 0.01 | Log | Algorithm-specific defaults vary |
| Neural Network | alpha (L2) | 0.0001 - 0.1 | Log | Weight decay regularization |
| Ridge/Lasso | alpha | 0.001 - 100 | Log | Wide range; data-dependent |
```python
import numpy as np
from typing import Dict, List, Any

# ============================================================
# Curated Grid Templates for Common Algorithms
# ============================================================

def get_gradient_boosting_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for Gradient Boosting (GBM/XGBoost/LightGBM).

    These ranges come from extensive empirical testing across
    hundreds of tabular datasets.
    """
    grids = {
        'coarse': {
            'learning_rate': [0.01, 0.05, 0.1, 0.2],
            'n_estimators': [100, 200, 500],
            'max_depth': [3, 5, 7],
            'min_samples_split': [2, 10],
            'subsample': [0.8, 1.0],
        },
        'medium': {
            'learning_rate': [0.01, 0.03, 0.05, 0.1, 0.15, 0.2],
            'n_estimators': [100, 200, 300, 500],
            'max_depth': [3, 4, 5, 6, 7, 8],
            'min_samples_split': [2, 5, 10, 20],
            'min_samples_leaf': [1, 2, 5],
            'subsample': [0.7, 0.8, 0.9, 1.0],
            'max_features': ['sqrt', 'log2', 0.5, 0.8],
        },
        'fine': {
            'learning_rate': list(np.logspace(-2, -0.5, 8)),  # 0.01 to 0.316
            'n_estimators': [100, 150, 200, 300, 400, 500, 750, 1000],
            'max_depth': list(range(3, 12)),
            'min_samples_split': [2, 3, 5, 10, 15, 20],
            'min_samples_leaf': [1, 2, 3, 5, 10],
            'subsample': list(np.arange(0.6, 1.05, 0.1)),
            'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.7, 0.9, None],
        },
    }
    return grids[granularity]

def get_random_forest_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for Random Forest.

    Random forests are relatively robust to hyperparameters;
    n_estimators and max_features are most important.
    """
    grids = {
        'coarse': {
            'n_estimators': [100, 200, 300],
            'max_depth': [None, 10, 20],
            'max_features': ['sqrt', 'log2'],
            'min_samples_split': [2, 5],
        },
        'medium': {
            'n_estimators': [100, 200, 300, 500],
            'max_depth': [None, 10, 15, 20, 30],
            'max_features': ['sqrt', 'log2', 0.5],
            'min_samples_split': [2, 5, 10],
            'min_samples_leaf': [1, 2, 4],
            'bootstrap': [True, False],
        },
        'fine': {
            'n_estimators': [100, 150, 200, 300, 400, 500],
            'max_depth': [None, 5, 10, 15, 20, 25, 30],
            'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.7, None],
            'min_samples_split': [2, 3, 5, 10, 15],
            'min_samples_leaf': [1, 2, 3, 5],
            'bootstrap': [True, False],
            'class_weight': [None, 'balanced'],
        },
    }
    return grids[granularity]

def get_svm_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for SVM.

    C and gamma are the most important hyperparameters.
    Wide log-scale ranges are essential.
    """
    grids = {
        'coarse': {
            'C': [0.1, 1, 10, 100],
            'gamma': ['scale', 'auto', 0.01, 0.1, 1],
            'kernel': ['rbf', 'linear'],
        },
        'medium': {
            'C': list(np.logspace(-2, 3, 6)),  # 0.01 to 1000
            'gamma': ['scale', 'auto'] + list(np.logspace(-3, 1, 5)),
            'kernel': ['rbf', 'linear', 'poly'],
            'degree': [2, 3, 4],  # Only for poly kernel
        },
        'fine': {
            'C': list(np.logspace(-3, 4, 8)),
            'gamma': ['scale', 'auto'] + list(np.logspace(-4, 2, 7)),
            'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
            'degree': [2, 3, 4, 5],
            'coef0': [0, 0.1, 0.5, 1],
        },
    }
    return grids[granularity]

def get_neural_network_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for scikit-learn MLPClassifier/Regressor.

    Architecture and learning rate are most impactful.
    """
    grids = {
        'coarse': {
            'hidden_layer_sizes': [(50,), (100,), (100, 50)],
            'alpha': [0.0001, 0.001, 0.01],
            'learning_rate_init': [0.001, 0.01],
            'activation': ['relu', 'tanh'],
        },
        'medium': {
            'hidden_layer_sizes': [
                (50,), (100,), (200,),
                (100, 50), (100, 100), (200, 100),
            ],
            'alpha': [0.0001, 0.001, 0.01, 0.1],
            'learning_rate_init': [0.0001, 0.001, 0.005, 0.01],
            'activation': ['relu', 'tanh'],
            'batch_size': [32, 64, 128],
            'early_stopping': [True],
        },
        'fine': {
            'hidden_layer_sizes': [
                (32,), (64,), (100,), (128,), (200,), (256,),
                (64, 32), (100, 50), (128, 64), (200, 100),
                (100, 100), (128, 128), (200, 100, 50),
            ],
            'alpha': list(np.logspace(-5, -1, 5)),
            'learning_rate_init': list(np.logspace(-4, -2, 5)),
            'activation': ['relu', 'tanh', 'logistic'],
            'batch_size': [16, 32, 64, 128, 256],
            'solver': ['adam', 'sgd'],
            'early_stopping': [True],
        },
    }
    return grids[granularity]

def estimate_grid_cost(grid: Dict[str, List]) -> Dict:
    """Calculate grid statistics."""
    n_configs = 1
    for values in grid.values():
        n_configs *= len(values)
    return {
        'n_params': len(grid),
        'dimensions': [len(v) for v in grid.values()],
        'n_configs': n_configs,
        'with_5fold_cv': n_configs * 5,
    }

# Display grid templates
for algo, getter in [
    ('Gradient Boosting', get_gradient_boosting_grid),
    ('Random Forest', get_random_forest_grid),
    ('SVM', get_svm_grid),
    ('Neural Network', get_neural_network_grid),
]:
    print(f"{'='*50}")
    print(f"{algo} Grids")
    print('='*50)
    for level in ['coarse', 'medium', 'fine']:
        grid = getter(level)
        stats = estimate_grid_cost(grid)
        print(f"{level.upper()}: {stats['n_configs']:,} configs "
              f"({stats['with_5fold_cv']:,} with 5-fold CV)")
```

Begin with ranges centered around defaults. If the best values are at range boundaries, expand in that direction and re-run. If best values are in the interior, consider refining with a finer local grid. This iterative approach is more efficient than starting with overly wide ranges.
Choosing the number of values per hyperparameter involves balancing coverage against computational cost. Several heuristics guide this decision.
Heuristic 1: The 3-5-10 Rule
For continuous hyperparameters, use roughly 3 values for an initial coarse screen, 5 for a standard search, and 10 for fine refinement. More than 10 values rarely improves results unless the objective is known to be highly sensitive.
Heuristic 2: Log-Spacing for Orders of Magnitude
When a hyperparameter spans multiple orders of magnitude (e.g., regularization from 0.001 to 100), use logarithmic spacing:
# Linear spacing (BAD):  0.001, 25.0, 50.0, 75.0, 100  (four of five points above 25)
# Log spacing (GOOD):    0.001, 0.01, 0.1, 1, 10, 100  (one point per order of magnitude)
Log spacing ensures each order of magnitude is represented.
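Both spacings above are one NumPy call apiece; note how `linspace` leaves the entire 0.001-1 region sampled by a single point:

```python
import numpy as np

# Linear spacing: only the first of five points falls below 25
linear = np.linspace(0.001, 100, 5)

# Log spacing: one point per order of magnitude across the range
log = np.logspace(-3, 2, 6)   # 0.001, 0.01, 0.1, 1, 10, 100
```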
Heuristic 3: Budget-First Calculation
Given a computational budget (e.g., 1000 model evaluations), work backward:
$$n_{\text{per dim}} = \lfloor \text{budget}^{1/d} \rfloor$$
With a budget of 1000 and 5 hyperparameters: $1000^{1/5} \approx 3.98$, so 3 values per dimension after flooring (roughly 4 before).
```python
import numpy as np
from typing import Dict, List, Tuple, Any

def calculate_resolution_from_budget(
        n_hyperparameters: int,
        total_budget: int,
        cv_folds: int = 5,
        min_per_dim: int = 2,
        max_per_dim: int = 10) -> int:
    """
    Calculate recommended grid resolution given a computational budget.

    Parameters:
    -----------
    n_hyperparameters: Number of hyperparameters to tune
    total_budget: Maximum number of model training runs
    cv_folds: Number of cross-validation folds

    Returns:
    --------
    Recommended number of values per hyperparameter
    """
    # Budget available for configurations (accounting for CV)
    config_budget = total_budget // cv_folds

    # Calculate resolution
    resolution = int(config_budget ** (1 / n_hyperparameters))

    # Apply bounds
    resolution = max(min_per_dim, min(max_per_dim, resolution))

    # Actual grid size
    actual_configs = resolution ** n_hyperparameters
    actual_trains = actual_configs * cv_folds

    print("Budget Analysis:")
    print(f"  Hyperparameters: {n_hyperparameters}")
    print(f"  Total budget: {total_budget} training runs")
    print(f"  CV folds: {cv_folds}")
    print(f"  → Recommended resolution: {resolution} values per dim")
    print(f"  → Grid size: {actual_configs} configurations")
    print(f"  → Total trains: {actual_trains} "
          f"(uses {100*actual_trains/total_budget:.1f}% of budget)")

    return resolution

def adaptive_resolution_grid(
        param_sensitivities: Dict[str, float],
        total_budget: int,
        cv_folds: int = 5,
        min_resolution: int = 2,
        max_resolution: int = 8) -> Dict[str, int]:
    """
    Assign different resolutions based on hyperparameter sensitivity.

    More sensitive hyperparameters get finer resolution.
    Less sensitive ones get coarser resolution.
    This allocates budget where it matters most.
    """
    # Normalize sensitivities
    total_sens = sum(param_sensitivities.values())
    normalized = {k: v / total_sens for k, v in param_sensitivities.items()}

    # Initial equal resolution
    n_params = len(param_sensitivities)
    config_budget = total_budget // cv_folds
    base_resolution = int(config_budget ** (1 / n_params))

    # Adjust based on sensitivity
    resolutions = {}
    for param, sens in normalized.items():
        # Higher sensitivity → higher resolution
        adjustment = (sens - 1 / n_params) * n_params  # Deviation from uniform
        adjust_factor = 1 + adjustment                  # Factor from ~0.5 to ~1.5
        resolution = int(base_resolution * adjust_factor)
        resolution = max(min_resolution, min(max_resolution, resolution))
        resolutions[param] = resolution

    # Calculate actual grid size
    actual_configs = 1
    for r in resolutions.values():
        actual_configs *= r

    print("Adaptive Resolution Allocation:")
    print("-" * 50)
    for param, sens in sorted(normalized.items(), key=lambda x: -x[1]):
        print(f"  {param}: sensitivity={sens:.3f}, resolution={resolutions[param]}")
    print(f"Total configurations: {actual_configs}")
    print(f"With {cv_folds}-fold CV: {actual_configs * cv_folds} training runs")

    return resolutions

def create_multi_resolution_grid(
        param_config: Dict[str, Tuple[float, float, str, int]],
) -> Dict[str, List[float]]:
    """
    Create a grid with different resolutions per hyperparameter.

    Parameters:
    -----------
    param_config: Dict mapping param name to (min, max, scale, resolution)
        scale is 'linear' or 'log'
        resolution is number of values for this parameter

    Example:
        param_config = {
            'learning_rate': (0.001, 0.1, 'log', 6),  # 6 points, log-spaced
            'max_depth': (3, 10, 'linear', 8),        # 8 points, linear
            'subsample': (0.6, 1.0, 'linear', 3),     # 3 points, linear
        }
    """
    grid = {}
    for param, (lo, hi, scale, resolution) in param_config.items():
        if scale == 'log':
            values = list(np.logspace(np.log10(lo), np.log10(hi), resolution))
        else:
            values = list(np.linspace(lo, hi, resolution))
        grid[param] = values

    # Print grid details
    print("Multi-Resolution Grid:")
    total = 1
    for param, values in grid.items():
        print(f"  {param}: {len(values)} values")
        print(f"    {[f'{v:.4f}' if isinstance(v, float) else v for v in values]}")
        total *= len(values)
    print(f"Total configurations: {total}")

    return grid

# Demonstration
print("=== Resolution Calculation Examples ===")

# Example 1: Fixed budget
calculate_resolution_from_budget(
    n_hyperparameters=5,
    total_budget=5000,
    cv_folds=5)

# Example 2: Adaptive resolution
print("\n" + "=" * 50)
sensitivities = {
    'learning_rate': 0.4,      # Most sensitive
    'n_estimators': 0.25,      # Moderately sensitive
    'max_depth': 0.2,          # Moderately sensitive
    'min_samples_split': 0.1,  # Less sensitive
    'subsample': 0.05,         # Least sensitive
}
adaptive_resolution_grid(sensitivities, total_budget=5000)

# Example 3: Multi-resolution grid
print("\n" + "=" * 50)
config = {
    'learning_rate': (0.01, 0.3, 'log', 6),
    'n_estimators': (100, 500, 'linear', 5),
    'max_depth': (3, 10, 'linear', 8),
    'subsample': (0.7, 1.0, 'linear', 3),
}
create_multi_resolution_grid(config)
```

Performance improvement typically follows a logarithmic curve with grid resolution. Going from 3 to 5 values per hyperparameter often yields significant improvement. Going from 7 to 10 rarely does. Invest in more hyperparameters at coarser resolution rather than fewer at fine resolution.
Experienced practitioners employ established patterns that encode best practices for specific scenarios.
Pattern 1: Coarse-to-Fine Refinement
Two-phase approach: first run a coarse grid over wide ranges to locate the promising region, then run a finer grid concentrated in the neighborhood of the best coarse configuration.
Pattern 2: Staged Tuning by Importance
Sequential stages: tune the most impactful hyperparameters first, fix them at their best values, then tune the next group in order of importance.
Pattern 3: Grid + Random Hybrid
Combine grid and random search: an exhaustive grid over the few most important hyperparameters, with random sampling over the less critical ones.
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import uniform, randint
from typing import Dict, Any

def coarse_to_fine_pattern(estimator, X, y,
                           initial_grid: Dict[str, list],
                           refinement_factor: float = 0.5,
                           cv: int = 5):
    """
    Pattern 1: Coarse-to-Fine Refinement

    1. Run coarse grid search
    2. Identify best values
    3. Create refined grid in neighborhood of best values
    4. Run fine grid search
    """
    print("=== Coarse-to-Fine Grid Search ===")

    # Phase 1: Coarse search
    print("Phase 1: Coarse Grid")
    coarse_size = np.prod([len(v) for v in initial_grid.values()])
    print(f"  Grid size: {coarse_size}")

    coarse_search = GridSearchCV(estimator, initial_grid, cv=cv, n_jobs=-1)
    coarse_search.fit(X, y)
    best_coarse = coarse_search.best_params_
    print(f"  Best params: {best_coarse}")
    print(f"  Best score: {coarse_search.best_score_:.4f}")

    # Phase 2: Create refined grid around best values
    print("Phase 2: Refined Grid")
    refined_grid = {}
    for param, values in initial_grid.items():
        best_val = best_coarse[param]
        if isinstance(best_val, (int, float)):
            # Numeric parameter: create neighborhood
            values_arr = np.array(values)
            best_idx = np.argmin(np.abs(values_arr - best_val))

            # Spacing in original grid
            if len(values) > 1:
                spacing = (values[1] - values[0] if best_idx == 0
                           else values[best_idx] - values[best_idx - 1])
            else:
                spacing = best_val * 0.5

            # Create refined values around best
            refined_range = spacing * refinement_factor
            new_values = np.linspace(
                max(values[0], best_val - refined_range),
                min(values[-1], best_val + refined_range),
                5  # 5 refined points
            )
            refined_grid[param] = list(new_values)
        else:
            # Categorical: keep just the best value
            refined_grid[param] = [best_val]

    refined_size = np.prod([len(v) for v in refined_grid.values()])
    print(f"  Refined grid size: {refined_size}")

    fine_search = GridSearchCV(estimator, refined_grid, cv=cv, n_jobs=-1)
    fine_search.fit(X, y)
    print(f"  Best params: {fine_search.best_params_}")
    print(f"  Best score: {fine_search.best_score_:.4f}")
    print(f"Total evaluations: {coarse_size + refined_size}")

    return fine_search.best_params_, fine_search.best_score_

def staged_tuning_pattern(estimator_class, X, y, cv: int = 5):
    """
    Pattern 2: Staged Tuning by Importance

    Tune hyperparameters in order of importance:
    1. Learning dynamics (learning_rate)
    2. Model capacity (depth, estimators)
    3. Regularization and minor params

    Each stage uses optimized values from previous stages.
    """
    print("=== Staged Tuning Pattern ===")
    current_params = {}

    # Stage 1: Learning dynamics
    print("Stage 1: Learning Dynamics")
    stage1_grid = {
        'learning_rate': [0.01, 0.03, 0.05, 0.1, 0.15, 0.2],
        'n_estimators': [100, 200, 300, 500],
    }
    stage1_size = np.prod([len(v) for v in stage1_grid.values()])
    print(f"  Grid size: {stage1_size}")

    search1 = GridSearchCV(estimator_class(**current_params), stage1_grid,
                           cv=cv, n_jobs=-1)
    search1.fit(X, y)
    current_params.update(search1.best_params_)
    print(f"  Best: {search1.best_params_}, score: {search1.best_score_:.4f}")

    # Stage 2: Model capacity
    print("Stage 2: Model Capacity")
    stage2_grid = {
        'max_depth': [3, 4, 5, 6, 7, 8],
        'min_samples_split': [2, 5, 10, 20],
    }
    stage2_size = np.prod([len(v) for v in stage2_grid.values()])
    print(f"  Grid size: {stage2_size}")

    search2 = GridSearchCV(estimator_class(**current_params), stage2_grid,
                           cv=cv, n_jobs=-1)
    search2.fit(X, y)
    current_params.update(search2.best_params_)
    print(f"  Best: {search2.best_params_}, score: {search2.best_score_:.4f}")

    # Stage 3: Regularization
    print("Stage 3: Regularization")
    stage3_grid = {
        'min_samples_leaf': [1, 2, 5],
        'subsample': [0.7, 0.8, 0.9, 1.0],
        'max_features': ['sqrt', 'log2', None],
    }
    stage3_size = np.prod([len(v) for v in stage3_grid.values()])
    print(f"  Grid size: {stage3_size}")

    search3 = GridSearchCV(estimator_class(**current_params), stage3_grid,
                           cv=cv, n_jobs=-1)
    search3.fit(X, y)
    current_params.update(search3.best_params_)
    print(f"  Best: {search3.best_params_}, score: {search3.best_score_:.4f}")

    total_evals = stage1_size + stage2_size + stage3_size
    print(f"Final Parameters: {current_params}")
    print(f"Final Score: {search3.best_score_:.4f}")
    print(f"Total evaluations: {total_evals} "
          f"(vs {stage1_size * stage2_size * stage3_size} for full grid)")

    return current_params, search3.best_score_

def grid_random_hybrid_pattern(estimator, X, y,
                               primary_grid: Dict[str, list],
                               secondary_distributions: Dict[str, Any],
                               n_random_iter: int = 50,
                               cv: int = 5):
    """
    Pattern 3: Grid + Random Hybrid

    - Grid search on primary (most important) hyperparameters
    - Random search on secondary hyperparameters

    Combines the thoroughness of grid on key hyperparameters
    with the efficiency of random on less critical ones.
    """
    print("=== Grid + Random Hybrid ===")

    # Step 1: Grid search on primary hyperparameters
    print("Step 1: Grid search on primary hyperparameters")
    primary_size = np.prod([len(v) for v in primary_grid.values()])
    print(f"  Primary grid size: {primary_size}")
    print(f"  Primary params: {list(primary_grid.keys())}")

    grid_search = GridSearchCV(estimator, primary_grid, cv=cv, n_jobs=-1)
    grid_search.fit(X, y)
    best_primary = grid_search.best_params_
    print(f"  Best primary: {best_primary}")
    print(f"  Score: {grid_search.best_score_:.4f}")

    # Step 2: Random search on secondary, with primary fixed
    print("Step 2: Random search on secondary hyperparameters")
    print(f"  Random iterations: {n_random_iter}")
    print(f"  Secondary params: {list(secondary_distributions.keys())}")

    # Combine fixed primary with random secondary
    full_distributions = {
        **{k: [v] for k, v in best_primary.items()},  # Fixed primary
        **secondary_distributions,                    # Random secondary
    }

    random_search = RandomizedSearchCV(
        estimator, full_distributions, n_iter=n_random_iter,
        cv=cv, n_jobs=-1, random_state=42
    )
    random_search.fit(X, y)
    print(f"  Best params: {random_search.best_params_}")
    print(f"  Score: {random_search.best_score_:.4f}")

    total_evals = primary_size + n_random_iter
    print(f"Total evaluations: {total_evals}")

    return random_search.best_params_, random_search.best_score_

# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=10, random_state=42)

    print("\n" + "=" * 60)
    print("PATTERN DEMONSTRATIONS")
    print("=" * 60)

    # Pattern 1
    initial_grid = {
        'learning_rate': [0.01, 0.1, 0.2],
        'n_estimators': [100, 200, 300],
        'max_depth': [3, 5, 7],
    }
    coarse_to_fine_pattern(GradientBoostingClassifier(), X, y, initial_grid)

    print("\n" + "=" * 60)

    # Pattern 2
    staged_tuning_pattern(GradientBoostingClassifier, X, y)
```

When grid search produces poor or unexpected results, systematic debugging reveals the root cause.
Problem 1: Best Value at Grid Boundary
Symptom: The optimal hyperparameter is at the minimum or maximum of your grid range.
Cause: The true optimum lies outside your grid.
Solution: Extend the grid in that direction and re-run.
Problem 2: High Variance Across CV Folds
Symptom: Standard deviation of CV scores is large relative to the mean.
Cause: Insufficient data, high model variance, or overfitting on certain folds.
Solution: Use more CV folds, more data, or regularization.
Problem 3: Training Score >> Validation Score
Symptom: Large gap between train and validation performance.
Cause: Overfitting — model is too complex for the data.
Solution: Increase regularization, reduce model complexity.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from typing import Dict, Any

def diagnose_grid_search_results(grid_search_cv: GridSearchCV) -> Dict[str, Any]:
    """
    Comprehensive diagnostic analysis of grid search results.

    Identifies common problems and suggests remedies.
    """
    results = pd.DataFrame(grid_search_cv.cv_results_)
    diagnostics = {
        'issues': [],
        'recommendations': [],
    }

    # === Check 1: Best at boundary ===
    for param in grid_search_cv.param_grid.keys():
        param_col = f'param_{param}'
        if param_col in results.columns:
            param_values = results[param_col].dropna().unique()
            if len(param_values) > 1:
                best_value = grid_search_cv.best_params_[param]
                # Check if best is at min or max
                try:
                    sorted_values = sorted(param_values)
                    if best_value == sorted_values[0]:
                        diagnostics['issues'].append(
                            f"Best {param}={best_value} is at MINIMUM of grid"
                        )
                        diagnostics['recommendations'].append(
                            f"Extend {param} grid to smaller values"
                        )
                    elif best_value == sorted_values[-1]:
                        diagnostics['issues'].append(
                            f"Best {param}={best_value} is at MAXIMUM of grid"
                        )
                        diagnostics['recommendations'].append(
                            f"Extend {param} grid to larger values"
                        )
                except TypeError:
                    pass  # Non-comparable types

    # === Check 2: High CV variance ===
    best_idx = grid_search_cv.best_index_
    n_splits = grid_search_cv.n_splits_
    cv_scores = [results[f'split{i}_test_score'][best_idx]
                 for i in range(n_splits)]
    cv_mean = np.mean(cv_scores)
    cv_std = np.std(cv_scores)
    cv_coeff_var = cv_std / cv_mean if cv_mean != 0 else float('inf')

    if cv_coeff_var > 0.1:  # >10% coefficient of variation
        diagnostics['issues'].append(
            f"High CV variance: {cv_std:.4f} (CV={cv_coeff_var:.2%})"
        )
        diagnostics['recommendations'].append(
            "Consider more CV folds, more data, or regularization"
        )

    # === Check 3: Overfitting (train >> test gap) ===
    if 'mean_train_score' in results.columns:
        train_score = results['mean_train_score'][best_idx]
        test_score = results['mean_test_score'][best_idx]
        gap = train_score - test_score
        relative_gap = gap / train_score if train_score != 0 else 0
        if relative_gap > 0.1:  # >10% relative gap
            diagnostics['issues'].append(
                f"Overfitting detected: train={train_score:.4f}, "
                f"test={test_score:.4f}, gap={gap:.4f}"
            )
            diagnostics['recommendations'].append(
                "Increase regularization or reduce model complexity"
            )

    # === Check 4: Flat response surface ===
    score_range = (results['mean_test_score'].max()
                   - results['mean_test_score'].min())
    if score_range < 0.01:  # Less than 1% variation
        diagnostics['issues'].append(
            f"Near-flat response: score range = {score_range:.4f}"
        )
        diagnostics['recommendations'].append(
            "Hyperparameters may not matter much; use defaults"
        )

    # === Check 5: Multiple near-optimal configurations ===
    best_score = grid_search_cv.best_score_
    near_optimal = results[results['mean_test_score'] >= best_score - 0.005]
    if len(near_optimal) > 5:
        diagnostics['issues'].append(
            f"{len(near_optimal)} configurations within 0.5% of best"
        )
        diagnostics['recommendations'].append(
            "Choose simpler/faster configuration among near-optimal"
        )

    # Print diagnostic report
    print("=" * 60)
    print("GRID SEARCH DIAGNOSTIC REPORT")
    print("=" * 60)
    print(f"Best Score: {grid_search_cv.best_score_:.4f}")
    print(f"Best Parameters: {grid_search_cv.best_params_}")
    print(f"Configurations Evaluated: {len(results)}")

    if diagnostics['issues']:
        print(f"{'⚠ ISSUES DETECTED':=^60}")
        for i, issue in enumerate(diagnostics['issues'], 1):
            print(f"  {i}. {issue}")
        print(f"{'💡 RECOMMENDATIONS':=^60}")
        for i, rec in enumerate(diagnostics['recommendations'], 1):
            print(f"  {i}. {rec}")
    else:
        print(f"{'✓ No issues detected':=^60}")

    return diagnostics

def visualize_grid_search(grid_search_cv: GridSearchCV,
                          param1: str, param2: str = None):
    """
    Visualize grid search results for 1D or 2D parameter spaces.

    For debugging, visualization often reveals patterns that
    tables miss: local optima, interactions, smoothness.
    """
    results = pd.DataFrame(grid_search_cv.cv_results_)

    if param2 is None:
        # 1D visualization
        param_values = results[f'param_{param1}'].values
        scores = results['mean_test_score'].values
        stds = results['std_test_score'].values

        print(f"{param1} vs Score:")
        print("-" * 40)
        for val, score, std in sorted(zip(param_values, scores, stds)):
            bar = '█' * int(score * 50)
            print(f"  {val:>10}: {score:.4f} ± {std:.4f} {bar}")
    else:
        # 2D visualization (heatmap-style)
        p1_vals = sorted(results[f'param_{param1}'].unique())
        p2_vals = sorted(results[f'param_{param2}'].unique())

        print(f"{param1} vs {param2} (scores):")
        print("-" * 60)

        # Header
        header = f"{'':>12} | " + " | ".join(f"{v:>8}" for v in p2_vals)
        print(header)
        print("-" * len(header))

        for p1 in p1_vals:
            row_scores = []
            for p2 in p2_vals:
                mask = ((results[f'param_{param1}'] == p1)
                        & (results[f'param_{param2}'] == p2))
                if mask.any():
                    score = results.loc[mask, 'mean_test_score'].values[0]
                    row_scores.append(f"{score:.4f}")
                else:
                    row_scores.append("   -  ")
            print(f"{p1:>12} | " + " | ".join(row_scores))

# Example usage
def demonstrate_diagnostics():
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    # Intentionally problematic grid (best at boundary)
    param_grid = {
        'n_estimators': [100, 150, 200],  # Best might be > 200
        'max_depth': [2, 3, 4],           # Too shallow
        'min_samples_split': [2, 5, 10],
    }

    grid_search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid, cv=5, return_train_score=True, n_jobs=-1
    )
    grid_search.fit(X, y)

    # Run diagnostics
    diagnostics = diagnose_grid_search_results(grid_search)

    # Visualize
    visualize_grid_search(grid_search, 'max_depth')
    visualize_grid_search(grid_search, 'n_estimators', 'max_depth')

    return diagnostics

demonstrate_diagnostics()
```

Beyond these code-level diagnostics, disciplined data handling before, during, and after grid search is essential for robust results.
Never use the test set during hyperparameter tuning. If you peek at test performance and adjust hyperparameters accordingly, the test set is contaminated and no longer provides unbiased performance estimates. Use a separate validation set for tuning.
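A minimal sketch of that discipline (the 80/20 split ratio is our choice, not a rule from the text): hold the test set out before any tuning begins, and let cross-validation on the remaining data play the role of the validation set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Carve off the test set FIRST; it is never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# GridSearchCV's internal folds on X_trainval act as the validation data;
# X_test is scored exactly once, after the final configuration is chosen.
```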
Different algorithms have specific tuning patterns that experienced practitioners know.
Gradient Boosting (XGBoost, LightGBM, CatBoost): learning_rate and n_estimators trade off inversely; fix a moderate learning rate, let early stopping choose n_estimators, then tune tree depth and regularization.

Random Forest: largely robust at defaults; max_features matters most, and n_estimators merely needs to be "large enough," since more trees cost only time, not accuracy.

SVM: standardize features first; search C and gamma jointly on log scales, because their optima interact strongly.
| Algorithm | Highest Priority | Medium Priority | Lower Priority |
|---|---|---|---|
| XGBoost/LightGBM | learning_rate, n_estimators | max_depth, min_child_weight | subsample, colsample, reg_lambda |
| Random Forest | max_features, n_estimators | max_depth, min_samples_split | bootstrap, class_weight |
| SVM (RBF) | C, gamma | kernel (if uncertain) | degree (poly), coef0 |
| Neural Network | learning_rate, hidden_sizes | alpha (regularization), batch_size | activation, solver |
| Ridge/Lasso | alpha | — (only alpha matters) | — |
For gradient boosting and neural networks, don't grid search n_estimators or epochs. Instead, set a high value and use early stopping with a validation set. This is faster and often produces better results.
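A sketch of this approach with scikit-learn's `GradientBoostingClassifier`, which supports early stopping through its `n_iter_no_change` and `validation_fraction` parameters (the specific values below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=1000,         # generous upper bound, not a tuned value
    learning_rate=0.1,
    validation_fraction=0.1,   # internal hold-out used for stopping
    n_iter_no_change=10,       # stop after 10 rounds without improvement
    random_state=42,
).fit(X, y)

# n_estimators_ reports how many boosting stages were actually fitted
print(gbm.n_estimators_)
```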
We have distilled practical wisdom for designing and executing effective grid searches: build ranges around proven defaults, use log spacing for hyperparameters that span orders of magnitude, match resolution to budget, apply coarse-to-fine and staged patterns, and debug boundary, variance, and overfitting issues systematically.
What's Next:
With ranges, resolution, patterns, and debugging strategies in hand, the final question is: when does grid search actually work well? The next page synthesizes everything into clear decision criteria for when grid search is the right choice versus when to use alternatives.
You now have a practical toolkit for grid search: range templates, resolution heuristics, common patterns, debugging techniques, and algorithm-specific guidance. Apply these guidelines to design efficient, effective hyperparameter searches.