Understanding grid search theory is essential, but successfully applying it in real-world projects requires practical wisdom accumulated from thousands of experiments. This page distills that experience into actionable guidelines.
We address the questions that arise in every hyperparameter optimization project: which ranges to search, how many values to use per hyperparameter, which search patterns to apply, and how to debug poor results.
These guidelines apply across algorithms and domains, forming a practical framework for effective hyperparameter optimization.
By the end of this page, you will have a practical toolkit for designing grid searches: algorithm-specific range recommendations, resolution heuristics, common grid patterns, debugging strategies, and a workflow that minimizes wasted computation while maximizing optimization quality.
Selecting appropriate ranges for each hyperparameter is the most critical design decision in grid search. Too narrow ranges miss optimal values; too wide ranges waste budget on implausible regions.
Sources for Range Selection:
Library Defaults: Default values in scikit-learn, XGBoost, etc., are carefully chosen. Build ranges around these defaults.
Published Research: Papers often report hyperparameter ranges that worked for similar problems.
Theoretical Bounds: Some hyperparameters have natural constraints (e.g., probabilities in [0,1]).
Prior Experience: Your own experiments on similar datasets provide valuable guidance.
Domain Knowledge: Understanding the algorithm helps set sensible bounds.
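A common way to combine the first source with the others is to bracket a library default by an order of magnitude in each direction. The helper below is a sketch of that idea (the function name is ours, not a library API); it uses the fact that scikit-learn's `GradientBoostingClassifier` defaults to `learning_rate=0.1`.

```python
import numpy as np

def range_around_default(default, factor=10.0, n=5):
    """Log-spaced search range from default/factor up to default*factor."""
    return list(np.logspace(np.log10(default / factor),
                            np.log10(default * factor), n))

# Bracket scikit-learn's GradientBoostingClassifier default learning_rate=0.1
lr_range = range_around_default(0.1)
# five log-spaced values running from 0.01 up to 1.0, centered on 0.1
```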
| Algorithm | Hyperparameter | Recommended Range | Scale | Notes |
|---|---|---|---|---|
| Gradient Boosting | learning_rate | 0.01 - 0.3 | Log | Inverse relationship with n_estimators |
| Gradient Boosting | n_estimators | 100 - 1000 | Linear | More = better but slower; use early stopping |
| Gradient Boosting | max_depth | 3 - 10 | Linear | Deeper = more complex; 3-7 common |
| Gradient Boosting | min_samples_split | 2 - 20 | Linear | Higher = more regularization |
| Random Forest | n_estimators | 100 - 500 | Linear | Diminishing returns after ~300 |
| Random Forest | max_features | sqrt, log2, 0.3-0.8 | Categorical/Linear | Controls variance-bias trade-off |
| SVM | C | 0.01 - 100 | Log | Regularization strength (inverse) |
| SVM | gamma (RBF) | 0.001 - 10 | Log | Kernel bandwidth (with scale) |
| Neural Network | learning_rate | 0.0001 - 0.01 | Log | Algorithm-specific defaults vary |
| Neural Network | alpha (L2) | 0.0001 - 0.1 | Log | Weight decay regularization |
| Ridge/Lasso | alpha | 0.001 - 100 | Log | Wide range; data-dependent |
```python
import numpy as np
from typing import Dict, List, Any

# ============================================================
# Curated Grid Templates for Common Algorithms
# ============================================================

def get_gradient_boosting_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for Gradient Boosting (GBM/XGBoost/LightGBM).

    These ranges come from extensive empirical testing across
    hundreds of tabular datasets.
    """
    grids = {
        'coarse': {
            'learning_rate': [0.01, 0.05, 0.1, 0.2],
            'n_estimators': [100, 200, 500],
            'max_depth': [3, 5, 7],
            'min_samples_split': [2, 10],
            'subsample': [0.8, 1.0],
        },
        'medium': {
            'learning_rate': [0.01, 0.03, 0.05, 0.1, 0.15, 0.2],
            'n_estimators': [100, 200, 300, 500],
            'max_depth': [3, 4, 5, 6, 7, 8],
            'min_samples_split': [2, 5, 10, 20],
            'min_samples_leaf': [1, 2, 5],
            'subsample': [0.7, 0.8, 0.9, 1.0],
            'max_features': ['sqrt', 'log2', 0.5, 0.8],
        },
        'fine': {
            'learning_rate': list(np.logspace(-2, -0.5, 8)),  # 0.01 to 0.316
            'n_estimators': [100, 150, 200, 300, 400, 500, 750, 1000],
            'max_depth': list(range(3, 12)),
            'min_samples_split': [2, 3, 5, 10, 15, 20],
            'min_samples_leaf': [1, 2, 3, 5, 10],
            'subsample': list(np.arange(0.6, 1.05, 0.1)),
            'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.7, 0.9, None],
        },
    }
    return grids[granularity]

def get_random_forest_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for Random Forest.

    Random forests are relatively robust to hyperparameters;
    n_estimators and max_features are most important.
    """
    grids = {
        'coarse': {
            'n_estimators': [100, 200, 300],
            'max_depth': [None, 10, 20],
            'max_features': ['sqrt', 'log2'],
            'min_samples_split': [2, 5],
        },
        'medium': {
            'n_estimators': [100, 200, 300, 500],
            'max_depth': [None, 10, 15, 20, 30],
            'max_features': ['sqrt', 'log2', 0.5],
            'min_samples_split': [2, 5, 10],
            'min_samples_leaf': [1, 2, 4],
            'bootstrap': [True, False],
        },
        'fine': {
            'n_estimators': [100, 150, 200, 300, 400, 500],
            'max_depth': [None, 5, 10, 15, 20, 25, 30],
            'max_features': ['sqrt', 'log2', 0.3, 0.5, 0.7, None],
            'min_samples_split': [2, 3, 5, 10, 15],
            'min_samples_leaf': [1, 2, 3, 5],
            'bootstrap': [True, False],
            'class_weight': [None, 'balanced'],
        },
    }
    return grids[granularity]

def get_svm_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for SVM.

    C and gamma are the most important hyperparameters.
    Wide log-scale ranges are essential.
    """
    grids = {
        'coarse': {
            'C': [0.1, 1, 10, 100],
            'gamma': ['scale', 'auto', 0.01, 0.1, 1],
            'kernel': ['rbf', 'linear'],
        },
        'medium': {
            'C': list(np.logspace(-2, 3, 6)),  # 0.01 to 1000
            'gamma': ['scale', 'auto'] + list(np.logspace(-3, 1, 5)),
            'kernel': ['rbf', 'linear', 'poly'],
            'degree': [2, 3, 4],  # Only for poly kernel
        },
        'fine': {
            'C': list(np.logspace(-3, 4, 8)),
            'gamma': ['scale', 'auto'] + list(np.logspace(-4, 2, 7)),
            'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
            'degree': [2, 3, 4, 5],
            'coef0': [0, 0.1, 0.5, 1],
        },
    }
    return grids[granularity]

def get_neural_network_grid(granularity: str = 'medium') -> Dict[str, List]:
    """
    Production-tested grid for scikit-learn MLPClassifier/Regressor.

    Architecture and learning rate are most impactful.
    """
    grids = {
        'coarse': {
            'hidden_layer_sizes': [(50,), (100,), (100, 50)],
            'alpha': [0.0001, 0.001, 0.01],
            'learning_rate_init': [0.001, 0.01],
            'activation': ['relu', 'tanh'],
        },
        'medium': {
            'hidden_layer_sizes': [
                (50,), (100,), (200,),
                (100, 50), (100, 100), (200, 100),
            ],
            'alpha': [0.0001, 0.001, 0.01, 0.1],
            'learning_rate_init': [0.0001, 0.001, 0.005, 0.01],
            'activation': ['relu', 'tanh'],
            'batch_size': [32, 64, 128],
            'early_stopping': [True],
        },
        'fine': {
            'hidden_layer_sizes': [
                (32,), (64,), (100,), (128,), (200,), (256,),
                (64, 32), (100, 50), (128, 64), (200, 100),
                (100, 100), (128, 128), (200, 100, 50),
            ],
            'alpha': list(np.logspace(-5, -1, 5)),
            'learning_rate_init': list(np.logspace(-4, -2, 5)),
            'activation': ['relu', 'tanh', 'logistic'],
            'batch_size': [16, 32, 64, 128, 256],
            'solver': ['adam', 'sgd'],
            'early_stopping': [True],
        },
    }
    return grids[granularity]

def estimate_grid_cost(grid: Dict[str, List]) -> Dict:
    """Calculate grid statistics."""
    n_configs = 1
    for values in grid.values():
        n_configs *= len(values)
    return {
        'n_params': len(grid),
        'dimensions': [len(v) for v in grid.values()],
        'n_configs': n_configs,
        'with_5fold_cv': n_configs * 5,
    }

# Display grid templates
for algo, getter in [
    ('Gradient Boosting', get_gradient_boosting_grid),
    ('Random Forest', get_random_forest_grid),
    ('SVM', get_svm_grid),
    ('Neural Network', get_neural_network_grid),
]:
    print(f"{'='*50}")
    print(f"{algo} Grids")
    print('='*50)
    for level in ['coarse', 'medium', 'fine']:
        grid = getter(level)
        stats = estimate_grid_cost(grid)
        print(f"{level.upper()}: {stats['n_configs']:,} configs "
              f"({stats['with_5fold_cv']:,} with 5-fold CV)")
```

Begin with ranges centered around defaults. If the best values are at range boundaries, expand in that direction and re-run. If best values are in the interior, consider refining with a finer local grid. This iterative approach is more efficient than starting with overly wide ranges.
Choosing the number of values per hyperparameter involves balancing coverage against computational cost. Several heuristics guide this decision.
Heuristic 1: The 3-5-10 Rule
For continuous hyperparameters, use roughly 3 values for an initial coarse screen, 5 for a standard search, and 10 for fine refinement. More than 10 values rarely improves results unless the objective is known to be highly sensitive.
Heuristic 2: Log-Spacing for Orders of Magnitude
When a hyperparameter spans multiple orders of magnitude (e.g., regularization from 0.001 to 100), use logarithmic spacing:
# Linear spacing (BAD):  0.001, 25.0, 50.0, 75.0, 100  (four of five points above 25)
# Log spacing (GOOD):    0.001, 0.01, 0.1, 1, 10, 100  (one point per order of magnitude)
Log spacing ensures each order of magnitude is represented.
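Both spacings above are one NumPy call apiece; note how `linspace` leaves the entire 0.001-1 region sampled by a single point:

```python
import numpy as np

# Linear spacing: only the first of five points falls below 25
linear = np.linspace(0.001, 100, 5)

# Log spacing: one point per order of magnitude across the range
log = np.logspace(-3, 2, 6)   # 0.001, 0.01, 0.1, 1, 10, 100
```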
Heuristic 3: Budget-First Calculation
Given a computational budget (e.g., 1000 model evaluations), work backward:
$$n_{\text{per dim}} = \lfloor \text{budget}^{1/d} \rfloor$$
With a budget of 1000 and 5 hyperparameters: $1000^{1/5} \approx 3.98$, so 3 values per dimension after flooring (roughly 4 before).
```python
import numpy as np
from typing import Dict, List, Tuple, Any

def calculate_resolution_from_budget(
        n_hyperparameters: int,
        total_budget: int,
        cv_folds: int = 5,
        min_per_dim: int = 2,
        max_per_dim: int = 10) -> int:
    """
    Calculate recommended grid resolution given a computational budget.

    Parameters:
    -----------
    n_hyperparameters: Number of hyperparameters to tune
    total_budget: Maximum number of model training runs
    cv_folds: Number of cross-validation folds

    Returns:
    --------
    Recommended number of values per hyperparameter
    """
    # Budget available for configurations (accounting for CV)
    config_budget = total_budget // cv_folds

    # Calculate resolution
    resolution = int(config_budget ** (1 / n_hyperparameters))

    # Apply bounds
    resolution = max(min_per_dim, min(max_per_dim, resolution))

    # Actual grid size
    actual_configs = resolution ** n_hyperparameters
    actual_trains = actual_configs * cv_folds

    print("Budget Analysis:")
    print(f"  Hyperparameters: {n_hyperparameters}")
    print(f"  Total budget: {total_budget} training runs")
    print(f"  CV folds: {cv_folds}")
    print(f"  → Recommended resolution: {resolution} values per dim")
    print(f"  → Grid size: {actual_configs} configurations")
    print(f"  → Total trains: {actual_trains} "
          f"(uses {100*actual_trains/total_budget:.1f}% of budget)")

    return resolution

def adaptive_resolution_grid(
        param_sensitivities: Dict[str, float],
        total_budget: int,
        cv_folds: int = 5,
        min_resolution: int = 2,
        max_resolution: int = 8) -> Dict[str, int]:
    """
    Assign different resolutions based on hyperparameter sensitivity.

    More sensitive hyperparameters get finer resolution.
    Less sensitive ones get coarser resolution.
    This allocates budget where it matters most.
    """
    # Normalize sensitivities
    total_sens = sum(param_sensitivities.values())
    normalized = {k: v / total_sens for k, v in param_sensitivities.items()}

    # Initial equal resolution
    n_params = len(param_sensitivities)
    config_budget = total_budget // cv_folds
    base_resolution = int(config_budget ** (1 / n_params))

    # Adjust based on sensitivity
    resolutions = {}
    for param, sens in normalized.items():
        # Higher sensitivity → higher resolution
        adjustment = (sens - 1 / n_params) * n_params  # Deviation from uniform
        adjust_factor = 1 + adjustment                  # Factor from ~0.5 to ~1.5
        resolution = int(base_resolution * adjust_factor)
        resolution = max(min_resolution, min(max_resolution, resolution))
        resolutions[param] = resolution

    # Calculate actual grid size
    actual_configs = 1
    for r in resolutions.values():
        actual_configs *= r

    print("Adaptive Resolution Allocation:")
    print("-" * 50)
    for param, sens in sorted(normalized.items(), key=lambda x: -x[1]):
        print(f"  {param}: sensitivity={sens:.3f}, resolution={resolutions[param]}")
    print(f"Total configurations: {actual_configs}")
    print(f"With {cv_folds}-fold CV: {actual_configs * cv_folds} training runs")

    return resolutions

def create_multi_resolution_grid(
        param_config: Dict[str, Tuple[float, float, str, int]],
) -> Dict[str, List[float]]:
    """
    Create a grid with different resolutions per hyperparameter.

    Parameters:
    -----------
    param_config: Dict mapping param name to (min, max, scale, resolution)
        scale is 'linear' or 'log'
        resolution is number of values for this parameter

    Example:
        param_config = {
            'learning_rate': (0.001, 0.1, 'log', 6),  # 6 points, log-spaced
            'max_depth': (3, 10, 'linear', 8),        # 8 points, linear
            'subsample': (0.6, 1.0, 'linear', 3),     # 3 points, linear
        }
    """
    grid = {}
    for param, (lo, hi, scale, resolution) in param_config.items():
        if scale == 'log':
            values = list(np.logspace(np.log10(lo), np.log10(hi), resolution))
        else:
            values = list(np.linspace(lo, hi, resolution))
        grid[param] = values

    # Print grid details
    print("Multi-Resolution Grid:")
    total = 1
    for param, values in grid.items():
        print(f"  {param}: {len(values)} values")
        print(f"    {[f'{v:.4f}' if isinstance(v, float) else v for v in values]}")
        total *= len(values)
    print(f"Total configurations: {total}")

    return grid

# Demonstration
print("=== Resolution Calculation Examples ===")

# Example 1: Fixed budget
calculate_resolution_from_budget(
    n_hyperparameters=5,
    total_budget=5000,
    cv_folds=5)

# Example 2: Adaptive resolution
print("\n" + "=" * 50)
sensitivities = {
    'learning_rate': 0.4,      # Most sensitive
    'n_estimators': 0.25,      # Moderately sensitive
    'max_depth': 0.2,          # Moderately sensitive
    'min_samples_split': 0.1,  # Less sensitive
    'subsample': 0.05,         # Least sensitive
}
adaptive_resolution_grid(sensitivities, total_budget=5000)

# Example 3: Multi-resolution grid
print("\n" + "=" * 50)
config = {
    'learning_rate': (0.01, 0.3, 'log', 6),
    'n_estimators': (100, 500, 'linear', 5),
    'max_depth': (3, 10, 'linear', 8),
    'subsample': (0.7, 1.0, 'linear', 3),
}
create_multi_resolution_grid(config)
```

Performance improvement typically follows a logarithmic curve with grid resolution. Going from 3 to 5 values per hyperparameter often yields significant improvement. Going from 7 to 10 rarely does. Invest in more hyperparameters at coarser resolution rather than fewer at fine resolution.
Experienced practitioners employ established patterns that encode best practices for specific scenarios.
Pattern 1: Coarse-to-Fine Refinement
Two-phase approach: first run a coarse grid over wide ranges to locate the promising region, then run a finer grid concentrated in the neighborhood of the best coarse configuration.
Pattern 2: Staged Tuning by Importance
Sequential stages: tune the most impactful hyperparameters first, fix them at their best values, then tune the next group in order of importance.
Pattern 3: Grid + Random Hybrid
Combine grid and random search: an exhaustive grid over the few most important hyperparameters, with random sampling over the less critical ones.
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import uniform, randint
from typing import Dict, Any

def coarse_to_fine_pattern(estimator, X, y,
                           initial_grid: Dict[str, list],
                           refinement_factor: float = 0.5,
                           cv: int = 5):
    """
    Pattern 1: Coarse-to-Fine Refinement

    1. Run coarse grid search
    2. Identify best values
    3. Create refined grid in neighborhood of best values
    4. Run fine grid search
    """
    print("=== Coarse-to-Fine Grid Search ===")

    # Phase 1: Coarse search
    print("Phase 1: Coarse Grid")
    coarse_size = np.prod([len(v) for v in initial_grid.values()])
    print(f"  Grid size: {coarse_size}")

    coarse_search = GridSearchCV(estimator, initial_grid, cv=cv, n_jobs=-1)
    coarse_search.fit(X, y)
    best_coarse = coarse_search.best_params_
    print(f"  Best params: {best_coarse}")
    print(f"  Best score: {coarse_search.best_score_:.4f}")

    # Phase 2: Create refined grid around best values
    print("Phase 2: Refined Grid")
    refined_grid = {}
    for param, values in initial_grid.items():
        best_val = best_coarse[param]
        if isinstance(best_val, (int, float)):
            # Numeric parameter: create neighborhood
            values_arr = np.array(values)
            best_idx = np.argmin(np.abs(values_arr - best_val))

            # Spacing in original grid
            if len(values) > 1:
                spacing = (values[1] - values[0] if best_idx == 0
                           else values[best_idx] - values[best_idx - 1])
            else:
                spacing = best_val * 0.5

            # Create refined values around best
            refined_range = spacing * refinement_factor
            new_values = np.linspace(
                max(values[0], best_val - refined_range),
                min(values[-1], best_val + refined_range),
                5  # 5 refined points
            )
            refined_grid[param] = list(new_values)
        else:
            # Categorical: keep just the best value
            refined_grid[param] = [best_val]

    refined_size = np.prod([len(v) for v in refined_grid.values()])
    print(f"  Refined grid size: {refined_size}")

    fine_search = GridSearchCV(estimator, refined_grid, cv=cv, n_jobs=-1)
    fine_search.fit(X, y)
    print(f"  Best params: {fine_search.best_params_}")
    print(f"  Best score: {fine_search.best_score_:.4f}")
    print(f"Total evaluations: {coarse_size + refined_size}")

    return fine_search.best_params_, fine_search.best_score_

def staged_tuning_pattern(estimator_class, X, y, cv: int = 5):
    """
    Pattern 2: Staged Tuning by Importance

    Tune hyperparameters in order of importance:
    1. Learning dynamics (learning_rate)
    2. Model capacity (depth, estimators)
    3. Regularization and minor params

    Each stage uses optimized values from previous stages.
    """
    print("=== Staged Tuning Pattern ===")
    current_params = {}

    # Stage 1: Learning dynamics
    print("Stage 1: Learning Dynamics")
    stage1_grid = {
        'learning_rate': [0.01, 0.03, 0.05, 0.1, 0.15, 0.2],
        'n_estimators': [100, 200, 300, 500],
    }
    stage1_size = np.prod([len(v) for v in stage1_grid.values()])
    print(f"  Grid size: {stage1_size}")

    search1 = GridSearchCV(estimator_class(**current_params), stage1_grid,
                           cv=cv, n_jobs=-1)
    search1.fit(X, y)
    current_params.update(search1.best_params_)
    print(f"  Best: {search1.best_params_}, score: {search1.best_score_:.4f}")

    # Stage 2: Model capacity
    print("Stage 2: Model Capacity")
    stage2_grid = {
        'max_depth': [3, 4, 5, 6, 7, 8],
        'min_samples_split': [2, 5, 10, 20],
    }
    stage2_size = np.prod([len(v) for v in stage2_grid.values()])
    print(f"  Grid size: {stage2_size}")

    search2 = GridSearchCV(estimator_class(**current_params), stage2_grid,
                           cv=cv, n_jobs=-1)
    search2.fit(X, y)
    current_params.update(search2.best_params_)
    print(f"  Best: {search2.best_params_}, score: {search2.best_score_:.4f}")

    # Stage 3: Regularization
    print("Stage 3: Regularization")
    stage3_grid = {
        'min_samples_leaf': [1, 2, 5],
        'subsample': [0.7, 0.8, 0.9, 1.0],
        'max_features': ['sqrt', 'log2', None],
    }
    stage3_size = np.prod([len(v) for v in stage3_grid.values()])
    print(f"  Grid size: {stage3_size}")

    search3 = GridSearchCV(estimator_class(**current_params), stage3_grid,
                           cv=cv, n_jobs=-1)
    search3.fit(X, y)
    current_params.update(search3.best_params_)
    print(f"  Best: {search3.best_params_}, score: {search3.best_score_:.4f}")

    total_evals = stage1_size + stage2_size + stage3_size
    print(f"Final Parameters: {current_params}")
    print(f"Final Score: {search3.best_score_:.4f}")
    print(f"Total evaluations: {total_evals} "
          f"(vs {stage1_size * stage2_size * stage3_size} for full grid)")

    return current_params, search3.best_score_

def grid_random_hybrid_pattern(estimator, X, y,
                               primary_grid: Dict[str, list],
                               secondary_distributions: Dict[str, Any],
                               n_random_iter: int = 50,
                               cv: int = 5):
    """
    Pattern 3: Grid + Random Hybrid

    - Grid search on primary (most important) hyperparameters
    - Random search on secondary hyperparameters

    Combines the thoroughness of grid on key hyperparameters
    with the efficiency of random on less critical ones.
    """
    print("=== Grid + Random Hybrid ===")

    # Step 1: Grid search on primary hyperparameters
    print("Step 1: Grid search on primary hyperparameters")
    primary_size = np.prod([len(v) for v in primary_grid.values()])
    print(f"  Primary grid size: {primary_size}")
    print(f"  Primary params: {list(primary_grid.keys())}")

    grid_search = GridSearchCV(estimator, primary_grid, cv=cv, n_jobs=-1)
    grid_search.fit(X, y)
    best_primary = grid_search.best_params_
    print(f"  Best primary: {best_primary}")
    print(f"  Score: {grid_search.best_score_:.4f}")

    # Step 2: Random search on secondary, with primary fixed
    print("Step 2: Random search on secondary hyperparameters")
    print(f"  Random iterations: {n_random_iter}")
    print(f"  Secondary params: {list(secondary_distributions.keys())}")

    # Combine fixed primary with random secondary
    full_distributions = {
        **{k: [v] for k, v in best_primary.items()},  # Fixed primary
        **secondary_distributions,                    # Random secondary
    }

    random_search = RandomizedSearchCV(
        estimator, full_distributions, n_iter=n_random_iter,
        cv=cv, n_jobs=-1, random_state=42
    )
    random_search.fit(X, y)
    print(f"  Best params: {random_search.best_params_}")
    print(f"  Score: {random_search.best_score_:.4f}")

    total_evals = primary_size + n_random_iter
    print(f"Total evaluations: {total_evals}")

    return random_search.best_params_, random_search.best_score_

# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=10, random_state=42)

    print("\n" + "=" * 60)
    print("PATTERN DEMONSTRATIONS")
    print("=" * 60)

    # Pattern 1
    initial_grid = {
        'learning_rate': [0.01, 0.1, 0.2],
        'n_estimators': [100, 200, 300],
        'max_depth': [3, 5, 7],
    }
    coarse_to_fine_pattern(GradientBoostingClassifier(), X, y, initial_grid)

    print("\n" + "=" * 60)

    # Pattern 2
    staged_tuning_pattern(GradientBoostingClassifier, X, y)
```

When grid search produces poor or unexpected results, systematic debugging reveals the root cause.
Problem 1: Best Value at Grid Boundary
Symptom: The optimal hyperparameter is at the minimum or maximum of your grid range.
Cause: The true optimum lies outside your grid.
Solution: Extend the grid in that direction and re-run.
Problem 2: High Variance Across CV Folds
Symptom: Standard deviation of CV scores is large relative to the mean.
Cause: Insufficient data, high model variance, or overfitting on certain folds.
Solution: Use more CV folds, more data, or regularization.
Problem 3: Training Score >> Validation Score
Symptom: Large gap between train and validation performance.
Cause: Overfitting — model is too complex for the data.
Solution: Increase regularization, reduce model complexity.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from typing import Dict, Any

def diagnose_grid_search_results(grid_search_cv: GridSearchCV) -> Dict[str, Any]:
    """
    Comprehensive diagnostic analysis of grid search results.

    Identifies common problems and suggests remedies.
    """
    results = pd.DataFrame(grid_search_cv.cv_results_)
    diagnostics = {
        'issues': [],
        'recommendations': [],
    }

    # === Check 1: Best at boundary ===
    for param in grid_search_cv.param_grid.keys():
        param_col = f'param_{param}'
        if param_col in results.columns:
            param_values = results[param_col].dropna().unique()
            if len(param_values) > 1:
                best_value = grid_search_cv.best_params_[param]
                # Check if best is at min or max
                try:
                    sorted_values = sorted(param_values)
                    if best_value == sorted_values[0]:
                        diagnostics['issues'].append(
                            f"Best {param}={best_value} is at MINIMUM of grid"
                        )
                        diagnostics['recommendations'].append(
                            f"Extend {param} grid to smaller values"
                        )
                    elif best_value == sorted_values[-1]:
                        diagnostics['issues'].append(
                            f"Best {param}={best_value} is at MAXIMUM of grid"
                        )
                        diagnostics['recommendations'].append(
                            f"Extend {param} grid to larger values"
                        )
                except TypeError:
                    pass  # Non-comparable types

    # === Check 2: High CV variance ===
    best_idx = grid_search_cv.best_index_
    n_splits = grid_search_cv.n_splits_
    cv_scores = [results[f'split{i}_test_score'][best_idx]
                 for i in range(n_splits)]
    cv_mean = np.mean(cv_scores)
    cv_std = np.std(cv_scores)
    cv_coeff_var = cv_std / cv_mean if cv_mean != 0 else float('inf')

    if cv_coeff_var > 0.1:  # >10% coefficient of variation
        diagnostics['issues'].append(
            f"High CV variance: {cv_std:.4f} (CV={cv_coeff_var:.2%})"
        )
        diagnostics['recommendations'].append(
            "Consider more CV folds, more data, or regularization"
        )

    # === Check 3: Overfitting (train >> test gap) ===
    if 'mean_train_score' in results.columns:
        train_score = results['mean_train_score'][best_idx]
        test_score = results['mean_test_score'][best_idx]
        gap = train_score - test_score
        relative_gap = gap / train_score if train_score != 0 else 0
        if relative_gap > 0.1:  # >10% relative gap
            diagnostics['issues'].append(
                f"Overfitting detected: train={train_score:.4f}, "
                f"test={test_score:.4f}, gap={gap:.4f}"
            )
            diagnostics['recommendations'].append(
                "Increase regularization or reduce model complexity"
            )

    # === Check 4: Flat response surface ===
    score_range = (results['mean_test_score'].max()
                   - results['mean_test_score'].min())
    if score_range < 0.01:  # Less than 1% variation
        diagnostics['issues'].append(
            f"Near-flat response: score range = {score_range:.4f}"
        )
        diagnostics['recommendations'].append(
            "Hyperparameters may not matter much; use defaults"
        )

    # === Check 5: Multiple near-optimal configurations ===
    best_score = grid_search_cv.best_score_
    near_optimal = results[results['mean_test_score'] >= best_score - 0.005]
    if len(near_optimal) > 5:
        diagnostics['issues'].append(
            f"{len(near_optimal)} configurations within 0.5% of best"
        )
        diagnostics['recommendations'].append(
            "Choose simpler/faster configuration among near-optimal"
        )

    # Print diagnostic report
    print("=" * 60)
    print("GRID SEARCH DIAGNOSTIC REPORT")
    print("=" * 60)
    print(f"Best Score: {grid_search_cv.best_score_:.4f}")
    print(f"Best Parameters: {grid_search_cv.best_params_}")
    print(f"Configurations Evaluated: {len(results)}")

    if diagnostics['issues']:
        print(f"{'⚠ ISSUES DETECTED':=^60}")
        for i, issue in enumerate(diagnostics['issues'], 1):
            print(f"  {i}. {issue}")
        print(f"{'💡 RECOMMENDATIONS':=^60}")
        for i, rec in enumerate(diagnostics['recommendations'], 1):
            print(f"  {i}. {rec}")
    else:
        print(f"{'✓ No issues detected':=^60}")

    return diagnostics

def visualize_grid_search(grid_search_cv: GridSearchCV,
                          param1: str, param2: str = None):
    """
    Visualize grid search results for 1D or 2D parameter spaces.

    For debugging, visualization often reveals patterns that
    tables miss: local optima, interactions, smoothness.
    """
    results = pd.DataFrame(grid_search_cv.cv_results_)

    if param2 is None:
        # 1D visualization
        param_values = results[f'param_{param1}'].values
        scores = results['mean_test_score'].values
        stds = results['std_test_score'].values

        print(f"{param1} vs Score:")
        print("-" * 40)
        for val, score, std in sorted(zip(param_values, scores, stds)):
            bar = '█' * int(score * 50)
            print(f"  {val:>10}: {score:.4f} ± {std:.4f} {bar}")
    else:
        # 2D visualization (heatmap-style)
        p1_vals = sorted(results[f'param_{param1}'].unique())
        p2_vals = sorted(results[f'param_{param2}'].unique())

        print(f"{param1} vs {param2} (scores):")
        print("-" * 60)

        # Header
        header = f"{'':>12} | " + " | ".join(f"{v:>8}" for v in p2_vals)
        print(header)
        print("-" * len(header))

        for p1 in p1_vals:
            row_scores = []
            for p2 in p2_vals:
                mask = ((results[f'param_{param1}'] == p1)
                        & (results[f'param_{param2}'] == p2))
                if mask.any():
                    score = results.loc[mask, 'mean_test_score'].values[0]
                    row_scores.append(f"{score:.4f}")
                else:
                    row_scores.append("   -  ")
            print(f"{p1:>12} | " + " | ".join(row_scores))

# Example usage
def demonstrate_diagnostics():
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    # Intentionally problematic grid (best at boundary)
    param_grid = {
        'n_estimators': [100, 150, 200],  # Best might be > 200
        'max_depth': [2, 3, 4],           # Too shallow
        'min_samples_split': [2, 5, 10],
    }

    grid_search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid, cv=5, return_train_score=True, n_jobs=-1
    )
    grid_search.fit(X, y)

    # Run diagnostics
    diagnostics = diagnose_grid_search_results(grid_search)

    # Visualize
    visualize_grid_search(grid_search, 'max_depth')
    visualize_grid_search(grid_search, 'n_estimators', 'max_depth')

    return diagnostics

demonstrate_diagnostics()
```

Beyond these code-level diagnostics, disciplined data handling before, during, and after grid search is essential for robust results.
Never use the test set during hyperparameter tuning. If you peek at test performance and adjust hyperparameters accordingly, the test set is contaminated and no longer provides unbiased performance estimates. Use a separate validation set for tuning.
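A minimal sketch of that discipline (the 80/20 split ratio is our choice, not a rule from the text): hold the test set out before any tuning begins, and let cross-validation on the remaining data play the role of the validation set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Carve off the test set FIRST; it is never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# GridSearchCV's internal folds on X_trainval act as the validation data;
# X_test is scored exactly once, after the final configuration is chosen.
```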
Different algorithms have specific tuning patterns that experienced practitioners know.
Gradient Boosting (XGBoost, LightGBM, CatBoost): learning_rate and n_estimators trade off inversely; fix a moderate learning rate, let early stopping choose n_estimators, then tune tree depth and regularization.

Random Forest: largely robust at defaults; max_features matters most, and n_estimators merely needs to be "large enough," since more trees cost only time, not accuracy.

SVM: standardize features first; search C and gamma jointly on log scales, because their optima interact strongly.
| Algorithm | Highest Priority | Medium Priority | Lower Priority |
|---|---|---|---|
| XGBoost/LightGBM | learning_rate, n_estimators | max_depth, min_child_weight | subsample, colsample, reg_lambda |
| Random Forest | max_features, n_estimators | max_depth, min_samples_split | bootstrap, class_weight |
| SVM (RBF) | C, gamma | kernel (if uncertain) | degree (poly), coef0 |
| Neural Network | learning_rate, hidden_sizes | alpha (regularization), batch_size | activation, solver |
| Ridge/Lasso | alpha | — (only alpha matters) | — |
For gradient boosting and neural networks, don't grid search n_estimators or epochs. Instead, set a high value and use early stopping with a validation set. This is faster and often produces better results.
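A sketch of this approach with scikit-learn's `GradientBoostingClassifier`, which supports early stopping through its `n_iter_no_change` and `validation_fraction` parameters (the specific values below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=1000,         # generous upper bound, not a tuned value
    learning_rate=0.1,
    validation_fraction=0.1,   # internal hold-out used for stopping
    n_iter_no_change=10,       # stop after 10 rounds without improvement
    random_state=42,
).fit(X, y)

# n_estimators_ reports how many boosting stages were actually fitted
print(gbm.n_estimators_)
```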
We have distilled practical wisdom for designing and executing effective grid searches: build ranges around proven defaults, use log spacing for hyperparameters that span orders of magnitude, match resolution to budget, apply coarse-to-fine and staged patterns, and debug boundary, variance, and overfitting issues systematically.
What's Next:
With ranges, resolution, patterns, and debugging strategies in hand, the final question is: when does grid search actually work well? The next page synthesizes everything into clear decision criteria for when grid search is the right choice versus when to use alternatives.
You now have a practical toolkit for grid search: range templates, resolution heuristics, common patterns, debugging techniques, and algorithm-specific guidance. Apply these guidelines to design efficient, effective hyperparameter searches.