Grid search has been the default hyperparameter optimization method for decades, but modern alternatives—random search, Bayesian optimization, multi-fidelity methods—challenge its dominance. When should you still choose grid search?
This page provides a decision framework for that choice, synthesizing the theoretical foundations, cost analyses, and empirical comparisons covered earlier in this module.
By the end, you will have principled criteria for deciding when grid search is optimal, when alternatives are better, and when grid search serves as an effective component in hybrid strategies. You will also have a practical decision tree for selecting a hyperparameter optimization method.
Despite its limitations, grid search possesses strengths that no other method fully replicates. Understanding these strengths clarifies when grid search remains the best choice.
Strength 1: Complete Enumeration Guarantee
Grid search evaluates every configuration in the defined grid. This provides mathematical certainty:
$$\lambda^* = \arg\min_{\lambda \in G} \mathcal{L}(\lambda)$$
No stochastic method can guarantee finding the grid-optimal solution. For applications requiring provable coverage, grid search is irreplaceable.
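A minimal sketch of what this guarantee means in code, assuming a generic `loss` function and a small illustrative grid (both hypothetical): every configuration in the grid is evaluated, so the returned configuration is exactly the grid-optimal one from the formula above.

```python
from itertools import product

def grid_argmin(loss, grid):
    """Evaluate every configuration in the grid and return the one
    with the lowest loss -- the argmin over G from the formula above."""
    names = list(grid)
    best_config, best_loss = None, float("inf")
    for values in product(*grid.values()):      # every combination, no exceptions
        config = dict(zip(names, values))
        current = loss(config)
        if current < best_loss:
            best_config, best_loss = config, current
    return best_config, best_loss

# Hypothetical 2-D example: a toy quadratic loss over C and gamma
toy_loss = lambda cfg: (cfg["C"] - 1.0) ** 2 + (cfg["gamma"] - 0.1) ** 2
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
print(grid_argmin(toy_loss, grid))   # -> ({'C': 1.0, 'gamma': 0.1}, 0.0)
```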
Strength 2: Perfect Reproducibility
Given the same grid and random seeds, grid search produces identical results every time. This determinism is crucial for regulated domains that require auditable model selection, for scientific work that must be independently reproduced, and for debugging, where rerunning the search must yield the same answer.
Strength 3: No Meta-Hyperparameters
Random search requires choosing the number of iterations. Bayesian optimization has acquisition functions, surrogate model choices, and exploration-exploitation parameters. Grid search has none—just define the grid and run.
Even when using advanced methods, a quick grid search baseline is valuable. It establishes what 'easy' optimization achieves, provides interpretable insights into hyperparameter sensitivity, and gives a fallback if complex methods fail.
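To make the "no meta-hyperparameters" point concrete, here is a minimal baseline sketch using scikit-learn's `GridSearchCV` (the dataset and model are illustrative, not from the text): the grid itself is the entire specification of the search; there is no acquisition function, iteration count, or exploration schedule to choose.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The grid *is* the full specification of the search.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

baseline = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
baseline.fit(X, y)
print(baseline.best_params_, round(baseline.best_score_, 4))
```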
Grid search's sweet spot is clearly defined by dimensionality. Within this regime, it often outperforms alternatives.
The d ≤ 3 Regime:
With 2-3 hyperparameters, a grid of 5-10 values per dimension costs only tens to a few hundred evaluations, and the full set of results can be visualized directly as a heatmap or a small number of slices. In this regime, grid search often finds better solutions than random search with the same budget because it doesn't waste samples on redundant regions.
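As a small, self-contained illustration of that claim, the sketch below compares a 5x5 grid against 25 random samples on a toy 2-D quadratic objective (not a real model; the numbers are illustrative). The grid's even spacing typically places at least one point close to the optimum, while random draws can cluster and leave gaps.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x, y: (x - 0.47) ** 2 + (y - 0.53) ** 2   # toy objective, optimum near the centre

# Grid: 5 x 5 = 25 evaluations, evenly spread over [0, 1]^2
g = np.linspace(0, 1, 5)
grid_best = min(f(x, y) for x in g for y in g)

# Random: 25 evaluations with the same budget, repeated to average out luck
random_bests = []
for _ in range(200):
    pts = rng.random((25, 2))
    random_bests.append(f(pts[:, 0], pts[:, 1]).min())

print(f"grid best loss:            {grid_best:.4f}")
print(f"random best loss (median): {np.median(random_bests):.4f}")
```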
The d = 4-5 Transition Zone:
With 4-5 hyperparameters, even a modest grid of 4-5 values per dimension requires several hundred to a few thousand evaluations. Grid search remains feasible when training is fast, but random search covers the space more cheaply and Bayesian optimization can exploit structure in the response surface, so grid search loses its edge.
The d > 5 Breakdown:
With 6+ hyperparameters, even three values per dimension means 729 or more configurations, and any realistic budget forces a grid too coarse to resolve the response surface. Grid search is impractical here; Bayesian optimization and multi-fidelity methods are the workable choices.
| Dimensions | Grid Search | Random Search | Bayesian Opt | Recommended |
|---|---|---|---|---|
| 1-2 | Excellent | Wasteful | Overkill | Grid Search |
| 3 | Good | Comparable | Slightly better | Grid or Random |
| 4-5 | Expensive | Good | Better | Random or Bayesian |
| 6-10 | Impractical | Good | Best | Bayesian or Multi-fidelity |
| >10 | Impossible | Adequate | Good | Bayesian + Multi-fidelity |
```python
from typing import Dict, List, Tuple
from enum import Enum


class HPOMethod(Enum):
    GRID = "Grid Search"
    RANDOM = "Random Search"
    BAYESIAN = "Bayesian Optimization"
    MULTI_FIDELITY = "Multi-fidelity (Hyperband/BOHB)"
    MANUAL = "Manual Tuning"


def recommend_hpo_method(
    n_hyperparameters: int,
    training_time_seconds: float,
    budget_hours: float,
    cv_folds: int = 5,
    requires_reproducibility: bool = False,
    has_hpo_expertise: bool = True,
) -> Dict:
    """
    Recommend HPO method based on problem characteristics.

    Parameters:
    -----------
    n_hyperparameters: Number of hyperparameters to tune
    training_time_seconds: Time to train one model
    budget_hours: Total compute budget in hours
    cv_folds: Cross-validation folds
    requires_reproducibility: Must be exactly reproducible
    has_hpo_expertise: Team has experience with advanced HPO
    """
    budget_seconds = budget_hours * 3600
    max_evaluations = budget_seconds / (training_time_seconds * cv_folds)

    # Calculate feasible grid resolution
    if n_hyperparameters > 0:
        grid_resolution = int(max_evaluations ** (1 / n_hyperparameters))
        grid_resolution = max(2, min(10, grid_resolution))
        grid_size = grid_resolution ** n_hyperparameters
    else:
        grid_resolution = 0
        grid_size = 0

    # Scoring for each method
    scores = {method: 0.0 for method in HPOMethod}
    reasoning = {method: [] for method in HPOMethod}

    # === Dimension-based scoring ===
    if n_hyperparameters <= 2:
        scores[HPOMethod.GRID] += 3
        reasoning[HPOMethod.GRID].append("Low dimensions - grid is optimal")
        scores[HPOMethod.RANDOM] -= 1
        reasoning[HPOMethod.RANDOM].append("Wasteful in low dimensions")
    elif n_hyperparameters <= 3:
        scores[HPOMethod.GRID] += 2
        scores[HPOMethod.RANDOM] += 1
        scores[HPOMethod.BAYESIAN] += 1
        reasoning[HPOMethod.GRID].append("3D - grid still effective")
    elif n_hyperparameters <= 5:
        scores[HPOMethod.GRID] += 0
        scores[HPOMethod.RANDOM] += 2
        scores[HPOMethod.BAYESIAN] += 2
        reasoning[HPOMethod.RANDOM].append("4-5D - random becomes competitive")
        reasoning[HPOMethod.BAYESIAN].append("4-5D - can model correlations")
    elif n_hyperparameters <= 10:
        scores[HPOMethod.GRID] -= 3
        scores[HPOMethod.RANDOM] += 1
        scores[HPOMethod.BAYESIAN] += 3
        scores[HPOMethod.MULTI_FIDELITY] += 2
        reasoning[HPOMethod.GRID].append("6-10D - grid is impractical")
        reasoning[HPOMethod.BAYESIAN].append("6-10D - Bayesian excels")
    else:
        scores[HPOMethod.GRID] -= 5
        scores[HPOMethod.RANDOM] += 1
        scores[HPOMethod.BAYESIAN] += 2
        scores[HPOMethod.MULTI_FIDELITY] += 3
        reasoning[HPOMethod.MULTI_FIDELITY].append(">10D - multi-fidelity essential")

    # === Budget-based scoring ===
    if grid_size <= max_evaluations:
        scores[HPOMethod.GRID] += 1
        reasoning[HPOMethod.GRID].append(f"Budget allows {grid_resolution}-resolution grid")
    else:
        scores[HPOMethod.GRID] -= 2
        reasoning[HPOMethod.GRID].append("Budget insufficient for meaningful grid")

    if training_time_seconds > 300:  # > 5 minutes per train
        scores[HPOMethod.MULTI_FIDELITY] += 2
        scores[HPOMethod.BAYESIAN] += 1
        reasoning[HPOMethod.MULTI_FIDELITY].append("Long training - multi-fidelity saves time")

    # === Requirements-based scoring ===
    if requires_reproducibility:
        scores[HPOMethod.GRID] += 3
        scores[HPOMethod.RANDOM] -= 1
        reasoning[HPOMethod.GRID].append("Reproducibility required - grid is deterministic")
        reasoning[HPOMethod.RANDOM].append("Randomness complicates reproducibility")

    if not has_hpo_expertise:
        scores[HPOMethod.GRID] += 2
        scores[HPOMethod.RANDOM] += 1
        scores[HPOMethod.BAYESIAN] -= 2
        scores[HPOMethod.MULTI_FIDELITY] -= 3
        reasoning[HPOMethod.GRID].append("No expertise - grid is simplest")
        reasoning[HPOMethod.BAYESIAN].append("Requires expertise to tune properly")

    # === Determine recommendation ===
    best_method = max(scores.keys(), key=lambda m: scores[m])

    # Build result
    result = {
        'recommended_method': best_method,
        'scores': scores,
        'reasoning': {m: r for m, r in reasoning.items() if r},
        'analysis': {
            'n_hyperparameters': n_hyperparameters,
            'max_evaluations': int(max_evaluations),
            'feasible_grid_resolution': grid_resolution,
            'feasible_grid_size': grid_size,
        }
    }
    return result


def print_recommendation(result: Dict):
    """Pretty-print the recommendation."""
    print("=" * 60)
    print("HPO METHOD RECOMMENDATION")
    print("=" * 60)
    print(f"\nRecommended Method: {result['recommended_method'].value}")

    print(f"\nAnalysis:")
    for key, value in result['analysis'].items():
        print(f"  {key}: {value}")

    print(f"\nMethod Scores:")
    for method, score in sorted(result['scores'].items(), key=lambda x: -x[1]):
        print(f"  {method.value}: {score:+.1f}")

    print(f"\nReasoning:")
    for method, reasons in result['reasoning'].items():
        if reasons:
            print(f"  {method.value}:")
            for reason in reasons:
                print(f"    - {reason}")


# Example scenarios
scenarios = [
    {
        'name': "Quick SVM tuning",
        'n_hyperparameters': 2,
        'training_time_seconds': 5,
        'budget_hours': 1,
        'requires_reproducibility': True,
        'has_hpo_expertise': False,
    },
    {
        'name': "XGBoost with many params",
        'n_hyperparameters': 7,
        'training_time_seconds': 30,
        'budget_hours': 8,
        'requires_reproducibility': False,
        'has_hpo_expertise': True,
    },
    {
        'name': "Deep learning NAS",
        'n_hyperparameters': 15,
        'training_time_seconds': 600,
        'budget_hours': 48,
        'requires_reproducibility': False,
        'has_hpo_expertise': True,
    },
]

for scenario in scenarios:
    print(f"\n{'='*60}")
    print(f"SCENARIO: {scenario['name']}")
    del scenario['name']
    result = recommend_hpo_method(**scenario)
    print_recommendation(result)
```

Meta-analyses and benchmarking studies provide empirical evidence for method selection. Key findings:
Bergstra & Bengio (2012) - Random vs Grid:
Across a range of neural network tuning problems, random search found models as good as or better than grid search using a fraction of the compute, because only a few hyperparameters mattered for each problem (low effective dimensionality).
Auto-WEKA/Auto-sklearn Experiments:
Across large suites of datasets, Bayesian optimization (SMAC-based) consistently outperformed grid and random search when the search space spanned many algorithms and hyperparameters.
Hyperband/BOHB Studies:
For expensive training runs that can be cheaply approximated (fewer epochs, smaller data subsets), multi-fidelity methods reported order-of-magnitude speedups over standard Bayesian optimization, with BOHB combining the strengths of both.
| Scenario | Grid | Random | Bayesian | Hyperband |
|---|---|---|---|---|
| 2D, 100 eval budget | Best | Worse | Overkill | N/A |
| 5D, 100 eval budget | OK | Good | Best | Good |
| 10D, 100 eval budget | Poor | OK | Best | Better |
| 10D, 30 eval budget | Poor | Best | OK (insufficient data) | Good |
| Deep learning, long training | Impractical | OK | Good | Best |
| Many discrete params | Best | OK | OK | OK |
Grid search wins in low dimensions with discrete hyperparameters and when reproducibility matters. Random search wins for quick exploration in medium dimensions. Bayesian optimization wins with sufficient budget in higher dimensions. Multi-fidelity wins when training is expensive and can be cheaply approximated.
Even when grid search isn't the primary method, it often plays a valuable role in hybrid optimization strategies.
Pattern 1: Grid for Primary, Random for Secondary
Use grid search on the 2-3 most important hyperparameters (identified through sensitivity analysis), then random search on the rest. This ensures thorough coverage where it matters while efficiently exploring less critical dimensions.
Pattern 2: Grid Search as Initialization
Run a coarse grid search first to identify promising regions, then use Bayesian optimization to refine. The grid provides a diverse initial sample for the surrogate model, avoiding early convergence to local optima.
Pattern 3: Grid Search for Final Refinement
After Bayesian optimization identifies a promising region, use a fine local grid to exhaustively search that region. This catches small improvements that sequential methods might miss.
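Pattern 3 is not covered in the implementation sketch further below, so here is a minimal version, assuming a generic `objective` to minimize and a `best_config` returned by an earlier search (both names are illustrative): it builds a small grid around the incumbent and evaluates it exhaustively.

```python
import numpy as np
from itertools import product

def local_grid_refinement(objective, best_config, radius=0.2, resolution=5):
    """Exhaustively search a small grid centred on `best_config`.

    `radius` is the relative half-width of the local grid (20% of each
    value by default); `resolution` is the number of points per axis.
    """
    names = list(best_config)
    axes = [
        np.linspace(v * (1 - radius), v * (1 + radius), resolution)
        for v in best_config.values()
    ]
    best, best_val = dict(best_config), objective(best_config)
    for combo in product(*axes):                  # resolution**d local candidates
        candidate = dict(zip(names, combo))
        val = objective(candidate)
        if val < best_val:
            best, best_val = candidate, val
    return best, best_val

# Hypothetical usage: refine around the incumbent from a Bayesian search
toy = lambda c: (c["lr"] - 0.031) ** 2 + (c["reg"] - 0.9) ** 2
print(local_grid_refinement(toy, {"lr": 0.03, "reg": 1.0}))
```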
Pattern 4: Grid Search for Discrete, Other for Continuous
For mixed hyperparameter spaces, grid search enumerates discrete options while other methods optimize continuous parameters conditional on each discrete setting.
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.gaussian_process import GaussianProcessRegressor
from scipy.stats import uniform, randint
from typing import Dict, List, Any, Callable
from itertools import product


def grid_for_primary_random_for_secondary(
    estimator, X, y,
    primary_grid: Dict[str, List],
    secondary_distributions: Dict[str, Any],
    n_random_iter: int = 50,
    cv: int = 5,
):
    """
    Pattern 1: Grid search on primary hyperparameters,
    random search on secondary hyperparameters.

    Combines thorough coverage of important dimensions with
    efficient exploration of less critical ones.
    """
    print("=== Hybrid: Grid (Primary) + Random (Secondary) ===")

    # Phase 1: Grid on primary
    print(f"\nPhase 1: Grid Search on {list(primary_grid.keys())}")
    primary_size = np.prod([len(v) for v in primary_grid.values()])
    print(f"  Grid size: {primary_size}")

    grid_search = GridSearchCV(estimator, primary_grid, cv=cv, n_jobs=-1)
    grid_search.fit(X, y)
    best_primary = grid_search.best_params_
    print(f"  Best: {best_primary}, score: {grid_search.best_score_:.4f}")

    # Phase 2: Random on secondary with primary fixed
    print(f"\nPhase 2: Random Search on {list(secondary_distributions.keys())}")
    full_distributions = {
        **{k: [v] for k, v in best_primary.items()},
        **secondary_distributions,
    }

    random_search = RandomizedSearchCV(
        estimator, full_distributions, n_iter=n_random_iter,
        cv=cv, n_jobs=-1, random_state=42
    )
    random_search.fit(X, y)
    print(f"  Best: {random_search.best_params_}, score: {random_search.best_score_:.4f}")

    print(f"\nTotal evaluations: {primary_size + n_random_iter}")
    return random_search.best_params_, random_search.best_score_


def grid_as_bayesian_initialization(
    objective_fn: Callable,
    param_bounds: Dict[str, tuple],
    init_grid_resolution: int = 3,
    bo_iterations: int = 50,
):
    """
    Pattern 2: Grid search provides initial samples for Bayesian optimization.

    The grid ensures diverse initial samples, preventing the surrogate
    model from focusing too early on a single region.
    """
    from scipy.optimize import minimize

    print("=== Hybrid: Grid Initialization + Bayesian Optimization ===")

    param_names = list(param_bounds.keys())
    n_dims = len(param_names)

    # Phase 1: Coarse grid for initialization
    print(f"\nPhase 1: Grid Initialization ({init_grid_resolution}^{n_dims} points)")
    grid_points = []
    grid_results = []

    for combo in product(*[np.linspace(lo, hi, init_grid_resolution)
                           for lo, hi in param_bounds.values()]):
        config = dict(zip(param_names, combo))
        score = objective_fn(config)
        grid_points.append(list(combo))
        grid_results.append(score)

    best_grid_idx = np.argmin(grid_results)
    print(f"  Evaluated {len(grid_results)} configurations")
    print(f"  Best grid score: {grid_results[best_grid_idx]:.4f}")

    # Phase 2: Bayesian optimization starting from grid samples
    print(f"\nPhase 2: Bayesian Optimization ({bo_iterations} iterations)")

    X_init = np.array(grid_points)
    y_init = np.array(grid_results)

    # Fit initial GP surrogate
    gp = GaussianProcessRegressor(normalize_y=True, random_state=42)
    gp.fit(X_init, y_init)

    # Simple expected improvement acquisition
    def expected_improvement(x, gp, y_best, xi=0.01):
        from scipy.stats import norm
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        sigma = sigma + 1e-9
        z = (y_best - mu - xi) / sigma
        ei = sigma * (z * norm.cdf(z) + norm.pdf(z))
        return -ei.item()  # Minimize negative EI

    X_all = list(X_init)
    y_all = list(y_init)

    for i in range(bo_iterations):
        y_best = min(y_all)

        # Optimize acquisition function
        bounds = list(param_bounds.values())
        best_ei = float('inf')
        best_x = None

        # Multi-start optimization
        for _ in range(20):
            x0 = [np.random.uniform(lo, hi) for lo, hi in bounds]
            res = minimize(
                lambda x: expected_improvement(x, gp, y_best),
                x0, bounds=bounds, method='L-BFGS-B'
            )
            if res.fun < best_ei:
                best_ei = res.fun
                best_x = res.x

        # Evaluate at best point
        config = dict(zip(param_names, best_x))
        score = objective_fn(config)
        X_all.append(best_x)
        y_all.append(score)

        # Update GP
        gp.fit(np.array(X_all), np.array(y_all))

    best_idx = np.argmin(y_all)
    best_config = dict(zip(param_names, X_all[best_idx]))
    print(f"  Best BO score: {y_all[best_idx]:.4f}")
    print(f"  Best config: {best_config}")
    print(f"\nTotal evaluations: {len(grid_results) + bo_iterations}")
    return best_config, y_all[best_idx]


def grid_for_discrete_continuous_hybrid(
    estimator_class,
    discrete_grid: Dict[str, List],
    continuous_distributions: Dict[str, Any],
    X, y,
    n_random_per_discrete: int = 20,
    cv: int = 5,
):
    """
    Pattern 4: Grid search for discrete hyperparameters,
    random search for continuous hyperparameters.

    For each discrete combination, run random search on continuous.
    This ensures all discrete options are explored while efficiently
    searching continuous spaces.
    """
    print("=== Hybrid: Grid (Discrete) + Random (Continuous) ===")

    param_names = list(discrete_grid.keys())
    all_results = []

    # Enumerate all discrete combinations
    discrete_combos = list(product(*discrete_grid.values()))
    print(f"\nDiscrete combinations: {len(discrete_combos)}")
    print(f"Random samples per discrete: {n_random_per_discrete}")
    print(f"Total evaluations: {len(discrete_combos) * n_random_per_discrete}")

    for combo in discrete_combos:
        discrete_config = dict(zip(param_names, combo))
        print(f"\n  Discrete: {discrete_config}")

        # Create estimator with fixed discrete params
        class FixedDiscreteEstimator:
            def __init__(self, **kwargs):
                self.model = estimator_class(**discrete_config, **kwargs)

            def fit(self, X, y):
                self.model.fit(X, y)
                return self

            def predict(self, X):
                return self.model.predict(X)

            def score(self, X, y):
                return self.model.score(X, y)

            def get_params(self, deep=True):
                return self.model.get_params(deep)

            def set_params(self, **params):
                self.model.set_params(**params)
                return self

        # Random search on continuous for this discrete combo
        random_search = RandomizedSearchCV(
            FixedDiscreteEstimator(), continuous_distributions,
            n_iter=n_random_per_discrete, cv=cv, n_jobs=-1, random_state=42
        )
        random_search.fit(X, y)

        result = {
            'discrete': discrete_config,
            'continuous': random_search.best_params_,
            'score': random_search.best_score_,
        }
        all_results.append(result)
        print(f"    Best continuous: {random_search.best_params_}")
        print(f"    Score: {random_search.best_score_:.4f}")

    # Find overall best
    best = max(all_results, key=lambda x: x['score'])
    print(f"\n=== Best Overall ===")
    print(f"Discrete: {best['discrete']}")
    print(f"Continuous: {best['continuous']}")
    print(f"Score: {best['score']:.4f}")
    return best
```

Synthesizing all considerations, here is a practical decision tree for hyperparameter optimization method selection:
Step 1: Count Hyperparameters
The number of hyperparameters you will actually tune sets the regime: 1-3 favors grid search, 4-5 is a transition zone where random or Bayesian search becomes competitive, and 6 or more effectively rules grid search out.
Step 2: Assess Budget Relative to Grid Size
If grid search is under consideration, estimate how many evaluations your budget allows (total budget divided by training time times CV folds) and compare that with the grid size at a useful resolution of roughly 5-10 values per dimension. If the budget covers the grid, grid search stays on the table; if not, a sparse grid is rarely worth running, and random or Bayesian search is the better use of the budget. (A short worked example follows these steps.)
Step 3: Apply Modifiers
Adjust for practical constraints: strict reproducibility favors grid search, long training times favor multi-fidelity methods, and limited HPO expertise favors the simpler methods (grid or random) over Bayesian optimization.
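A quick back-of-the-envelope version of Step 2; the budget, training time, and dimensionality below are illustrative, not taken from the text.

```python
# Step 2 as arithmetic: is a 5-values-per-dimension grid affordable?
budget_hours, train_seconds, cv_folds, n_dims = 4, 20, 5, 3

max_evals = budget_hours * 3600 / (train_seconds * cv_folds)   # 144 evaluations
grid_size = 5 ** n_dims                                        # 125 configurations

print(f"budget allows {max_evals:.0f} evaluations, grid needs {grid_size}")
print("grid feasible" if grid_size <= max_evals else "use random/Bayesian instead")
```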
```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional


class HPOMethod(Enum):
    GRID = "Grid Search"
    RANDOM = "Random Search"
    BAYESIAN = "Bayesian Optimization"
    HYPERBAND = "Hyperband / Multi-fidelity"
    BOHB = "BOHB (Bayesian + Hyperband)"
    MANUAL = "Manual / Default"


@dataclass
class HPORecommendation:
    primary: HPOMethod
    alternative: Optional[HPOMethod]
    confidence: str  # "high", "medium", "low"
    reasoning: str


def hpo_decision_tree(
    n_hyperparameters: int,
    n_discrete: int = 0,
    training_time_minutes: float = 1.0,
    budget_evaluations: int = 100,
    requires_reproducibility: bool = False,
    has_hpo_expertise: bool = True,
    expected_low_effective_dim: bool = False,
) -> HPORecommendation:
    """
    Decision tree for HPO method selection.

    Parameters:
    -----------
    n_hyperparameters: Total hyperparameters to tune
    n_discrete: Number that are discrete/categorical
    training_time_minutes: Time to train one model
    budget_evaluations: Maximum model evaluations available
    requires_reproducibility: Must be exactly reproducible
    has_hpo_expertise: Team comfortable with advanced methods
    expected_low_effective_dim: Expect few hyperparameters to matter
    """
    n_continuous = n_hyperparameters - n_discrete

    # Step 1: Dimensionality check
    if n_hyperparameters <= 2:
        # Low dimensions: Grid is almost always best
        if requires_reproducibility or not has_hpo_expertise:
            return HPORecommendation(
                primary=HPOMethod.GRID,
                alternative=None,
                confidence="high",
                reasoning="1-2 dimensions: Grid search is optimal, "
                          "providing complete coverage with tractable cost."
            )
        else:
            return HPORecommendation(
                primary=HPOMethod.GRID,
                alternative=HPOMethod.RANDOM,
                confidence="high",
                reasoning="1-2 dimensions: Grid search recommended. "
                          "Random search is acceptable but offers no benefit."
            )

    elif n_hyperparameters == 3:
        # Transition zone
        grid_size = 5 ** 3  # 125
        if budget_evaluations >= grid_size:
            return HPORecommendation(
                primary=HPOMethod.GRID,
                alternative=HPOMethod.RANDOM,
                confidence="medium",
                reasoning=f"3 dimensions with sufficient budget ({budget_evaluations} >= {grid_size}): "
                          "Grid search provides complete coverage."
            )
        else:
            return HPORecommendation(
                primary=HPOMethod.RANDOM,
                alternative=HPOMethod.BAYESIAN,
                confidence="medium",
                reasoning="3 dimensions with limited budget: "
                          "Random search is more efficient than sparse grid."
            )

    elif n_hyperparameters <= 5:
        # Medium dimensions
        if expected_low_effective_dim:
            return HPORecommendation(
                primary=HPOMethod.RANDOM,
                alternative=HPOMethod.BAYESIAN,
                confidence="high",
                reasoning="4-5 dimensions with low effective dimensionality: "
                          "Random search efficiently explores important dimensions."
            )
        elif has_hpo_expertise and budget_evaluations >= 50:
            return HPORecommendation(
                primary=HPOMethod.BAYESIAN,
                alternative=HPOMethod.RANDOM,
                confidence="medium",
                reasoning="4-5 dimensions with sufficient budget and expertise: "
                          "Bayesian optimization can model response surface."
            )
        else:
            return HPORecommendation(
                primary=HPOMethod.RANDOM,
                alternative=HPOMethod.GRID,
                confidence="medium",
                reasoning="4-5 dimensions: Random search is reliable and simple. "
                          "Consider staged grid search as alternative."
            )

    elif n_hyperparameters <= 10:
        # High dimensions
        if training_time_minutes > 10:
            return HPORecommendation(
                primary=HPOMethod.HYPERBAND if not has_hpo_expertise else HPOMethod.BOHB,
                alternative=HPOMethod.BAYESIAN,
                confidence="high",
                reasoning="6-10 dimensions with long training: "
                          "Multi-fidelity methods essential for efficiency."
            )
        elif has_hpo_expertise and budget_evaluations >= 100:
            return HPORecommendation(
                primary=HPOMethod.BAYESIAN,
                alternative=HPOMethod.RANDOM,
                confidence="medium",
                reasoning="6-10 dimensions: Bayesian optimization with "
                          "sufficient budget outperforms random search."
            )
        else:
            return HPORecommendation(
                primary=HPOMethod.RANDOM,
                alternative=HPOMethod.BAYESIAN,
                confidence="medium",
                reasoning="6-10 dimensions with constraints: "
                          "Random search is robust and simple."
            )

    else:
        # Very high dimensions
        if training_time_minutes > 5:
            return HPORecommendation(
                primary=HPOMethod.BOHB,
                alternative=HPOMethod.HYPERBAND,
                confidence="high",
                reasoning=">10 dimensions with expensive training: "
                          "BOHB combines Bayesian optimization with multi-fidelity."
            )
        else:
            return HPORecommendation(
                primary=HPOMethod.BAYESIAN,
                alternative=HPOMethod.RANDOM,
                confidence="medium",
                reasoning=">10 dimensions: Bayesian optimization recommended, "
                          "but consider dimensionality reduction."
            )


# Example usage
print("HPO METHOD DECISION TREE")
print("=" * 60)

scenarios = [
    {"n_hyperparameters": 2, "budget_evaluations": 50, "training_time_minutes": 0.5},
    {"n_hyperparameters": 3, "budget_evaluations": 200, "training_time_minutes": 1},
    {"n_hyperparameters": 5, "budget_evaluations": 100, "expected_low_effective_dim": True},
    {"n_hyperparameters": 8, "training_time_minutes": 15, "has_hpo_expertise": True},
    {"n_hyperparameters": 12, "training_time_minutes": 30, "budget_evaluations": 200},
]

for scenario in scenarios:
    print(f"\nScenario: {scenario}")
    rec = hpo_decision_tree(**scenario)
    print(f"  Primary: {rec.primary.value}")
    if rec.alternative:
        print(f"  Alternative: {rec.alternative.value}")
    print(f"  Confidence: {rec.confidence}")
    print(f"  Reasoning: {rec.reasoning}")
```

In practice, grid search for 2-3 critical hyperparameters combined with defaults for the rest often achieves 80% of the gain from extensive HPO. Sophisticated methods are not always worth the added complexity and potential failure modes.
Let's examine concrete scenarios where grid search is and isn't appropriate.
Case Study 1: Regularization Tuning for Logistic Regression
Verdict: Grid Search ✓
This is grid search's ideal scenario. A grid of C ∈ [0.001, 0.01, 0.1, 1, 10, 100, 1000] × penalty ∈ ['l1', 'l2'] yields 14 configurations—complete coverage in seconds.
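A minimal version of this case study, assuming a synthetic dataset in place of the unspecified real one: 7 values of C times 2 penalties gives exactly the 14 configurations described above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 7 values of C x 2 penalties = 14 configurations, all evaluated
param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "penalty": ["l1", "l2"],
}
search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=5000),  # liblinear supports l1 and l2
    param_grid, cv=5, n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```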
Case Study 2: XGBoost Hyperparameter Tuning
Verdict: Grid Search ✗
Even with 3 values per hyperparameter, $3^8 = 6{,}561$ configurations × 5-fold CV × 30 seconds ≈ 11 days of compute. Use staged grid search on important dimensions, or Bayesian optimization.
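The cost estimate above as explicit arithmetic:

```python
# Full-grid cost for 8 hyperparameters at 3 values each
configs = 3 ** 8                 # 6,561 configurations
fits = configs * 5               # x 5-fold CV = 32,805 model fits
seconds = fits * 30              # x 30 s per fit
print(f"{configs} configs, {fits} fits, {seconds / 86400:.1f} days of compute")
# -> 6561 configs, 32805 fits, 11.4 days of compute
```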
| Scenario | Dimensions | Training Time | Grid Search? | Reasoning |
|---|---|---|---|---|
| Logistic Regression regularization | 2 | < 1s | ✓ Yes | Low-d, fast training, full coverage easy |
| SVM kernel and C/gamma | 3 (kernel, C, gamma) | 5-30s | ✓ Yes | Low-d, standard ranges work well |
| Random Forest tuning | 4-5 | 5s | ⚠ Maybe | Borderline; coarse grid or staged approach |
| Full XGBoost tuning | 6-8 | 30s | ✗ No | Too many dimensions; use Bayesian/random |
| Neural network architecture | 10+ | minutes | ✗ No | High-d, expensive; use multi-fidelity |
| Deep learning NAS | 20+ | hours | ✗ No | Completely impractical; specialized methods |
Most production ML involves 3-6 hyperparameters that matter. For this regime, grid search on the 2-3 most important, combined with defaults or quick random search on the rest, is a pragmatic approach that balances quality with simplicity.
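As a sketch of that pragmatic middle ground (the model and the two chosen hyperparameters are illustrative): grid-search the parameters that usually matter most for gradient boosting and leave everything else at library defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Tune only the two most influential parameters; defaults for the rest
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 3, 4, 5],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```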
We have established clear criteria for when grid search is the right tool and when alternatives are better. Consolidating the decision framework:
- Use grid search for 1-3 hyperparameters, fast training, discrete options, or when exact reproducibility and complete coverage are required.
- Use random search for 4-5+ hyperparameters, quick exploration, or when only a few hyperparameters are expected to matter.
- Use Bayesian optimization for roughly 5-10+ hyperparameters with a sufficient evaluation budget and the expertise to configure it.
- Use multi-fidelity methods (Hyperband, BOHB) when training is expensive but can be cheaply approximated.
- Combine methods in hybrid patterns: grid for the important or discrete hyperparameters, random or Bayesian search for the rest.
You have mastered grid search for hyperparameter optimization. You understand its theoretical foundations, computational costs, dimensional limitations, practical applications, and proper place in the broader HPO toolkit. Apply these principles to make informed method selection decisions in your machine learning projects.
What's Next:
With grid search mastered, the next module explores Random Search, the surprisingly effective alternative that the Bergstra-Bengio paper demonstrated often outperforms grid search. Understanding random search deepens appreciation for why grid search works when it does—and why it fails when it doesn't.