Imagine two runners in a race. One starts at the starting line. The other starts 80% of the way to the finish. Who wins?
This analogy captures the essence of warm starting in hyperparameter optimization. Without warm starting, optimization algorithms begin from random configurations—the starting line. With warm starting, they begin from configurations that meta-learning suggests will perform well—often already close to optimal.
The impact is dramatic: as the table below shows, meta-learned starting points routinely cut the number of evaluations needed to reach strong performance by a factor of 5–10x.
Warm starting is the practical bridge between meta-learning and production AutoML. It's how systems like Auto-sklearn achieve their remarkable efficiency—by never truly starting from scratch when solving a new problem.
This page explores warm starting techniques: from simple configuration transfer to sophisticated transfer acquisition functions, from single-fidelity to multi-fidelity approaches.
By the end of this page, you will understand warm-starting strategies for Bayesian optimization, how to transfer configurations from similar tasks, multi-fidelity warm starting with Hyperband, transfer acquisition functions, and practical implementation in Auto-sklearn and other systems.
Before diving into techniques, let's establish why warm starting matters and quantify its benefits.
The Cold-Start Problem in Optimization:
Standard hyperparameter optimization starts with no knowledge: the first evaluations are spent on random initialization and broad exploration before the surrogate model becomes useful.
The initial and exploration phases are wasteful if we have prior knowledge about what configurations work. Warm starting eliminates this waste.
Empirical Evidence:
Studies show that warm-started Bayesian optimization consistently reaches strong configurations in far fewer evaluations than cold-started search, with the largest relative gains early in the run.
When Warm Starting Helps Most:
| Scenario | Random Init (evals to 90% best) | Warm Start (evals to 90% best) | Speedup |
|---|---|---|---|
| Small dataset (< 1K samples) | ~30 | ~3 | 10x |
| Medium dataset (10K samples) | ~50 | ~8 | 6x |
| Large dataset (100K samples) | ~80 | ~15 | 5x |
| Novel domain | ~100 | ~40 | 2.5x |
Warm starting provides the largest relative gains early in optimization. With unlimited budget, cold-start eventually catches up. The value of warm starting is in reaching good performance faster, not necessarily in reaching better performance.
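The "evals to 90% best" metric in the table can be made concrete in a few lines. This is a minimal sketch (the `evals_to_fraction` helper and the accuracy traces are illustrative): it tracks the incumbent, i.e. best-so-far, score and reports how many evaluations it takes to reach 90% of the final best.

```python
import numpy as np

def evals_to_fraction(scores, fraction=0.9):
    """Number of evaluations needed for the incumbent (best-so-far)
    score to reach `fraction` of the final best score."""
    incumbent = np.maximum.accumulate(np.asarray(scores, dtype=float))
    target = fraction * incumbent[-1]
    # +1 converts a 0-based index into an evaluation count
    return int(np.argmax(incumbent >= target)) + 1

# Hypothetical accuracy traces over the same 10-evaluation budget
cold = [0.55, 0.60, 0.58, 0.70, 0.72, 0.71, 0.80, 0.82, 0.85, 0.86]
warm = [0.82, 0.84, 0.83, 0.85, 0.86, 0.85, 0.86, 0.86, 0.86, 0.86]

print(evals_to_fraction(cold))  # → 7: cold start needs most of the budget
print(evals_to_fraction(warm))  # → 1: warm start is near-best immediately
```

Both runs end at the same final score here; the difference warm starting buys is entirely in how quickly the incumbent gets there.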
The simplest warm-starting approach: find similar tasks from the meta-database and use their best configurations as starting points.
The Algorithm: compute meta-features for the new dataset, find its k nearest neighbors in meta-feature space, and collect each neighbor's best-known configuration as the initial design for the optimizer.
This is exactly what Auto-sklearn does with k=25 similar datasets.
Why It Works:
If similar datasets benefit from similar configurations, transferred configurations should perform well immediately and give the optimizer informative early observations.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from typing import List, Dict, Tuple, Any
from dataclasses import dataclass

@dataclass
class Configuration:
    """A hyperparameter configuration with metadata."""
    algorithm: str
    hyperparameters: Dict[str, Any]
    performance: float
    source_dataset: str

class ConfigurationTransfer:
    """
    Transfer best configurations from similar past tasks.

    This is the core warm-starting mechanism in Auto-sklearn.
    """

    def __init__(self, k_neighbors: int = 25, metric: str = 'manhattan'):
        """
        Parameters:
            k_neighbors: Number of similar datasets to retrieve from
            metric: Distance metric for similarity
        """
        self.k = k_neighbors
        self.metric = metric
        self.scaler = StandardScaler()
        self.knn = NearestNeighbors(n_neighbors=k_neighbors, metric=metric)

    def fit(self, meta_features: np.ndarray, dataset_ids: List[str],
            configurations: Dict[str, List[Configuration]]):
        """
        Build the transfer model from historical data.

        Parameters:
            meta_features: (n_datasets, n_meta_features) array
            dataset_ids: List of dataset identifiers
            configurations: Dict mapping dataset_id to list of tried configurations
        """
        # Store and normalize meta-features
        self.meta_features = self.scaler.fit_transform(meta_features)
        self.dataset_ids = dataset_ids
        self.configurations = configurations

        # Fit k-NN model
        self.knn.fit(self.meta_features)

        # Precompute best configuration per dataset
        self.best_configs = {}
        for did in dataset_ids:
            if did in configurations and configurations[did]:
                best = max(configurations[did], key=lambda c: c.performance)
                self.best_configs[did] = best

    def get_warm_start_configs(self, new_meta_features: np.ndarray,
                               n_configs: int = 25,
                               diversity_aware: bool = True) -> List[Configuration]:
        """
        Get warm-start configurations for a new dataset.

        Parameters:
            new_meta_features: Meta-features of the new dataset
            n_configs: Maximum number of configurations to return
            diversity_aware: If True, ensure diverse algorithm coverage

        Returns:
            List of configurations to evaluate first
        """
        # Find similar datasets
        mf_scaled = self.scaler.transform(new_meta_features.reshape(1, -1))
        distances, indices = self.knn.kneighbors(mf_scaled)

        # Collect configurations from similar datasets
        candidates = []
        for idx, dist in zip(indices[0], distances[0]):
            dataset_id = self.dataset_ids[idx]
            if dataset_id in self.best_configs:
                config = self.best_configs[dataset_id]
                candidates.append({
                    'config': config,
                    'distance': dist,
                    'source': dataset_id
                })

        if diversity_aware:
            # Ensure we have diverse algorithms, not just 25 random forests
            selected = self._select_diverse(candidates, n_configs)
        else:
            # Simple: take top-n by distance
            candidates.sort(key=lambda x: x['distance'])
            selected = [c['config'] for c in candidates[:n_configs]]

        return selected

    def _select_diverse(self, candidates: List[Dict], n: int) -> List[Configuration]:
        """
        Select diverse configurations covering different algorithms.

        Greedy selection: pick best for each algorithm, then fill
        remaining slots with globally best.
        """
        # Group by algorithm
        by_algorithm = {}
        for c in candidates:
            algo = c['config'].algorithm
            if algo not in by_algorithm:
                by_algorithm[algo] = []
            by_algorithm[algo].append(c)

        # Sort within each group
        for algo in by_algorithm:
            by_algorithm[algo].sort(key=lambda x: x['distance'])

        selected = []
        selected_set = set()

        # First pass: one per algorithm
        for algo, algo_candidates in by_algorithm.items():
            if algo_candidates and len(selected) < n:
                config = algo_candidates[0]['config']
                config_key = (config.algorithm,
                              tuple(sorted(config.hyperparameters.items())))
                if config_key not in selected_set:
                    selected.append(config)
                    selected_set.add(config_key)

        # Second pass: fill remaining slots with best overall
        all_sorted = sorted(candidates, key=lambda x: x['distance'])
        for c in all_sorted:
            if len(selected) >= n:
                break
            config = c['config']
            config_key = (config.algorithm,
                          tuple(sorted(config.hyperparameters.items())))
            if config_key not in selected_set:
                selected.append(config)
                selected_set.add(config_key)

        return selected

    def estimate_transfer_difficulty(self, new_meta_features: np.ndarray) -> float:
        """
        Estimate how difficult transfer will be (how different is this dataset?).

        High distance to neighbors suggests warm starting may be less effective.
        """
        mf_scaled = self.scaler.transform(new_meta_features.reshape(1, -1))
        distances, _ = self.knn.kneighbors(mf_scaled)
        # Use median distance to k neighbors as difficulty estimate
        return np.median(distances[0])
```

Practical Considerations:
1. How many configurations to transfer?
Auto-sklearn uses k=25 as a balanced choice: enough configurations to cover several algorithm families, but few enough to leave most of the budget for model-guided search.
2. Configuration validity:
Transferred configurations must be valid in the current search space. If the new problem has a different search space (algorithms removed, hyperparameter ranges changed, conditional dependencies altered), invalid configurations must be filtered out or projected back into the space.
3. Staleness:
Older configurations may use outdated algorithm versions. Include recency in selection criteria.
If k similar datasets all favor Random Forest, you'll get k Random Forest configurations. Ensure diversity by selecting at least one configuration per algorithm type before filling remaining slots. This prevents missing algorithms that might excel on the new dataset despite not being best on similar ones.
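The validity check from consideration 2 is straightforward to enforce with a filtering pass before evaluation. A minimal sketch, assuming a simple search-space description of numeric ranges and categorical sets (the `filter_valid_configs` helper and the space structure are illustrative, not from a specific library):

```python
def filter_valid_configs(configs, space):
    """Keep only transferred configurations that are valid in the
    current search space. `space` maps hyperparameter name to either
    a (low, high) numeric range or a set of allowed categorical values."""
    valid = []
    for cfg in configs:
        ok = True
        for name, value in cfg.items():
            if name not in space:
                ok = False  # Hyperparameter no longer exists
                break
            spec = space[name]
            if isinstance(spec, tuple):
                low, high = spec
                ok = low <= value <= high  # Numeric range check
            else:
                ok = value in spec  # Categorical membership check
            if not ok:
                break
        if ok:
            valid.append(cfg)
    return valid

space = {'max_depth': (1, 10), 'criterion': {'gini', 'entropy'}}
configs = [
    {'max_depth': 6, 'criterion': 'gini'},      # valid
    {'max_depth': 25, 'criterion': 'gini'},     # out of range -> dropped
    {'max_depth': 4, 'criterion': 'log_loss'},  # unknown category -> dropped
]
print(len(filter_valid_configs(configs, space)))  # → 1
```

An alternative to dropping invalid configurations is clipping numeric values into range, which preserves more of the transferred signal at the cost of evaluating a configuration no source task actually ran.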
Bayesian optimization builds a surrogate model of the objective function and uses an acquisition function to select promising configurations. Warm starting can accelerate both components.
Approach 1: Warm-Starting the Surrogate Model
The surrogate model (Gaussian Process or Random Forest) maps configurations to expected performance. We can initialize it with observations from similar past tasks, treated as prior (pseudo-)observations.
Approach 2: Warm-Starting the Search
Keep the surrogate cold-started (no prior bias) but evaluate the transferred configurations first, so the surrogate's earliest real observations come from promising regions of the space.
Why Approach 2 Often Works Better: every observation the surrogate sees is ground truth on the target task, so a poorly matched source task cannot bias the model—at worst, a few early evaluations are wasted.
```python
import numpy as np
from typing import List, Dict, Callable, Tuple, Any
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from scipy.stats import norm

class WarmStartBayesianOptimization:
    """
    Bayesian optimization with warm-starting from transferred configurations.

    Combines configuration transfer with Gaussian Process-based BO.
    """

    def __init__(self, config_space,
                 warm_start_configs: List[Dict[str, Any]] = None,
                 n_random_init: int = 5):
        """
        Parameters:
            config_space: Configuration space (ConfigSpace object)
            warm_start_configs: Configurations to evaluate first
            n_random_init: Random configs to add if fewer warm-start configs
        """
        self.config_space = config_space
        self.warm_start_configs = warm_start_configs or []
        self.n_random_init = n_random_init

        # GP surrogate
        kernel = Matern(nu=2.5) + WhiteKernel(noise_level=0.1)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

        # Track observations
        self.X_observed = []
        self.y_observed = []

    def _config_to_vector(self, config: Dict) -> np.ndarray:
        """Convert configuration to numerical vector."""
        # Implementation depends on config_space
        # Simplified: assume all values are numeric
        return np.array(list(config.values()))

    def _expected_improvement(self, X: np.ndarray, y_best: float,
                              xi: float = 0.01) -> np.ndarray:
        """
        Expected Improvement acquisition function.

        Parameters:
            X: Candidate configurations (n_candidates, n_dims)
            y_best: Best observed value so far
            xi: Exploration-exploitation trade-off

        Returns:
            EI value for each candidate
        """
        mu, sigma = self.gp.predict(X, return_std=True)

        # Handle zero variance
        with np.errstate(divide='ignore', invalid='ignore'):
            z = (mu - y_best - xi) / sigma
            ei = (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
            ei[sigma < 1e-10] = 0.0

        return ei

    def _select_next_random(self) -> Dict:
        """Sample a random configuration."""
        return dict(self.config_space.sample_configuration())

    def _select_next_acquisition(self) -> Dict:
        """Select next configuration using acquisition function."""
        # Generate candidate pool
        n_candidates = 1000
        candidates = [self._config_to_vector(dict(self.config_space.sample_configuration()))
                      for _ in range(n_candidates)]
        X_candidates = np.array(candidates)

        # Compute acquisition values
        y_best = max(self.y_observed)  # Assuming maximization
        ei_values = self._expected_improvement(X_candidates, y_best)

        # Return best candidate
        best_idx = np.argmax(ei_values)
        # Convert back to config (simplified)
        # In practice, need reverse mapping from vector to config
        return dict(self.config_space.sample_configuration())

    def optimize(self, objective_func: Callable,
                 n_iterations: int = 50) -> Tuple[Dict, float]:
        """
        Run warm-started Bayesian optimization.

        Parameters:
            objective_func: Function mapping config to performance (maximize)
            n_iterations: Total number of configurations to evaluate

        Returns:
            (best_config, best_performance)
        """
        print(f"Starting warm BO with {len(self.warm_start_configs)} warm-start configs")

        iteration = 0

        # Phase 1: Evaluate warm-start configurations
        for config in self.warm_start_configs:
            if iteration >= n_iterations:
                break
            performance = objective_func(config)
            self.X_observed.append(self._config_to_vector(config))
            self.y_observed.append(performance)
            print(f"Warm-start {iteration + 1}: {performance:.4f}")
            iteration += 1

        # Phase 2: Random fill if needed (to have enough for GP)
        while len(self.X_observed) < max(self.n_random_init, 3):
            if iteration >= n_iterations:
                break
            config = self._select_next_random()
            performance = objective_func(config)
            self.X_observed.append(self._config_to_vector(config))
            self.y_observed.append(performance)
            print(f"Random init {iteration + 1}: {performance:.4f}")
            iteration += 1

        # Fit initial GP
        X = np.array(self.X_observed)
        y = np.array(self.y_observed)
        self.gp.fit(X, y)

        # Phase 3: Acquisition-guided search
        while iteration < n_iterations:
            config = self._select_next_acquisition()
            performance = objective_func(config)
            self.X_observed.append(self._config_to_vector(config))
            self.y_observed.append(performance)

            # Refit GP
            X = np.array(self.X_observed)
            y = np.array(self.y_observed)
            self.gp.fit(X, y)

            print(f"BO iteration {iteration + 1}: {performance:.4f}")
            iteration += 1

        # Return best
        best_idx = np.argmax(self.y_observed)
        # Would need to store configs, not just vectors
        return None, self.y_observed[best_idx]

def compare_warm_vs_cold(objective_func: Callable, config_space,
                         warm_configs: List[Dict],
                         n_iterations: int = 50,
                         n_repeats: int = 5):
    """
    Compare warm-started vs cold-started BO.

    Shows the benefit of warm starting empirically.
    """
    warm_curves = []
    cold_curves = []

    for repeat in range(n_repeats):
        print(f"=== Repeat {repeat + 1}/{n_repeats} ===")

        # Warm-started
        print("Warm-started:")
        warm_bo = WarmStartBayesianOptimization(config_space, warm_configs)
        _, warm_best = warm_bo.optimize(objective_func, n_iterations)
        warm_curves.append(np.maximum.accumulate(warm_bo.y_observed))

        # Cold-started
        print("Cold-started:")
        cold_bo = WarmStartBayesianOptimization(config_space, [])
        _, cold_best = cold_bo.optimize(objective_func, n_iterations)
        cold_curves.append(np.maximum.accumulate(cold_bo.y_observed))

    # Aggregate results
    warm_mean = np.mean(warm_curves, axis=0)
    cold_mean = np.mean(cold_curves, axis=0)

    print("=== Results ===")
    print(f"Warm-start reaches 90% of final in ~{np.argmax(warm_mean > 0.9 * warm_mean[-1])} evals")
    print(f"Cold-start reaches 90% of final in ~{np.argmax(cold_mean > 0.9 * cold_mean[-1])} evals")

    return warm_curves, cold_curves
```

SMAC (used by Auto-sklearn) implements warm-starting by treating transferred configurations as the initial design. After evaluating these, SMAC's random forest surrogate is fitted on all observations (including warm-start evaluations) and acquisition-guided search begins. The warm-start configurations influence the surrogate through their actual evaluated performance.
A more sophisticated approach: modify the acquisition function itself to incorporate transfer learning. Rather than just warm-starting with configurations, we transfer knowledge at the model level.
Transfer Acquisition Function (TAF):
The idea is to combine the acquisition signal from surrogate models fitted on source tasks with the acquisition from the target task's own surrogate.
Formulation:
α_transfer(x) = w · α_source(x) + (1-w) · α_target(x)
Where α_source(x) is the (aggregated) acquisition value under the source-task models, α_target(x) is the acquisition under the target model, and w ∈ [0, 1] controls how much the source tasks are trusted—typically decaying as target observations accumulate.
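The weighted combination above can be sketched in a few lines, with a weight that decays as target observations accumulate (the `transfer_acquisition` helper and its decay schedule are illustrative choices, not from a specific paper):

```python
import numpy as np

def transfer_acquisition(alpha_source, alpha_target, n_target_obs, decay=5.0):
    """Blend source and target acquisition values.

    w starts near 1 (trust the source tasks) and decays toward 0
    as target observations accumulate, shifting trust to the
    target model. The schedule w = decay / (decay + n) is one
    simple illustrative choice.
    """
    w = decay / (decay + n_target_obs)
    return w * np.asarray(alpha_source) + (1 - w) * np.asarray(alpha_target)

a_src = np.array([0.8, 0.1, 0.3])   # source tasks favor candidate 0
a_tgt = np.array([0.2, 0.6, 0.3])   # target evidence favors candidate 1

print(transfer_acquisition(a_src, a_tgt, n_target_obs=0))   # dominated by source
print(transfer_acquisition(a_src, a_tgt, n_target_obs=45))  # dominated by target
```

Early in the run the blend picks the candidate the source tasks like; once enough target data exists, the same formula follows the target model instead.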
The RGPE Approach (Ranking-Weighted Gaussian Process Ensemble):
A principled way to combine multiple source models: weight each model by how well it predicts the ranking of the target observations collected so far.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from scipy.stats import norm
from typing import List, Dict, Callable, Tuple

class RankingWeightedGPEnsemble:
    """
    RGPE: Ranking-Weighted Gaussian Process Ensemble.

    Combines surrogate models from source tasks with a target task model,
    weighting by how well source models predict target observations.

    Reference: Feurer et al., "Scalable Meta-Learning for Bayesian Optimization"
    """

    def __init__(self, source_models: List[GaussianProcessRegressor] = None):
        """
        Parameters:
            source_models: Pre-trained GP models from source tasks
        """
        self.source_models = source_models or []

        # Target model (trained as we collect data)
        kernel = Matern(nu=2.5)
        self.target_model = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

        # Ensemble weights (uniform until we have target data)
        self.weights = np.ones(len(self.source_models) + 1)
        self.weights /= self.weights.sum()

        self.X_target = []
        self.y_target = []

    def update(self, x: np.ndarray, y: float):
        """
        Add a new observation from the target task.

        Parameters:
            x: Configuration (as vector)
            y: Observed performance
        """
        self.X_target.append(x)
        self.y_target.append(y)

        # Refit target model if enough data
        if len(self.X_target) >= 3:
            X = np.array(self.X_target)
            y_arr = np.array(self.y_target)
            self.target_model.fit(X, y_arr)

            # Update weights based on ranking agreement
            self._update_weights()

    def _update_weights(self):
        """
        Update ensemble weights based on how well each model
        predicts the ranking of target observations.
        """
        if len(self.y_target) < 3:
            return

        X = np.array(self.X_target)
        y_true = np.array(self.y_target)
        true_ranking = np.argsort(np.argsort(-y_true))  # Higher is better

        weights = []

        # Score each source model
        for source_model in self.source_models:
            try:
                y_pred = source_model.predict(X)
                pred_ranking = np.argsort(np.argsort(-y_pred))
                # Spearman-like correlation
                rank_diff = true_ranking - pred_ranking
                score = 1.0 / (1.0 + np.mean(rank_diff ** 2) + 1e-10)
                weights.append(score)
            except Exception:
                weights.append(0.1)  # Low weight for failed predictions

        # Score target model
        if len(self.X_target) >= 5:
            y_pred = self.target_model.predict(X)
            pred_ranking = np.argsort(np.argsort(-y_pred))
            rank_diff = true_ranking - pred_ranking
            target_score = 1.0 / (1.0 + np.mean(rank_diff ** 2) + 1e-10)
        else:
            target_score = 0.5  # Moderate weight until we have enough data
        weights.append(target_score)

        # Normalize
        self.weights = np.array(weights)
        self.weights /= self.weights.sum()

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Ensemble prediction for configurations.

        Returns:
            (mean, std) predictions
        """
        if len(X.shape) == 1:
            X = X.reshape(1, -1)

        predictions = []
        variances = []

        # Source model predictions
        for i, source_model in enumerate(self.source_models):
            try:
                mu, std = source_model.predict(X, return_std=True)
                predictions.append(mu * self.weights[i])
                variances.append((std ** 2) * (self.weights[i] ** 2))
            except Exception:
                predictions.append(np.zeros(len(X)))
                variances.append(np.ones(len(X)) * 0.1)

        # Target model prediction
        if len(self.X_target) >= 3:
            mu, std = self.target_model.predict(X, return_std=True)
            predictions.append(mu * self.weights[-1])
            variances.append((std ** 2) * (self.weights[-1] ** 2))
        else:
            predictions.append(np.zeros(len(X)))
            variances.append(np.ones(len(X)) * 0.5)

        # Combine
        ensemble_mean = np.sum(predictions, axis=0)
        ensemble_var = np.sum(variances, axis=0)
        ensemble_std = np.sqrt(ensemble_var)

        return ensemble_mean, ensemble_std

    def acquisition(self, X: np.ndarray, xi: float = 0.01) -> np.ndarray:
        """
        Expected Improvement using ensemble predictions.
        """
        mu, sigma = self.predict(X)
        y_best = max(self.y_target) if self.y_target else 0

        with np.errstate(divide='ignore', invalid='ignore'):
            z = (mu - y_best - xi) / sigma
            ei = (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
            ei[sigma < 1e-10] = 0.0

        return ei

def build_source_models(meta_database, config_vectorizer) -> List[GaussianProcessRegressor]:
    """
    Build source GP models from meta-database.

    Each source model is trained on one past task's data.
    """
    source_models = []

    for dataset_id, experiments in meta_database.items():
        if len(experiments) >= 10:  # Need enough data
            X = np.array([config_vectorizer(exp['config']) for exp in experiments])
            y = np.array([exp['performance'] for exp in experiments])

            kernel = Matern(nu=2.5)
            gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
            try:
                gp.fit(X, y)
                source_models.append(gp)
            except Exception:
                continue

    return source_models
```

Advantages of Transfer Acquisition Functions: knowledge transfers at the model level, so source tasks inform the search everywhere in the space (not just at transferred configurations), and the ranking-based weights automatically down-weight source tasks that turn out to be dissimilar.
Challenges: maintaining and querying many source surrogates adds computational overhead, the ensemble adds implementation complexity, and badly matched source tasks can mislead the search until their weights decay.
When to Use: when the meta-database contains dense evaluation histories for many related tasks. With only a few sparsely evaluated source tasks, simple configuration transfer is usually as effective and far cheaper.
Multi-fidelity optimization evaluates configurations at multiple resource levels (epochs, data subset, etc.). Warm starting in this context means seeding the low-fidelity rungs with transferred configurations so that promising candidates are fast-tracked to higher budgets.
Hyperband with Warm Starting:
Hyperband successively halves configurations, giving more resources to promising ones. A warm-started variant seeds the first bracket with transferred configurations instead of purely random samples.
BOHB (Bayesian Optimization and Hyperband):
Combines Bayesian optimization with Hyperband. Warm-started BOHB additionally initializes its surrogate model with transferred configurations and their expected performances from meta-learning.
```python
import numpy as np
from typing import List, Dict, Callable, Tuple, Any

class WarmStartHyperband:
    """
    Hyperband with warm-starting from transferred configurations.

    Transferred configurations are given priority in the initial brackets,
    potentially fast-tracking them to full evaluation.
    """

    def __init__(self, config_space,
                 min_budget: float = 1.0,
                 max_budget: float = 81.0,
                 eta: int = 3,
                 warm_start_configs: List[Dict] = None):
        """
        Parameters:
            config_space: Configuration space
            min_budget: Minimum resource (e.g., epochs)
            max_budget: Maximum resource
            eta: Reduction factor between successive halving rounds
            warm_start_configs: Configurations to prioritize
        """
        self.config_space = config_space
        self.min_budget = min_budget
        self.max_budget = max_budget
        self.eta = eta
        self.warm_start_configs = warm_start_configs or []

        # Compute number of rungs
        self.s_max = int(np.log(max_budget / min_budget) / np.log(eta))

    def _get_hyperband_schedule(self) -> List[List[Tuple[int, float]]]:
        """
        Compute the Hyperband bracket schedule.

        Returns:
            List of brackets, each bracket is list of (n_configs, budget) tuples
        """
        brackets = []
        for s in range(self.s_max, -1, -1):
            n = int(np.ceil((self.s_max + 1) / (s + 1) * (self.eta ** s)))
            budget = self.max_budget / (self.eta ** s)

            bracket = []
            for i in range(s + 1):
                n_i = int(n / (self.eta ** i))
                budget_i = budget * (self.eta ** i)
                bracket.append((max(n_i, 1), min(budget_i, self.max_budget)))
            brackets.append(bracket)

        return brackets

    def _sample_configs(self, n: int, include_warm_start: bool = True) -> List[Dict]:
        """
        Sample n configurations, prioritizing warm-start configs.
        """
        configs = []

        # Include warm-start configs first
        if include_warm_start:
            for config in self.warm_start_configs[:n]:
                configs.append(config)

        # Fill remaining with random samples
        while len(configs) < n:
            configs.append(dict(self.config_space.sample_configuration()))

        return configs

    def run(self, evaluate_fn: Callable[[Dict, float], float]) -> Tuple[Dict, float]:
        """
        Run warm-started Hyperband.

        Parameters:
            evaluate_fn: Function that takes (config, budget) -> performance

        Returns:
            (best_config, best_performance)
        """
        brackets = self._get_hyperband_schedule()
        best_config = None
        best_perf = float('-inf')

        # Track which warm-start configs we've used
        warm_start_used = 0

        for bracket_idx, bracket in enumerate(brackets):
            print(f"=== Bracket {bracket_idx + 1}/{len(brackets)} ===")

            # Initial configs for this bracket
            n_configs_initial = bracket[0][0]

            # For first bracket, heavily favor warm-start configs
            if bracket_idx == 0:
                configs = self._sample_configs(n_configs_initial,
                                               include_warm_start=True)
                warm_start_used = min(len(self.warm_start_configs), n_configs_initial)
                print(f"Including {warm_start_used} warm-start configs")
            else:
                configs = self._sample_configs(n_configs_initial,
                                               include_warm_start=False)

            # Run successive halving rounds
            for rung_idx, (n_configs, budget) in enumerate(bracket):
                print(f"  Rung {rung_idx + 1}: {len(configs)} configs at budget {budget:.1f}")

                # Evaluate all configs at this budget
                results = []
                for config in configs:
                    perf = evaluate_fn(config, budget)
                    results.append((config, perf))
                    if perf > best_perf:
                        best_perf = perf
                        best_config = config

                # Select top 1/eta for next rung
                if rung_idx < len(bracket) - 1:
                    results.sort(key=lambda x: x[1], reverse=True)
                    n_keep = max(1, len(results) // self.eta)
                    configs = [r[0] for r in results[:n_keep]]

        return best_config, best_perf

class WarmStartBOHB:
    """
    BOHB (Bayesian Optimization and Hyperband) with warm starting.

    Combines transfer learning with multi-fidelity evaluation
    for efficient hyperparameter optimization.
    """

    def __init__(self, config_space,
                 min_budget: float = 1.0,
                 max_budget: float = 81.0,
                 warm_start_configs: List[Dict] = None,
                 warm_start_performances: List[float] = None):
        """
        Parameters:
            config_space: Configuration space
            min_budget: Minimum resource
            max_budget: Maximum resource
            warm_start_configs: Transferred configurations
            warm_start_performances: Expected performances (from meta-learning)
        """
        self.config_space = config_space
        self.min_budget = min_budget
        self.max_budget = max_budget
        self.warm_start_configs = warm_start_configs or []
        self.warm_start_performances = warm_start_performances or []

        # Build a simple model from warm-start data for acquisition
        self._initialize_model()

    def _initialize_model(self):
        """
        Initialize the surrogate model with warm-start data.
        """
        from sklearn.ensemble import RandomForestRegressor
        self.model = RandomForestRegressor(n_estimators=10, random_state=42)

        # If we have warm-start data, fit initial model
        if len(self.warm_start_configs) > 0 and len(self.warm_start_performances) > 0:
            X = np.array([self._config_to_vector(c) for c in self.warm_start_configs])
            y = np.array(self.warm_start_performances)
            self.model.fit(X, y)
            self.model_fitted = True
        else:
            self.model_fitted = False

    def _config_to_vector(self, config: Dict) -> np.ndarray:
        """Convert config dict to numerical vector."""
        return np.array(list(config.values()), dtype=float)

    def suggest_configuration(self) -> Dict:
        """
        Suggest next configuration to evaluate.

        Uses transfer-informed acquisition if model is fitted,
        otherwise samples randomly (with warm-start priority).
        """
        # First, exhaust warm-start configs
        if hasattr(self, '_warm_start_idx'):
            self._warm_start_idx += 1
        else:
            self._warm_start_idx = 0

        if self._warm_start_idx < len(self.warm_start_configs):
            return self.warm_start_configs[self._warm_start_idx]

        # Then use model-based suggestion if available
        if self.model_fitted:
            # Sample candidates and pick best by model
            candidates = [dict(self.config_space.sample_configuration())
                          for _ in range(100)]
            X_cand = np.array([self._config_to_vector(c) for c in candidates])
            predictions = self.model.predict(X_cand)
            best_idx = np.argmax(predictions)
            return candidates[best_idx]

        # Fallback to random
        return dict(self.config_space.sample_configuration())
```

Advanced multi-fidelity transfer: predict entire learning curves from partial training. If similar tasks show that LightGBM converges in 50 iterations but neural networks need 500, we can allocate resources accordingly. Learning curve databases enable this.
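The learning-curve idea in the note above can be sketched with a simple parametric fit: observe the first iterations of training, fit a saturating power law, and extrapolate to a larger budget. The `fit_power_curve` helper and its grid-search fitting strategy are illustrative assumptions, not a production curve model:

```python
import numpy as np

def fit_power_curve(iters, scores):
    """Fit y = a - b * t^(-c) to a partial learning curve by
    grid-searching the exponent c and solving the remaining
    linear system for (a, b). Pure-numpy sketch."""
    t = np.asarray(iters, dtype=float)
    y = np.asarray(scores, dtype=float)
    best = None
    for c in np.linspace(0.05, 2.0, 40):
        # For fixed c, the model is linear in (a, b)
        X = np.column_stack([np.ones_like(t), -t ** -c])
        (a, b), _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        err = np.sum((X @ np.array([a, b]) - y) ** 2)
        if best is None or err < best[0]:
            best = (err, a, b, c)
    return best[1], best[2], best[3]

# Partial curve: first 20 iterations of a synthetic run that
# asymptotes at 0.90
t = np.arange(1, 21)
y = 0.90 - 0.30 * t ** -0.7

a, b, c = fit_power_curve(t, y)
pred_500 = a - b * 500.0 ** -c  # Extrapolate to a 500-iteration budget
print(round(a, 2))  # → 0.9, the asymptote recovered from partial data
```

With curves like this fitted per algorithm family from the meta-database, a scheduler can cut off configurations whose extrapolated ceiling is below the current incumbent.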
We've explored how warm starting transforms hyperparameter optimization from cold search to informed optimization. Let's consolidate the key takeaways:
What's Next:
The final page of this module explores Portfolio Methods—an alternative to selection where we run a diverse set of algorithms and combine their predictions. Portfolio methods complement selection by providing robustness when selection is uncertain.
You now understand how warm starting accelerates hyperparameter optimization by leveraging meta-learning. This is the practical link between accumulated knowledge and efficient AutoML—starting from informed positions rather than random initialization.