Successive Halving's fundamental limitation—the requirement to choose the number of initial configurations (n) before knowing problem characteristics—leaves practitioners in a bind. Choose (n) too small, and you may never sample the optimal configuration. Choose (n) too large, and initial budgets become too small to distinguish good from bad configurations.
Hyperband elegantly resolves this dilemma by running multiple Successive Halving brackets in parallel, each with a different (n). Some brackets prioritize exploration (many configurations, low initial budget), while others prioritize exploitation (few configurations, high initial budget). By hedging across the n-vs-B/n spectrum, Hyperband achieves robust performance without requiring prior knowledge of problem characteristics.
Introduced by Li et al. (2017), Hyperband has become the de facto standard for multi-fidelity hyperparameter optimization, offering strong theoretical guarantees and empirical performance across diverse machine learning problems.
By the end of this page, you will understand:

• How Hyperband extends Successive Halving with multiple brackets
• The mathematical derivation and bracket structure
• Theoretical guarantees and optimality properties
• Complete implementation with proper resource management
• Practical tuning guidelines and common pitfalls
Hyperband operates by running multiple Successive Halving brackets within an outer loop. Each bracket represents a different balance between (n) (number of configurations) and (r) (initial budget per configuration).
Algorithm Parameters: Hyperband takes two inputs: (R), the maximum budget that can be allocated to a single configuration, and (\eta), the reduction factor (typically 3).

From these, we derive: (s_{\max} = \lfloor \log_\eta R \rfloor), the largest bracket index, and (B = (s_{\max}+1) R), the budget allocated to each bracket.

The Outer Loop (Brackets):

Hyperband iterates over (s \in {s_{\max}, s_{\max}-1, ..., 0}), where each (s) defines a Successive Halving bracket with (n = \lceil \frac{s_{\max}+1}{s+1} \eta^s \rceil) initial configurations, each given an initial budget of (r = R \eta^{-s}):
| Bracket (s) | n (configs) | r (initial budget) | Rounds | Focus |
|---|---|---|---|---|
| 4 (aggressive) | 81 | 1 | 5 | Maximum exploration |
| 3 | 34 | 3 | 4 | Moderate exploration |
| 2 | 15 | 9 | 3 | Balanced |
| 1 | 8 | 27 | 2 | Moderate exploitation |
| 0 (conservative) | 5 | 81 | 1 | Maximum exploitation |
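The (n) and (r) columns above follow the standard Hyperband formulas (n = \lceil \frac{s_{\max}+1}{s+1} \eta^s \rceil) and (r = R \eta^{-s}). A minimal sketch that reproduces the table (assuming (R = 81, \eta = 3)):

```python
import math

def bracket_schedule(R=81, eta=3):
    """Return (s, n, r) for each Hyperband bracket, most aggressive first."""
    s_max = int(math.log(R) / math.log(eta) + 1e-9)  # guard against float error
    return [
        (s,
         math.ceil((s_max + 1) / (s + 1) * eta ** s),  # n: initial configurations
         R / eta ** s)                                 # r: initial budget per config
        for s in range(s_max, -1, -1)
    ]

for s, n, r in bracket_schedule():
    print(f"s={s}: n={n:2d}, r={r:g}, rounds={s + 1}")
```

Each bracket runs (s + 1) rounds, since the initial budget (r = R \eta^{-s}) must be multiplied by (\eta) exactly (s) times to reach (R).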
Understanding Bracket Design:
Each bracket implements a different strategy for the n-vs-B/n tradeoff:
Bracket (s = s_{\max}) (most aggressive): Samples the maximum number of configurations but with minimum initial budget. Effective when optimal configurations are rare and clearly distinguishable at low budgets.
Bracket (s = 0) (most conservative): Samples few configurations but evaluates each to completion. Effective when many configurations are viable and early performance is not predictive.
Intermediate brackets: Balance between the extremes, providing robustness.
The key insight is that one of these brackets will be near-optimal for any given problem, even if we don't know which one in advance.
```python
import numpy as np
from typing import List, Dict, Callable, Any, Tuple
from dataclasses import dataclass, field
import logging

logger = logging.getLogger(__name__)


@dataclass
class HBConfiguration:
    """A hyperparameter configuration in Hyperband."""
    config_id: str
    bracket: int
    hyperparameters: Dict[str, Any]
    performances: Dict[int, float] = field(default_factory=dict)  # round -> perf
    status: str = "active"  # active, completed, eliminated


class SuccessiveHalvingBracket:
    """A single Successive Halving bracket within Hyperband."""

    def __init__(
        self,
        bracket_id: int,
        n_configs: int,
        min_budget: float,
        max_budget: float,
        eta: int,
        config_sampler: Callable[[], Dict[str, Any]],
        evaluator: Callable[[Dict[str, Any], float], float],
    ):
        self.bracket_id = bracket_id
        self.n_configs = n_configs
        self.min_budget = min_budget
        self.max_budget = max_budget
        self.eta = eta
        self.config_sampler = config_sampler
        self.evaluator = evaluator

        # Calculate rounds
        self.num_rounds = 1 + int(np.log(max_budget / min_budget) / np.log(eta))

        # Generate budget and config schedule
        self.budgets = [min_budget * (eta ** i) for i in range(self.num_rounds)]
        self.budgets = [min(b, max_budget) for b in self.budgets]

        self.config_counts = [n_configs]
        for i in range(1, self.num_rounds):
            self.config_counts.append(max(1, self.config_counts[-1] // eta))

        logger.info(f"Bracket {bracket_id}: n={n_configs}, r={min_budget}")
        logger.info(f"  Budgets: {self.budgets}")
        logger.info(f"  Configs: {self.config_counts}")

        self.configurations: List[HBConfiguration] = []
        self.current_round = 0
        self.surviving_indices: List[int] = []

    def sample_configurations(self):
        """Sample initial configurations."""
        for i in range(self.n_configs):
            config = HBConfiguration(
                config_id=f"b{self.bracket_id}_c{i}",
                bracket=self.bracket_id,
                hyperparameters=self.config_sampler()
            )
            self.configurations.append(config)
        self.surviving_indices = list(range(self.n_configs))

    def run_round(self) -> bool:
        """
        Run one round of the bracket.

        Returns False if bracket is complete.
        """
        if self.current_round >= self.num_rounds:
            return False

        budget = self.budgets[self.current_round]
        n_keep = self.config_counts[
            min(self.current_round + 1, self.num_rounds - 1)
        ]

        logger.info(f"Bracket {self.bracket_id} Round {self.current_round}: "
                    f"{len(self.surviving_indices)} configs at budget {budget}")

        # Evaluate all surviving configurations
        results = []
        for idx in self.surviving_indices:
            config = self.configurations[idx]
            perf = self.evaluator(config.hyperparameters, budget)
            config.performances[self.current_round] = perf
            results.append((idx, perf))

        # Sort and keep top
        results.sort(key=lambda x: x[1])  # Lower is better
        promoted = [idx for idx, _ in results[:n_keep]]
        eliminated = [idx for idx, _ in results[n_keep:]]

        for idx in eliminated:
            self.configurations[idx].status = "eliminated"

        if self.current_round == self.num_rounds - 1:
            for idx in promoted:
                self.configurations[idx].status = "completed"

        self.surviving_indices = promoted
        self.current_round += 1

        return self.current_round < self.num_rounds

    def get_best(self) -> Tuple[HBConfiguration, float]:
        """Get best configuration from this bracket."""
        completed = [
            c for c in self.configurations
            if c.status == "completed" and c.performances
        ]
        if not completed:
            return None, float('inf')

        best = min(completed, key=lambda c: list(c.performances.values())[-1])
        return best, list(best.performances.values())[-1]


class Hyperband:
    """
    Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

    Reference: Li, L., et al. "Hyperband: A Novel Bandit-Based Approach to
    Hyperparameter Optimization." JMLR 2018.
    """

    def __init__(
        self,
        max_budget: float = 81,
        eta: int = 3,
        config_sampler: Callable[[], Dict[str, Any]] = None,
        evaluator: Callable[[Dict[str, Any], float], float] = None,
    ):
        """
        Args:
            max_budget: Maximum budget per configuration (R)
            eta: Reduction factor (typically 3)
            config_sampler: Function to sample hyperparameters
            evaluator: Function(config, budget) -> validation_loss
        """
        self.max_budget = max_budget
        self.eta = eta
        self.config_sampler = config_sampler or self._default_sampler
        self.evaluator = evaluator or self._simulate_evaluator

        # Calculate s_max and brackets
        self.s_max = int(np.log(max_budget) / np.log(eta))
        self.total_budget_per_iteration = (self.s_max + 1) * max_budget

        logger.info(f"Hyperband initialized: R={max_budget}, eta={eta}")
        logger.info(f"s_max={self.s_max}, brackets={self.s_max + 1}")

        self.brackets: List[SuccessiveHalvingBracket] = []
        self.iteration = 0
        self.all_results: List[Tuple[HBConfiguration, float]] = []

    def _default_sampler(self) -> Dict[str, Any]:
        """Default hyperparameter sampler."""
        return {
            "learning_rate": 10 ** np.random.uniform(-5, -1),
            "weight_decay": 10 ** np.random.uniform(-6, -2),
            "batch_size": 2 ** np.random.randint(4, 9),
            "dropout": np.random.uniform(0, 0.5),
        }

    def _simulate_evaluator(self, config: Dict, budget: float) -> float:
        """Simulated evaluator for testing."""
        lr = config.get("learning_rate", 0.001)
        wd = config.get("weight_decay", 0.0001)

        # Optimal around lr=0.001, wd=0.0001
        lr_quality = -((np.log10(lr) + 3) ** 2) / 4
        wd_quality = -((np.log10(wd) + 4) ** 2) / 4

        base_quality = 0.5 + 0.2 * (lr_quality + wd_quality)
        progress = 1 - np.exp(-budget / 30)
        noise = np.random.normal(0, 0.02 / np.sqrt(budget))

        return base_quality * (1 - 0.8 * progress) + noise

    def _create_bracket(self, s: int) -> SuccessiveHalvingBracket:
        """Create a Successive Halving bracket for given s."""
        # n: number of configurations
        n = int(np.ceil((self.s_max + 1) / (s + 1) * (self.eta ** s)))
        # r: initial budget per configuration
        r = self.max_budget * (self.eta ** (-s))

        return SuccessiveHalvingBracket(
            bracket_id=s,
            n_configs=n,
            min_budget=r,
            max_budget=self.max_budget,
            eta=self.eta,
            config_sampler=self.config_sampler,
            evaluator=self.evaluator,
        )

    def run_iteration(self) -> List[Tuple[HBConfiguration, float]]:
        """
        Run one complete Hyperband iteration (all brackets).

        Returns list of (config, final_performance) for all completed configs.
        """
        self.iteration += 1
        logger.info(f"{'='*50}")
        logger.info(f"HYPERBAND ITERATION {self.iteration}")
        logger.info(f"{'='*50}")

        iteration_results = []

        # Create and run all brackets from s_max down to 0
        for s in range(self.s_max, -1, -1):
            logger.info(f"--- Bracket s={s} ---")

            bracket = self._create_bracket(s)
            bracket.sample_configurations()

            # Run all rounds of this bracket
            while bracket.run_round():
                pass

            # Collect results
            best_config, best_perf = bracket.get_best()
            if best_config:
                iteration_results.append((best_config, best_perf))
                logger.info(f"Bracket {s} best: {best_perf:.4f}")

            self.brackets.append(bracket)

        self.all_results.extend(iteration_results)
        return iteration_results

    def run(self, n_iterations: int = 1) -> HBConfiguration:
        """
        Run multiple Hyperband iterations.

        Returns the best configuration found.
        """
        for _ in range(n_iterations):
            self.run_iteration()

        if not self.all_results:
            raise ValueError("No results collected")

        best_config, best_perf = min(self.all_results, key=lambda x: x[1])

        logger.info(f"{'='*50}")
        logger.info("OPTIMIZATION COMPLETE")
        logger.info(f"Best performance: {best_perf:.4f}")
        logger.info(f"Best hyperparameters: {best_config.hyperparameters}")
        logger.info(f"{'='*50}")

        return best_config

    def get_statistics(self) -> Dict:
        """Get statistics about the optimization."""
        total_configs = sum(len(b.configurations) for b in self.brackets)
        total_evaluations = sum(
            sum(len(c.performances) for c in b.configurations)
            for b in self.brackets
        )

        return {
            "iterations": self.iteration,
            "brackets": len(self.brackets),
            "total_configurations": total_configs,
            "total_evaluations": total_evaluations,
            "best_result": min((r[1] for r in self.all_results), default=None),
        }


# Example usage
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    hb = Hyperband(max_budget=81, eta=3)
    best = hb.run(n_iterations=1)

    stats = hb.get_statistics()
    print(f"Statistics: {stats}")
```

Understanding the precise structure of Hyperband's brackets is essential for effective use and debugging. Let's trace through the complete algorithm with a worked example.
Complete Example: R=81, η=3
With maximum budget (R = 81) and reduction factor (\eta = 3):
| Bracket s | Round | n_i (configs) | r_i (budget) | Round Total |
|---|---|---|---|---|
| s=4 | 0 | 81 | 1 | 81 |
| | 1 | 27 | 3 | 81 |
| | 2 | 9 | 9 | 81 |
| | 3 | 3 | 27 | 81 |
| | 4 | 1 | 81 | 81 |
| **Bracket 4 Total** | | | | **405** |
| s=3 | 0 | 34 | 3 | 102 |
| | 1 | 11 | 9 | 99 |
| | 2 | 3 | 27 | 81 |
| | 3 | 1 | 81 | 81 |
| **Bracket 3 Total** | | | | **363** |
| s=2 | 0 | 15 | 9 | 135 |
| | 1 | 5 | 27 | 135 |
| | 2 | 1 | 81 | 81 |
| **Bracket 2 Total** | | | | **351** |
| s=1 | 0 | 8 | 27 | 216 |
| | 1 | 2 | 81 | 162 |
| **Bracket 1 Total** | | | | **378** |
| s=0 | 0 | 5 | 81 | 405 |
| **Bracket 0 Total** | | | | **405** |
Key Observations:
Total budget varies slightly by bracket due to rounding, but all are approximately (B = (s_{\max}+1) \times R).
Bracket s=4 samples the most configurations (81) but starts with minimum budget (1). It has 5 rounds of elimination.
Bracket s=0 samples the fewest (5) but starts at maximum budget (81). Every configuration runs to completion—no early stopping.
Configuration count: Across one Hyperband iteration, we sample (81 + 34 + 15 + 8 + 5 = 143) configurations in total.
Budget distribution: More aggressive brackets allocate budget to early rounds; conservative brackets concentrate budget in later rounds.
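The bracket totals above can be checked mechanically by replaying each bracket's halving schedule. A small sketch (assuming (R = 81, \eta = 3)):

```python
import math

def bracket_budgets(R=81, eta=3):
    """Replay each bracket's schedule; return ({s: budget}, total configs sampled)."""
    s_max = int(math.log(R) / math.log(eta) + 1e-9)  # guard against float error
    budgets, total_configs = {}, 0
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * eta ** s)  # initial configs
        total_configs += n
        n_i, r_i, budget = n, R / eta ** s, 0.0
        for _ in range(s + 1):            # s + 1 rounds of successive halving
            budget += n_i * r_i
            n_i = max(1, n_i // eta)      # keep the top 1/eta
            r_i = min(r_i * eta, R)       # grow the per-config budget by eta
        budgets[s] = budget
    return budgets, total_configs

budgets, n_configs = bracket_budgets()
print(budgets)    # per-bracket budgets: 405, 363, 351, 378, 405
print(n_configs)  # 143 configurations sampled per iteration
```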
Hyperband's power comes from hedging. If early stopping works well for this problem (high rank correlation), bracket s=4 will find good configurations most efficiently. If early stopping fails (low rank correlation), bracket s=0 will still evaluate configurations thoroughly. At least one bracket will perform well, regardless of problem characteristics.
Hyperband inherits and extends the theoretical guarantees of Successive Halving, with additional robustness from the multi-bracket design.
Simple Regret Bound:
Let (\lambda^*) be the optimal configuration and (\hat{\lambda}) be the configuration returned by Hyperband. The simple regret is:
$$\text{Simple Regret} = f(\hat{\lambda}) - f(\lambda^*)$$
Theorem (Li et al., 2017): For Hyperband with budget (B) per iteration:
$$\mathbb{E}[\text{Simple Regret}] = O\left( \sqrt{\frac{\log(B)^3}{B}} \right)$$
This bound is nearly optimal up to logarithmic factors. The (\log(B)^3) overhead is the price of adaptivity: Hyperband hedges across (s_{\max}+1) brackets because it does not know in advance which exploration-exploitation balance the problem favors.
Comparison to Pure Strategies:
Consider two extreme strategies for budget allocation:
Pure exploration (like bracket s=s_max): Sample (n = B/R_{\min}) configurations, evaluate each at minimum budget (R_{\min}), keep the best.
Pure exploitation (like bracket s=0): Sample (n = B/R_{\max}) configurations, evaluate each to completion at (R_{\max}).
Pure Exploration Regret: $$\mathbb{E}[\text{Regret}] = O\left( \sqrt{\frac{R_{\max}}{B} \cdot \sigma^2 + \varepsilon_{\text{gap}}} \right)$$
where (\varepsilon_{\text{gap}}) is the error from using low-budget estimates.
Pure Exploitation Regret: $$\mathbb{E}[\text{Regret}] = O\left( \sqrt{\frac{R_{\max}}{B}} \right)$$
which suffers from insufficient exploration when optimal configurations are rare.
Hyperband achieves the best of both worlds—it matches the better of these two strategies up to logarithmic factors, without knowing which is better a priori.
Hyperband's theoretical guarantee is "adaptive"—it automatically achieves near-optimal performance whether the problem favors exploration or exploitation. This adaptivity is the key theoretical contribution beyond Successive Halving.
Computational Overhead:
The total number of configurations evaluated across all brackets in one Hyperband iteration is approximately:
$$N_{\text{total}} = \sum_{s=0}^{s_{\max}} n_s = \sum_{s=0}^{s_{\max}} \left\lceil \frac{s_{\max}+1}{s+1} \, \eta^s \right\rceil$$

For (R = 81, \eta = 3), this gives (81 + 34 + 15 + 8 + 5 = 143) configurations per iteration.
The total budget used is approximately ((s_{\max}+1) \times B = O(B \log R)), so Hyperband uses a factor of (O(\log R)) more budget than a single Successive Halving run. This overhead buys robustness to the n-vs-B/n tradeoff.
The standard Hyperband algorithm runs brackets sequentially and uses synchronous Successive Halving within each bracket. This creates efficiency challenges in parallel environments. Asynchronous Hyperband addresses these limitations.
Parallelization Strategies:
Bracket-Level Parallelism: Run all brackets of one iteration in parallel. Limited by (s_{\max}+1) parallel threads.
Round-Level Parallelism: Within each bracket, evaluate all configurations at the current round in parallel. Limited by the number of surviving configurations.
Full Asynchrony (ASHA-style): Combine ASHA (Asynchronous Successive Halving) ideas with Hyperband's multi-bracket structure.
```python
"""Asynchronous Hyperband implementation using ASHA within brackets."""
import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Callable, Any
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import logging

logger = logging.getLogger(__name__)


@dataclass
class AsyncHBConfig:
    """Configuration being evaluated in Async Hyperband."""
    config_id: str
    bracket: int
    rung: int  # Current rung within bracket
    hyperparameters: Dict[str, Any]
    rung_performances: Dict[int, float] = field(default_factory=dict)
    status: str = "pending"  # pending, running, promoted, stopped, completed


class AsyncHyperband:
    """
    Asynchronous Hyperband with ASHA-style promotion within brackets.

    Key features:
    - No synchronization barriers within brackets
    - Workers always busy (no idle time)
    - Promotion as soon as configuration qualifies
    """

    def __init__(
        self,
        max_budget: float = 81,
        eta: int = 3,
        n_workers: int = 4,
        config_sampler: Callable[[], Dict[str, Any]] = None,
        evaluator: Callable[[Dict[str, Any], float], float] = None,
    ):
        self.max_budget = max_budget
        self.eta = eta
        self.n_workers = n_workers
        self.config_sampler = config_sampler
        self.evaluator = evaluator

        self.s_max = int(np.log(max_budget) / np.log(eta))

        # Compute rung budgets for each bracket
        self.bracket_rungs: Dict[int, List[float]] = {}
        for s in range(self.s_max + 1):
            min_r = max_budget * (eta ** (-s))
            rungs = []
            r = min_r
            while r <= max_budget:
                rungs.append(r)
                r *= eta
            self.bracket_rungs[s] = rungs

        # Shared state (thread-safe)
        self.lock = threading.Lock()
        self.configs: Dict[str, AsyncHBConfig] = {}
        self.rung_results: Dict[tuple, List] = {}  # (bracket, rung) -> results
        self.next_config_id = 0
        self.pending_queue: List[AsyncHBConfig] = []

        logger.info(f"Async Hyperband: R={max_budget}, eta={eta}, "
                    f"workers={n_workers}")

    def _sample_bracket(self) -> int:
        """Sample which bracket to add new configuration to."""
        # Uniform or weighted by bracket size
        return np.random.randint(0, self.s_max + 1)

    def _sample_new_config(self) -> AsyncHBConfig:
        """Sample a new configuration."""
        bracket = self._sample_bracket()
        config = AsyncHBConfig(
            config_id=f"c{self.next_config_id}",
            bracket=bracket,
            rung=0,
            hyperparameters=self.config_sampler() if self.config_sampler else {
                "lr": 10 ** np.random.uniform(-5, -1),
                "wd": 10 ** np.random.uniform(-6, -2),
            }
        )
        self.next_config_id += 1
        return config

    def _get_next_job(self) -> Optional[AsyncHBConfig]:
        """Get next configuration to evaluate (promotion or new sample)."""
        with self.lock:
            # Priority: promoted configs > new configs
            if self.pending_queue:
                return self.pending_queue.pop(0)
            else:
                return self._sample_new_config()

    def _check_promotion(self, config: AsyncHBConfig) -> bool:
        """
        Check if configuration should be promoted.

        Uses ASHA-style: promote if in top 1/eta of completed at current rung.
        """
        bracket = config.bracket
        rung = config.rung
        rungs = self.bracket_rungs[bracket]

        # Can't promote from last rung
        if rung >= len(rungs) - 1:
            return False

        key = (bracket, rung)
        if key not in self.rung_results:
            self.rung_results[key] = []

        completed = self.rung_results[key]
        if len(completed) < 2:  # Need at least 2 to compare
            return True  # Promote by default

        # Sort by performance (lower is better)
        sorted_results = sorted(completed, key=lambda x: x[1])
        n_to_promote = max(1, len(completed) // self.eta)

        # Check if this config is in top 1/eta
        current_perf = config.rung_performances.get(rung, float('inf'))
        threshold_perf = sorted_results[n_to_promote - 1][1]

        return current_perf <= threshold_perf

    def _worker_loop(self, worker_id: int, max_evals: int):
        """Worker loop: get job, evaluate, check promotion, repeat."""
        evals = 0
        while evals < max_evals:
            config = self._get_next_job()
            if config is None:
                break

            bracket = config.bracket
            rung = config.rung
            rungs = self.bracket_rungs[bracket]
            budget = rungs[rung]

            logger.debug(f"Worker {worker_id}: evaluating {config.config_id} "
                         f"bracket={bracket} rung={rung} budget={budget}")

            # Evaluate (outside lock)
            if self.evaluator:
                perf = self.evaluator(config.hyperparameters, budget)
            else:
                # Simulate
                lr = config.hyperparameters.get("lr", 0.001)
                quality = -((np.log10(lr) + 3) ** 2) / 4
                progress = 1 - np.exp(-budget / 30)
                perf = 0.5 + 0.2 * quality
                perf = perf * (1 - 0.8 * progress)
                perf += np.random.normal(0, 0.02 / np.sqrt(budget))

            evals += 1

            with self.lock:
                # Record result
                config.rung_performances[rung] = perf
                key = (bracket, rung)
                if key not in self.rung_results:
                    self.rung_results[key] = []
                self.rung_results[key].append((config.config_id, perf))
                self.configs[config.config_id] = config

                # Check promotion
                if rung < len(rungs) - 1 and self._check_promotion(config):
                    config.rung += 1
                    config.status = "promoted"
                    self.pending_queue.append(config)
                    logger.debug(f"{config.config_id} promoted to rung {config.rung}")
                elif rung == len(rungs) - 1:
                    config.status = "completed"
                else:
                    config.status = "stopped"

    def run(self, max_evaluations: int) -> AsyncHBConfig:
        """
        Run Async Hyperband until max_evaluations.

        Returns best completed configuration.
        """
        evals_per_worker = max_evaluations // self.n_workers

        logger.info(f"Running Async Hyperband with {max_evaluations} evaluations")

        with ThreadPoolExecutor(max_workers=self.n_workers) as executor:
            futures = [
                executor.submit(self._worker_loop, i, evals_per_worker)
                for i in range(self.n_workers)
            ]
            for f in as_completed(futures):
                f.result()

        # Find best completed
        best_config = None
        best_perf = float('inf')
        for config in self.configs.values():
            if config.status == "completed":
                final_rung = len(self.bracket_rungs[config.bracket]) - 1
                perf = config.rung_performances.get(final_rung, float('inf'))
                if perf < best_perf:
                    best_perf = perf
                    best_config = config

        if best_config:
            logger.info(f"Best: {best_config.config_id} "
                        f"perf={best_perf:.4f} "
                        f"hp={best_config.hyperparameters}")

        return best_config

    def get_statistics(self) -> Dict:
        """Get optimization statistics."""
        completed = [c for c in self.configs.values() if c.status == "completed"]
        stopped = [c for c in self.configs.values() if c.status == "stopped"]

        return {
            "total_configs": len(self.configs),
            "completed": len(completed),
            "stopped": len(stopped),
            "best_perf": min(
                (list(c.rung_performances.values())[-1]
                 for c in completed if c.rung_performances),
                default=None
            ),
        }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    ahb = AsyncHyperband(max_budget=27, eta=3, n_workers=4)
    best = ahb.run(max_evaluations=100)
    print(f"Statistics: {ahb.get_statistics()}")
```

In practice, Async Hyperband achieves near-linear speedup with the number of workers because there are no synchronization barriers within brackets (workers never sit idle waiting for a round to finish) and configurations are promoted as soon as they qualify.
While Hyperband is more robust than Successive Halving (no need to choose (n)), it still requires setting (R) (maximum budget) and (\eta) (reduction factor). These choices significantly impact performance.
The effect of (\eta) with (R = 81):

| η | s_max | Brackets | Max configs (s=s_max) | Min initial budget |
|---|---|---|---|---|
| 2 | 6 | 7 | 64 | 1.27 |
| 3 | 4 | 5 | 81 | 1 |
| 4 | 3 | 4 | 64 | 1.27 |
| 5 | 2 | 3 | 25 | 3.24 |
Practical Heuristics:
Start with η=3: This is the "default" recommendation from the original paper and works well in practice.
Calibrate R empirically: Run a few configurations to completion to understand training dynamics. Set R such that most configurations have converged (validation loss has plateaued).
Verify minimum budget is meaningful: Check that configurations at the minimum budget ((R/\eta^{s_{\max}})) can be meaningfully distinguished. If all configurations look nearly identical at minimum budget, increase R or decrease η.
Consider wall-clock time: More brackets mean more configurations to sample and more overhead. If wall-clock time is critical, fewer brackets (larger η) may be preferable despite the theoretical overhead.
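The third heuristic can be checked directly: evaluate a pilot set of configurations at both the candidate minimum budget and at (R), and measure whether the low-budget ranking predicts the full-budget ranking. A minimal sketch using a NumPy-only Spearman correlation (the `evaluate` callable and the 0.5 threshold are illustrative assumptions, not fixed recommendations):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks (ignores ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def min_budget_is_meaningful(evaluate, configs, min_budget, max_budget,
                             threshold=0.5):
    """True if low-budget rankings roughly predict full-budget rankings."""
    low = [evaluate(c, min_budget) for c in configs]    # cheap pilot evaluations
    high = [evaluate(c, max_budget) for c in configs]   # full-budget evaluations
    return spearman(low, high) >= threshold
```

If the correlation is low, increase R or decrease η so that the minimum budget (R/\eta^{s_{\max}}) gives configurations enough training to reveal their ordering.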
```python
"""Utilities for configuring Hyperband parameters."""
import numpy as np
from typing import Dict, Optional, Tuple


def compute_hyperband_structure(
    max_budget: float,
    eta: int = 3
) -> Dict:
    """
    Compute complete Hyperband structure for given parameters.

    Returns:
        Dictionary with:
        - s_max: Maximum bracket index
        - brackets: List of bracket specifications
        - total_configs: Total configs per iteration
        - total_budget: Total budget per iteration
    """
    s_max = int(np.log(max_budget) / np.log(eta))

    brackets = []
    total_configs = 0
    total_budget = 0

    for s in range(s_max, -1, -1):
        # Number of initial configs in this bracket
        n = int(np.ceil((s_max + 1) / (s + 1) * (eta ** s)))
        # Initial budget per config
        r = max_budget * (eta ** (-s))
        # Number of rounds in this bracket
        num_rounds = s + 1

        # Budget used by this bracket
        bracket_budget = 0
        n_i = n
        r_i = r
        rounds = []
        for i in range(num_rounds):
            round_budget = n_i * r_i
            bracket_budget += round_budget
            rounds.append({
                "round": i,
                "configs": n_i,
                "budget": r_i,
                "total": round_budget
            })
            n_i = max(1, n_i // eta)
            r_i = min(r_i * eta, max_budget)

        brackets.append({
            "s": s,
            "n": n,
            "r": r,
            "rounds": rounds,
            "total_budget": bracket_budget
        })
        total_configs += n
        total_budget += bracket_budget

    return {
        "max_budget": max_budget,
        "eta": eta,
        "s_max": s_max,
        "brackets": brackets,
        "total_configs": total_configs,
        "total_budget": total_budget,
        "min_budget": max_budget / (eta ** s_max),
    }


def recommend_parameters(
    target_budget: float,
    min_meaningful_budget: float,
    preferred_configs: int = None
) -> Tuple[float, int, Optional[Dict]]:
    """
    Recommend R and η based on constraints.

    Args:
        target_budget: Total budget available
        min_meaningful_budget: Minimum budget to distinguish configs
        preferred_configs: Desired number of configs (optional)

    Returns:
        (R, η, details) recommendation
    """
    recommendations = []

    for eta in [2, 3, 4]:
        # R must satisfy: R / η^s_max >= min_meaningful_budget
        # s_max = floor(log_η(R))
        # Try different R values
        for R in [27, 81, 125, 243]:
            if R < min_meaningful_budget:
                continue

            s_max = int(np.log(R) / np.log(eta))
            min_budget = R / (eta ** s_max)

            if min_budget < min_meaningful_budget:
                continue

            structure = compute_hyperband_structure(R, eta)

            # Check if total budget is reasonable
            iterations = target_budget / structure["total_budget"]

            recommendations.append({
                "R": R,
                "eta": eta,
                "min_budget": min_budget,
                "configs_per_iter": structure["total_configs"],
                "budget_per_iter": structure["total_budget"],
                "iterations": iterations,
                "score": structure["total_configs"] * iterations,
            })

    # Rank by total configs explored
    recommendations.sort(key=lambda x: x["score"], reverse=True)

    if recommendations:
        best = recommendations[0]
        return best["R"], best["eta"], best

    return 81, 3, None  # Default


def print_hyperband_plan(max_budget: float, eta: int):
    """Print detailed Hyperband execution plan."""
    structure = compute_hyperband_structure(max_budget, eta)

    print("=" * 60)
    print("HYPERBAND CONFIGURATION")
    print("=" * 60)
    print(f"Max budget (R): {max_budget}")
    print(f"Reduction factor (η): {eta}")
    print(f"Number of brackets (s_max + 1): {structure['s_max'] + 1}")
    print(f"Min budget: {structure['min_budget']:.2f}")
    print("")
    print(f"Total configs per iteration: {structure['total_configs']}")
    print(f"Total budget per iteration: {structure['total_budget']:.1f}")
    print("=" * 60)

    for bracket in structure["brackets"]:
        print(f"Bracket s={bracket['s']}: n={bracket['n']}, r={bracket['r']:.1f}")
        print(f"{'Round':<8} {'Configs':<10} {'Budget':<12} {'Total':<10}")
        print("-" * 40)
        for round_info in bracket["rounds"]:
            print(f"{round_info['round']:<8} {round_info['configs']:<10} "
                  f"{round_info['budget']:<12.1f} {round_info['total']:<10.1f}")
        print(f"Bracket total: {bracket['total_budget']:.1f}")


# Example
if __name__ == "__main__":
    print_hyperband_plan(81, 3)

    print("\nRecommendation for budget=1000, min_meaningful=3:")
    R, eta, details = recommend_parameters(
        target_budget=1000,
        min_meaningful_budget=3
    )
    print(f"  R={R}, η={eta}")
    if details:
        print(f"  Configs per iteration: {details['configs_per_iter']}")
        print(f"  Expected iterations: {details['iterations']:.1f}")
```

Despite Hyperband's robustness, practitioners commonly encounter issues that can be avoided with proper configuration and understanding.
Hyperband assumes budgets correspond to training progress. If using learning rate schedules (cosine annealing, warmup, step decay), ensure schedules are relative to total epochs, not absolute. A configuration at budget 10 should behave as if it's at epoch 10 of a 10-epoch schedule, not partially through a 100-epoch schedule.
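One way to keep schedules budget-relative is to parameterize the learning rate by the fraction of the *allotted* budget consumed, not by a fixed total epoch count. A hedged sketch with cosine annealing (function name and values are illustrative):

```python
import math

def cosine_lr(base_lr, epoch, allotted_epochs, min_lr=0.0):
    """Cosine annealing over the budget actually allotted to this trial."""
    progress = min(epoch / allotted_epochs, 1.0)   # fraction of THIS trial's budget
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# A trial given budget 10 completes its full anneal by epoch 10 ...
lr_short = [cosine_lr(0.1, e, allotted_epochs=10) for e in range(10)]
# ... rather than sitting on the early, flat part of a 100-epoch schedule.
lr_wrong = [cosine_lr(0.1, e, allotted_epochs=100) for e in range(10)]
```

With a fixed 100-epoch schedule, a budget-10 trial never leaves the high-learning-rate regime, so its validation loss misrepresents how the configuration would perform if trained properly.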
Hyperband has spawned numerous extensions that address specific limitations or add capabilities:
| Variant | Key Improvement | Use Case |
|---|---|---|
| ASHA | Asynchronous execution | Parallel/distributed settings |
| BOHB | Bayesian configuration sampling | When prior knowledge helps |
| MOBSTER | Multi-fidelity BO + transfer | Related task sequence |
| SHA-UCB | Upper confidence bound promotion | Theoretical guarantees |
| Population Based Training | Hyperparameter schedules | Longer training runs |
| DEHB | Differential evolution sampling | Expensive black-box optimization |
BOHB (Bayesian Optimization and Hyperband):
BOHB, covered in the next page, replaces Hyperband's random sampling with Bayesian optimization. Instead of sampling configurations uniformly at random, BOHB fits kernel density estimators to past evaluations: one density models configurations that performed well, another models those that performed poorly, and new configurations are drawn where the ratio of the good density to the bad density is highest.
This dramatically improves sample efficiency when the search space has exploitable structure.
Population Based Training (PBT):
PBT differs fundamentally from Hyperband: rather than eliminating poor configurations, it mutates them. Poor-performing configurations copy hyperparameters from good performers and add random perturbations. This enables hyperparameters to change over the course of training (discovering schedules rather than fixed values) without discarding partially trained models.
PBT is particularly effective for very long training runs where hyperparameter needs may change over time.
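A minimal sketch of PBT's exploit-and-explore step, assuming truncation selection with illustrative quantile and perturbation values (not the canonical ones):

```python
import random

def pbt_step(population, quantile=0.25, perturb=(0.8, 1.2)):
    """One exploit/explore step: the bottom quantile copies a top performer
    and perturbs its hyperparameters. Each member is a dict with 'score'
    (lower is better) and 'hyperparameters'."""
    ranked = sorted(population, key=lambda m: m["score"])
    k = max(1, int(len(ranked) * quantile))
    top, bottom = ranked[:k], ranked[-k:]
    for member in bottom:
        donor = random.choice(top)
        # Exploit: copy hyperparameters (real PBT also copies model weights)
        member["hyperparameters"] = dict(donor["hyperparameters"])
        # Explore: multiplicative perturbation of each hyperparameter
        for name, value in member["hyperparameters"].items():
            member["hyperparameters"][name] = value * random.uniform(*perturb)
    return population
```

Because training continues after each step, the sequence of perturbations a surviving member accumulates amounts to a learned hyperparameter schedule.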
• Standard Hyperband: Good default for new problems with unknown structure
• ASHA: When parallelization is critical and you have many workers
• BOHB: When search space is continuous and you suspect structure exists
• PBT: For very long training runs (days/weeks) where schedules matter
• DEHB: For expensive evaluations (hours per config) where sample efficiency is paramount
Hyperband extends Successive Halving to handle the fundamental n-vs-B/n tradeoff without requiring prior knowledge of problem characteristics.
Looking Ahead:
The next page covers BOHB (Bayesian Optimization and Hyperband)—a powerful combination that adds intelligent configuration sampling to Hyperband's multi-fidelity search. BOHB leverages completed evaluations to focus sampling on promising regions of the search space, dramatically improving sample efficiency for structured problems.
You now understand Hyperband's algorithm, theoretical guarantees, bracket structure, asynchronous variants, and practical configuration. You can implement Hyperband for your optimization problems and choose appropriate parameters. Next, we'll see how BOHB combines Hyperband with Bayesian optimization for even greater efficiency.