Successive Halving's fundamental limitation—the requirement to choose the number of initial configurations (n) before knowing problem characteristics—leaves practitioners in a bind. Choose (n) too small, and you may never sample the optimal configuration. Choose (n) too large, and initial budgets become too small to distinguish good from bad configurations.
Hyperband elegantly resolves this dilemma by running multiple Successive Halving brackets in parallel, each with a different (n). Some brackets prioritize exploration (many configurations, low initial budget), while others prioritize exploitation (few configurations, high initial budget). By hedging across the n-vs-B/n spectrum, Hyperband achieves robust performance without requiring prior knowledge of problem characteristics.
Introduced by Li et al. (2017), Hyperband has become the de facto standard for multi-fidelity hyperparameter optimization, offering strong theoretical guarantees and empirical performance across diverse machine learning problems.
By the end of this page, you will understand:

• How Hyperband extends Successive Halving with multiple brackets
• The mathematical derivation and bracket structure
• Theoretical guarantees and optimality properties
• Complete implementation with proper resource management
• Practical tuning guidelines and common pitfalls
Hyperband operates by running multiple Successive Halving brackets within an outer loop. Each bracket represents a different balance between (n) (number of configurations) and (r) (initial budget per configuration).
Algorithm Parameters: Hyperband takes two inputs: (R), the maximum budget that can be allocated to a single configuration, and (\eta), the reduction factor (typically 3).

From these, we derive: (s_{\max} = \lfloor \log_\eta R \rfloor), the largest bracket index, and (B = (s_{\max}+1) R), the budget allocated to each bracket.

The Outer Loop (Brackets):

Hyperband iterates over (s \in {s_{\max}, s_{\max}-1, ..., 0}), where each (s) defines a Successive Halving bracket with (n = \lceil \frac{s_{\max}+1}{s+1} \eta^s \rceil) initial configurations, each given an initial budget of (r = R \eta^{-s}):
| Bracket (s) | n (configs) | r (initial budget) | Rounds | Focus |
|---|---|---|---|---|
| 4 (aggressive) | 81 | 1 | 5 | Maximum exploration |
| 3 | 34 | 3 | 4 | Moderate exploration |
| 2 | 15 | 9 | 3 | Balanced |
| 1 | 8 | 27 | 2 | Moderate exploitation |
| 0 (conservative) | 5 | 81 | 1 | Maximum exploitation |
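The (n) and (r) columns above follow the standard Hyperband formulas (n = \lceil \frac{s_{\max}+1}{s+1} \eta^s \rceil) and (r = R \eta^{-s}). A minimal sketch that reproduces the table (assuming (R = 81, \eta = 3)):

```python
import math

def bracket_schedule(R=81, eta=3):
    """Return (s, n, r) for each Hyperband bracket, most aggressive first."""
    s_max = int(math.log(R) / math.log(eta) + 1e-9)  # guard against float error
    return [
        (s,
         math.ceil((s_max + 1) / (s + 1) * eta ** s),  # n: initial configurations
         R / eta ** s)                                 # r: initial budget per config
        for s in range(s_max, -1, -1)
    ]

for s, n, r in bracket_schedule():
    print(f"s={s}: n={n:2d}, r={r:g}, rounds={s + 1}")
```

Each bracket runs (s + 1) rounds, since the initial budget (r = R \eta^{-s}) must be multiplied by (\eta) exactly (s) times to reach (R).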
Understanding Bracket Design:
Each bracket implements a different strategy for the n-vs-B/n tradeoff:
Bracket (s = s_{\max}) (most aggressive): Samples the maximum number of configurations but with minimum initial budget. Effective when optimal configurations are rare and clearly distinguishable at low budgets.
Bracket (s = 0) (most conservative): Samples few configurations but evaluates each to completion. Effective when many configurations are viable and early performance is not predictive.
Intermediate brackets: Balance between the extremes, providing robustness.
The key insight is that one of these brackets will be near-optimal for any given problem, even if we don't know which one in advance.
```python
import numpy as np
from typing import List, Dict, Callable, Any, Tuple
from dataclasses import dataclass, field
import logging

logger = logging.getLogger(__name__)


@dataclass
class HBConfiguration:
    """A hyperparameter configuration in Hyperband."""
    config_id: str
    bracket: int
    hyperparameters: Dict[str, Any]
    performances: Dict[int, float] = field(default_factory=dict)  # round -> perf
    status: str = "active"  # active, completed, eliminated


class SuccessiveHalvingBracket:
    """A single Successive Halving bracket within Hyperband."""

    def __init__(
        self,
        bracket_id: int,
        n_configs: int,
        min_budget: float,
        max_budget: float,
        eta: int,
        config_sampler: Callable[[], Dict[str, Any]],
        evaluator: Callable[[Dict[str, Any], float], float],
    ):
        self.bracket_id = bracket_id
        self.n_configs = n_configs
        self.min_budget = min_budget
        self.max_budget = max_budget
        self.eta = eta
        self.config_sampler = config_sampler
        self.evaluator = evaluator

        # Calculate rounds
        self.num_rounds = 1 + int(np.log(max_budget / min_budget) / np.log(eta))

        # Generate budget and config schedule
        self.budgets = [min_budget * (eta ** i) for i in range(self.num_rounds)]
        self.budgets = [min(b, max_budget) for b in self.budgets]

        self.config_counts = [n_configs]
        for i in range(1, self.num_rounds):
            self.config_counts.append(max(1, self.config_counts[-1] // eta))

        logger.info(f"Bracket {bracket_id}: n={n_configs}, r={min_budget}")
        logger.info(f"  Budgets: {self.budgets}")
        logger.info(f"  Configs: {self.config_counts}")

        self.configurations: List[HBConfiguration] = []
        self.current_round = 0
        self.surviving_indices: List[int] = []

    def sample_configurations(self):
        """Sample initial configurations."""
        for i in range(self.n_configs):
            config = HBConfiguration(
                config_id=f"b{self.bracket_id}_c{i}",
                bracket=self.bracket_id,
                hyperparameters=self.config_sampler()
            )
            self.configurations.append(config)
        self.surviving_indices = list(range(self.n_configs))

    def run_round(self) -> bool:
        """
        Run one round of the bracket.

        Returns False if bracket is complete.
        """
        if self.current_round >= self.num_rounds:
            return False

        budget = self.budgets[self.current_round]
        n_keep = self.config_counts[
            min(self.current_round + 1, self.num_rounds - 1)
        ]

        logger.info(f"Bracket {self.bracket_id} Round {self.current_round}: "
                    f"{len(self.surviving_indices)} configs at budget {budget}")

        # Evaluate all surviving configurations
        results = []
        for idx in self.surviving_indices:
            config = self.configurations[idx]
            perf = self.evaluator(config.hyperparameters, budget)
            config.performances[self.current_round] = perf
            results.append((idx, perf))

        # Sort and keep top
        results.sort(key=lambda x: x[1])  # Lower is better
        promoted = [idx for idx, _ in results[:n_keep]]
        eliminated = [idx for idx, _ in results[n_keep:]]

        for idx in eliminated:
            self.configurations[idx].status = "eliminated"

        if self.current_round == self.num_rounds - 1:
            for idx in promoted:
                self.configurations[idx].status = "completed"

        self.surviving_indices = promoted
        self.current_round += 1

        return self.current_round < self.num_rounds

    def get_best(self) -> Tuple[HBConfiguration, float]:
        """Get best configuration from this bracket."""
        completed = [
            c for c in self.configurations
            if c.status == "completed" and c.performances
        ]
        if not completed:
            return None, float('inf')

        best = min(completed, key=lambda c: list(c.performances.values())[-1])
        return best, list(best.performances.values())[-1]


class Hyperband:
    """
    Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

    Reference: Li, L., et al. "Hyperband: A Novel Bandit-Based Approach to
    Hyperparameter Optimization." JMLR 2018.
    """

    def __init__(
        self,
        max_budget: float = 81,
        eta: int = 3,
        config_sampler: Callable[[], Dict[str, Any]] = None,
        evaluator: Callable[[Dict[str, Any], float], float] = None,
    ):
        """
        Args:
            max_budget: Maximum budget per configuration (R)
            eta: Reduction factor (typically 3)
            config_sampler: Function to sample hyperparameters
            evaluator: Function(config, budget) -> validation_loss
        """
        self.max_budget = max_budget
        self.eta = eta
        self.config_sampler = config_sampler or self._default_sampler
        self.evaluator = evaluator or self._simulate_evaluator

        # Calculate s_max and brackets
        self.s_max = int(np.log(max_budget) / np.log(eta))
        self.total_budget_per_iteration = (self.s_max + 1) * max_budget

        logger.info(f"Hyperband initialized: R={max_budget}, eta={eta}")
        logger.info(f"s_max={self.s_max}, brackets={self.s_max + 1}")

        self.brackets: List[SuccessiveHalvingBracket] = []
        self.iteration = 0
        self.all_results: List[Tuple[HBConfiguration, float]] = []

    def _default_sampler(self) -> Dict[str, Any]:
        """Default hyperparameter sampler."""
        return {
            "learning_rate": 10 ** np.random.uniform(-5, -1),
            "weight_decay": 10 ** np.random.uniform(-6, -2),
            "batch_size": 2 ** np.random.randint(4, 9),
            "dropout": np.random.uniform(0, 0.5),
        }

    def _simulate_evaluator(self, config: Dict, budget: float) -> float:
        """Simulated evaluator for testing."""
        lr = config.get("learning_rate", 0.001)
        wd = config.get("weight_decay", 0.0001)

        # Optimal around lr=0.001, wd=0.0001
        lr_quality = -((np.log10(lr) + 3) ** 2) / 4
        wd_quality = -((np.log10(wd) + 4) ** 2) / 4

        base_quality = 0.5 + 0.2 * (lr_quality + wd_quality)
        progress = 1 - np.exp(-budget / 30)
        noise = np.random.normal(0, 0.02 / np.sqrt(budget))

        return base_quality * (1 - 0.8 * progress) + noise

    def _create_bracket(self, s: int) -> SuccessiveHalvingBracket:
        """Create a Successive Halving bracket for given s."""
        # n: number of configurations
        n = int(np.ceil((self.s_max + 1) / (s + 1) * (self.eta ** s)))
        # r: initial budget per configuration
        r = self.max_budget * (self.eta ** (-s))

        return SuccessiveHalvingBracket(
            bracket_id=s,
            n_configs=n,
            min_budget=r,
            max_budget=self.max_budget,
            eta=self.eta,
            config_sampler=self.config_sampler,
            evaluator=self.evaluator,
        )

    def run_iteration(self) -> List[Tuple[HBConfiguration, float]]:
        """
        Run one complete Hyperband iteration (all brackets).

        Returns list of (config, final_performance) for all completed configs.
        """
        self.iteration += 1
        logger.info(f"{'='*50}")
        logger.info(f"HYPERBAND ITERATION {self.iteration}")
        logger.info(f"{'='*50}")

        iteration_results = []

        # Create and run all brackets from s_max down to 0
        for s in range(self.s_max, -1, -1):
            logger.info(f"--- Bracket s={s} ---")

            bracket = self._create_bracket(s)
            bracket.sample_configurations()

            # Run all rounds of this bracket
            while bracket.run_round():
                pass

            # Collect results
            best_config, best_perf = bracket.get_best()
            if best_config:
                iteration_results.append((best_config, best_perf))
                logger.info(f"Bracket {s} best: {best_perf:.4f}")

            self.brackets.append(bracket)

        self.all_results.extend(iteration_results)
        return iteration_results

    def run(self, n_iterations: int = 1) -> HBConfiguration:
        """
        Run multiple Hyperband iterations.

        Returns the best configuration found.
        """
        for _ in range(n_iterations):
            self.run_iteration()

        if not self.all_results:
            raise ValueError("No results collected")

        best_config, best_perf = min(self.all_results, key=lambda x: x[1])

        logger.info(f"{'='*50}")
        logger.info("OPTIMIZATION COMPLETE")
        logger.info(f"Best performance: {best_perf:.4f}")
        logger.info(f"Best hyperparameters: {best_config.hyperparameters}")
        logger.info(f"{'='*50}")

        return best_config

    def get_statistics(self) -> Dict:
        """Get statistics about the optimization."""
        total_configs = sum(len(b.configurations) for b in self.brackets)
        total_evaluations = sum(
            sum(len(c.performances) for c in b.configurations)
            for b in self.brackets
        )

        return {
            "iterations": self.iteration,
            "brackets": len(self.brackets),
            "total_configurations": total_configs,
            "total_evaluations": total_evaluations,
            "best_result": min((r[1] for r in self.all_results), default=None),
        }


# Example usage
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    hb = Hyperband(max_budget=81, eta=3)
    best = hb.run(n_iterations=1)

    stats = hb.get_statistics()
    print(f"Statistics: {stats}")
```

Understanding the precise structure of Hyperband's brackets is essential for effective use and debugging. Let's trace through the complete algorithm with a worked example.
Complete Example: R=81, η=3
With maximum budget (R = 81) and reduction factor (\eta = 3):
| Bracket s | Round | n_i (configs) | r_i (budget) | Round Total |
|---|---|---|---|---|
| s=4 | 0 | 81 | 1 | 81 |
| | 1 | 27 | 3 | 81 |
| | 2 | 9 | 9 | 81 |
| | 3 | 3 | 27 | 81 |
| | 4 | 1 | 81 | 81 |
| **Bracket 4 Total** | | | | **405** |
| s=3 | 0 | 34 | 3 | 102 |
| | 1 | 11 | 9 | 99 |
| | 2 | 3 | 27 | 81 |
| | 3 | 1 | 81 | 81 |
| **Bracket 3 Total** | | | | **363** |
| s=2 | 0 | 15 | 9 | 135 |
| | 1 | 5 | 27 | 135 |
| | 2 | 1 | 81 | 81 |
| **Bracket 2 Total** | | | | **351** |
| s=1 | 0 | 8 | 27 | 216 |
| | 1 | 2 | 81 | 162 |
| **Bracket 1 Total** | | | | **378** |
| s=0 | 0 | 5 | 81 | 405 |
| **Bracket 0 Total** | | | | **405** |
Key Observations:
Total budget varies slightly by bracket due to rounding, but all are approximately (B = (s_{\max}+1) \times R).
Bracket s=4 samples the most configurations (81) but starts with minimum budget (1). It has 5 rounds of elimination.
Bracket s=0 samples the fewest (5) but starts at maximum budget (81). Every configuration runs to completion—no early stopping.
Configuration count: Across one Hyperband iteration, we sample (81 + 34 + 15 + 8 + 5 = 143) configurations in total.
Budget distribution: More aggressive brackets allocate budget to early rounds; conservative brackets concentrate budget in later rounds.
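The bracket totals above can be checked mechanically by replaying each bracket's halving schedule. A small sketch (assuming (R = 81, \eta = 3)):

```python
import math

def bracket_budgets(R=81, eta=3):
    """Replay each bracket's schedule; return ({s: budget}, total configs sampled)."""
    s_max = int(math.log(R) / math.log(eta) + 1e-9)  # guard against float error
    budgets, total_configs = {}, 0
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * eta ** s)  # initial configs
        total_configs += n
        n_i, r_i, budget = n, R / eta ** s, 0.0
        for _ in range(s + 1):            # s + 1 rounds of successive halving
            budget += n_i * r_i
            n_i = max(1, n_i // eta)      # keep the top 1/eta
            r_i = min(r_i * eta, R)       # grow the per-config budget by eta
        budgets[s] = budget
    return budgets, total_configs

budgets, n_configs = bracket_budgets()
print(budgets)    # per-bracket budgets: 405, 363, 351, 378, 405
print(n_configs)  # 143 configurations sampled per iteration
```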
Hyperband's power comes from hedging. If early stopping works well for this problem (high rank correlation), bracket s=4 will find good configurations most efficiently. If early stopping fails (low rank correlation), bracket s=0 will still evaluate configurations thoroughly. At least one bracket will perform well, regardless of problem characteristics.
Hyperband inherits and extends the theoretical guarantees of Successive Halving, with additional robustness from the multi-bracket design.
Simple Regret Bound:
Let (\lambda^*) be the optimal configuration and (\hat{\lambda}) be the configuration returned by Hyperband. The simple regret is:
$$\text{Simple Regret} = f(\hat{\lambda}) - f(\lambda^*)$$
Theorem (Li et al., 2017): For Hyperband with budget (B) per iteration:
$$\mathbb{E}[\text{Simple Regret}] = O\left( \sqrt{\frac{\log(B)^3}{B}} \right)$$
This bound is nearly optimal up to logarithmic factors. The (\log(B)^3) overhead is the price of adaptivity: Hyperband hedges across (s_{\max}+1) brackets because it does not know in advance which exploration-exploitation balance the problem favors.
Comparison to Pure Strategies:
Consider two extreme strategies for budget allocation:
Pure exploration (like bracket s=s_max): Sample (n = B/R_{\min}) configurations, evaluate each at minimum budget (R_{\min}), keep the best.
Pure exploitation (like bracket s=0): Sample (n = B/R_{\max}) configurations, evaluate each to completion at (R_{\max}).
Pure Exploration Regret: $$\mathbb{E}[\text{Regret}] = O\left( \sqrt{\frac{R_{\max}}{B} \cdot \sigma^2 + \varepsilon_{\text{gap}}} \right)$$
where (\varepsilon_{\text{gap}}) is the error from using low-budget estimates.
Pure Exploitation Regret: $$\mathbb{E}[\text{Regret}] = O\left( \sqrt{\frac{R_{\max}}{B}} \right)$$
which suffers from insufficient exploration when optimal configurations are rare.
Hyperband achieves the best of both worlds—it matches the better of these two strategies up to logarithmic factors, without knowing which is better a priori.
Hyperband's theoretical guarantee is "adaptive"—it automatically achieves near-optimal performance whether the problem favors exploration or exploitation. This adaptivity is the key theoretical contribution beyond Successive Halving.
Computational Overhead:
The total number of configurations evaluated across all brackets in one Hyperband iteration is approximately:
$$N_{\text{total}} = \sum_{s=0}^{s_{\max}} n_s = \sum_{s=0}^{s_{\max}} \left\lceil \frac{s_{\max}+1}{s+1} \, \eta^s \right\rceil$$

For (R = 81, \eta = 3), this gives (81 + 34 + 15 + 8 + 5 = 143) configurations per iteration.
The total budget used is approximately ((s_{\max}+1) \times B = O(B \log R)), so Hyperband uses a factor of (O(\log R)) more budget than a single Successive Halving run. This overhead buys robustness to the n-vs-B/n tradeoff.
The standard Hyperband algorithm runs brackets sequentially and uses synchronous Successive Halving within each bracket. This creates efficiency challenges in parallel environments. Asynchronous Hyperband addresses these limitations.
Parallelization Strategies:
Bracket-Level Parallelism: Run all brackets of one iteration in parallel. Limited by (s_{\max}+1) parallel threads.
Round-Level Parallelism: Within each bracket, evaluate all configurations at the current round in parallel. Limited by the number of surviving configurations.
Full Asynchrony (ASHA-style): Combine ASHA (Asynchronous Successive Halving) ideas with Hyperband's multi-bracket structure.
```python
"""Asynchronous Hyperband implementation using ASHA within brackets."""
import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Callable, Any
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import logging

logger = logging.getLogger(__name__)


@dataclass
class AsyncHBConfig:
    """Configuration being evaluated in Async Hyperband."""
    config_id: str
    bracket: int
    rung: int  # Current rung within bracket
    hyperparameters: Dict[str, Any]
    rung_performances: Dict[int, float] = field(default_factory=dict)
    status: str = "pending"  # pending, running, promoted, stopped, completed


class AsyncHyperband:
    """
    Asynchronous Hyperband with ASHA-style promotion within brackets.

    Key features:
    - No synchronization barriers within brackets
    - Workers always busy (no idle time)
    - Promotion as soon as configuration qualifies
    """

    def __init__(
        self,
        max_budget: float = 81,
        eta: int = 3,
        n_workers: int = 4,
        config_sampler: Callable[[], Dict[str, Any]] = None,
        evaluator: Callable[[Dict[str, Any], float], float] = None,
    ):
        self.max_budget = max_budget
        self.eta = eta
        self.n_workers = n_workers
        self.config_sampler = config_sampler
        self.evaluator = evaluator

        self.s_max = int(np.log(max_budget) / np.log(eta))

        # Compute rung budgets for each bracket
        self.bracket_rungs: Dict[int, List[float]] = {}
        for s in range(self.s_max + 1):
            min_r = max_budget * (eta ** (-s))
            rungs = []
            r = min_r
            while r <= max_budget:
                rungs.append(r)
                r *= eta
            self.bracket_rungs[s] = rungs

        # Shared state (thread-safe)
        self.lock = threading.Lock()
        self.configs: Dict[str, AsyncHBConfig] = {}
        self.rung_results: Dict[tuple, List] = {}  # (bracket, rung) -> results
        self.next_config_id = 0
        self.pending_queue: List[AsyncHBConfig] = []

        logger.info(f"Async Hyperband: R={max_budget}, eta={eta}, "
                    f"workers={n_workers}")

    def _sample_bracket(self) -> int:
        """Sample which bracket to add new configuration to."""
        # Uniform or weighted by bracket size
        return np.random.randint(0, self.s_max + 1)

    def _sample_new_config(self) -> AsyncHBConfig:
        """Sample a new configuration."""
        bracket = self._sample_bracket()
        config = AsyncHBConfig(
            config_id=f"c{self.next_config_id}",
            bracket=bracket,
            rung=0,
            hyperparameters=self.config_sampler() if self.config_sampler else {
                "lr": 10 ** np.random.uniform(-5, -1),
                "wd": 10 ** np.random.uniform(-6, -2),
            }
        )
        self.next_config_id += 1
        return config

    def _get_next_job(self) -> Optional[AsyncHBConfig]:
        """Get next configuration to evaluate (promotion or new sample)."""
        with self.lock:
            # Priority: promoted configs > new configs
            if self.pending_queue:
                return self.pending_queue.pop(0)
            else:
                return self._sample_new_config()

    def _check_promotion(self, config: AsyncHBConfig) -> bool:
        """
        Check if configuration should be promoted.

        Uses ASHA-style: promote if in top 1/eta of completed at current rung.
        """
        bracket = config.bracket
        rung = config.rung
        rungs = self.bracket_rungs[bracket]

        # Can't promote from last rung
        if rung >= len(rungs) - 1:
            return False

        key = (bracket, rung)
        if key not in self.rung_results:
            self.rung_results[key] = []

        completed = self.rung_results[key]
        if len(completed) < 2:  # Need at least 2 to compare
            return True  # Promote by default

        # Sort by performance (lower is better)
        sorted_results = sorted(completed, key=lambda x: x[1])
        n_to_promote = max(1, len(completed) // self.eta)

        # Check if this config is in top 1/eta
        current_perf = config.rung_performances.get(rung, float('inf'))
        threshold_perf = sorted_results[n_to_promote - 1][1]

        return current_perf <= threshold_perf

    def _worker_loop(self, worker_id: int, max_evals: int):
        """Worker loop: get job, evaluate, check promotion, repeat."""
        evals = 0
        while evals < max_evals:
            config = self._get_next_job()
            if config is None:
                break

            bracket = config.bracket
            rung = config.rung
            rungs = self.bracket_rungs[bracket]
            budget = rungs[rung]

            logger.debug(f"Worker {worker_id}: evaluating {config.config_id} "
                         f"bracket={bracket} rung={rung} budget={budget}")

            # Evaluate (outside lock)
            if self.evaluator:
                perf = self.evaluator(config.hyperparameters, budget)
            else:
                # Simulate
                lr = config.hyperparameters.get("lr", 0.001)
                quality = -((np.log10(lr) + 3) ** 2) / 4
                progress = 1 - np.exp(-budget / 30)
                perf = 0.5 + 0.2 * quality
                perf = perf * (1 - 0.8 * progress)
                perf += np.random.normal(0, 0.02 / np.sqrt(budget))

            evals += 1

            with self.lock:
                # Record result
                config.rung_performances[rung] = perf
                key = (bracket, rung)
                if key not in self.rung_results:
                    self.rung_results[key] = []
                self.rung_results[key].append((config.config_id, perf))
                self.configs[config.config_id] = config

                # Check promotion
                if rung < len(rungs) - 1 and self._check_promotion(config):
                    config.rung += 1
                    config.status = "promoted"
                    self.pending_queue.append(config)
                    logger.debug(f"{config.config_id} promoted to rung {config.rung}")
                elif rung == len(rungs) - 1:
                    config.status = "completed"
                else:
                    config.status = "stopped"

    def run(self, max_evaluations: int) -> AsyncHBConfig:
        """
        Run Async Hyperband until max_evaluations.

        Returns best completed configuration.
        """
        evals_per_worker = max_evaluations // self.n_workers

        logger.info(f"Running Async Hyperband with {max_evaluations} evaluations")

        with ThreadPoolExecutor(max_workers=self.n_workers) as executor:
            futures = [
                executor.submit(self._worker_loop, i, evals_per_worker)
                for i in range(self.n_workers)
            ]
            for f in as_completed(futures):
                f.result()

        # Find best completed
        best_config = None
        best_perf = float('inf')
        for config in self.configs.values():
            if config.status == "completed":
                final_rung = len(self.bracket_rungs[config.bracket]) - 1
                perf = config.rung_performances.get(final_rung, float('inf'))
                if perf < best_perf:
                    best_perf = perf
                    best_config = config

        if best_config:
            logger.info(f"Best: {best_config.config_id} "
                        f"perf={best_perf:.4f} "
                        f"hp={best_config.hyperparameters}")

        return best_config

    def get_statistics(self) -> Dict:
        """Get optimization statistics."""
        completed = [c for c in self.configs.values() if c.status == "completed"]
        stopped = [c for c in self.configs.values() if c.status == "stopped"]

        return {
            "total_configs": len(self.configs),
            "completed": len(completed),
            "stopped": len(stopped),
            "best_perf": min(
                (list(c.rung_performances.values())[-1]
                 for c in completed if c.rung_performances),
                default=None
            ),
        }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    ahb = AsyncHyperband(max_budget=27, eta=3, n_workers=4)
    best = ahb.run(max_evaluations=100)
    print(f"Statistics: {ahb.get_statistics()}")
```

In practice, Async Hyperband achieves near-linear speedup with the number of workers because there are no synchronization barriers within brackets (workers never sit idle waiting for a round to finish) and configurations are promoted as soon as they qualify.
While Hyperband is more robust than Successive Halving (no need to choose (n)), it still requires setting (R) (maximum budget) and (\eta) (reduction factor). These choices significantly impact performance.
The effect of (\eta) with (R = 81):

| η | s_max | Brackets | Max configs (s=s_max) | Min initial budget |
|---|---|---|---|---|
| 2 | 6 | 7 | 64 | 1.27 |
| 3 | 4 | 5 | 81 | 1 |
| 4 | 3 | 4 | 64 | 1.27 |
| 5 | 2 | 3 | 25 | 3.24 |
Practical Heuristics:
Start with η=3: This is the "default" recommendation from the original paper and works well in practice.
Calibrate R empirically: Run a few configurations to completion to understand training dynamics. Set R such that most configurations have converged (validation loss has plateaued).
Verify minimum budget is meaningful: Check that configurations at the minimum budget ((R/\eta^{s_{\max}})) can be meaningfully distinguished. If all configurations look nearly identical at minimum budget, increase R or decrease η.
Consider wall-clock time: More brackets mean more configurations to sample and more overhead. If wall-clock time is critical, fewer brackets (larger η) may be preferable despite the theoretical overhead.
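The third heuristic can be checked directly: evaluate a pilot set of configurations at both the candidate minimum budget and at (R), and measure whether the low-budget ranking predicts the full-budget ranking. A minimal sketch using a NumPy-only Spearman correlation (the `evaluate` callable and the 0.5 threshold are illustrative assumptions, not fixed recommendations):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks (ignores ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def min_budget_is_meaningful(evaluate, configs, min_budget, max_budget,
                             threshold=0.5):
    """True if low-budget rankings roughly predict full-budget rankings."""
    low = [evaluate(c, min_budget) for c in configs]    # cheap pilot evaluations
    high = [evaluate(c, max_budget) for c in configs]   # full-budget evaluations
    return spearman(low, high) >= threshold
```

If the correlation is low, increase R or decrease η so that the minimum budget (R/\eta^{s_{\max}}) gives configurations enough training to reveal their ordering.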
```python
"""Utilities for configuring Hyperband parameters."""
import numpy as np
from typing import Dict, Optional, Tuple


def compute_hyperband_structure(
    max_budget: float,
    eta: int = 3
) -> Dict:
    """
    Compute complete Hyperband structure for given parameters.

    Returns:
        Dictionary with:
        - s_max: Maximum bracket index
        - brackets: List of bracket specifications
        - total_configs: Total configs per iteration
        - total_budget: Total budget per iteration
    """
    s_max = int(np.log(max_budget) / np.log(eta))

    brackets = []
    total_configs = 0
    total_budget = 0

    for s in range(s_max, -1, -1):
        # Number of initial configs in this bracket
        n = int(np.ceil((s_max + 1) / (s + 1) * (eta ** s)))
        # Initial budget per config
        r = max_budget * (eta ** (-s))
        # Number of rounds in this bracket
        num_rounds = s + 1

        # Budget used by this bracket
        bracket_budget = 0
        n_i = n
        r_i = r
        rounds = []
        for i in range(num_rounds):
            round_budget = n_i * r_i
            bracket_budget += round_budget
            rounds.append({
                "round": i,
                "configs": n_i,
                "budget": r_i,
                "total": round_budget
            })
            n_i = max(1, n_i // eta)
            r_i = min(r_i * eta, max_budget)

        brackets.append({
            "s": s,
            "n": n,
            "r": r,
            "rounds": rounds,
            "total_budget": bracket_budget
        })
        total_configs += n
        total_budget += bracket_budget

    return {
        "max_budget": max_budget,
        "eta": eta,
        "s_max": s_max,
        "brackets": brackets,
        "total_configs": total_configs,
        "total_budget": total_budget,
        "min_budget": max_budget / (eta ** s_max),
    }


def recommend_parameters(
    target_budget: float,
    min_meaningful_budget: float,
    preferred_configs: int = None
) -> Tuple[float, int, Optional[Dict]]:
    """
    Recommend R and η based on constraints.

    Args:
        target_budget: Total budget available
        min_meaningful_budget: Minimum budget to distinguish configs
        preferred_configs: Desired number of configs (optional)

    Returns:
        (R, η, details) recommendation
    """
    recommendations = []

    for eta in [2, 3, 4]:
        # R must satisfy: R / η^s_max >= min_meaningful_budget
        # s_max = floor(log_η(R))
        # Try different R values
        for R in [27, 81, 125, 243]:
            if R < min_meaningful_budget:
                continue

            s_max = int(np.log(R) / np.log(eta))
            min_budget = R / (eta ** s_max)

            if min_budget < min_meaningful_budget:
                continue

            structure = compute_hyperband_structure(R, eta)

            # Check if total budget is reasonable
            iterations = target_budget / structure["total_budget"]

            recommendations.append({
                "R": R,
                "eta": eta,
                "min_budget": min_budget,
                "configs_per_iter": structure["total_configs"],
                "budget_per_iter": structure["total_budget"],
                "iterations": iterations,
                "score": structure["total_configs"] * iterations,
            })

    # Rank by total configs explored
    recommendations.sort(key=lambda x: x["score"], reverse=True)

    if recommendations:
        best = recommendations[0]
        return best["R"], best["eta"], best

    return 81, 3, None  # Default


def print_hyperband_plan(max_budget: float, eta: int):
    """Print detailed Hyperband execution plan."""
    structure = compute_hyperband_structure(max_budget, eta)

    print("=" * 60)
    print("HYPERBAND CONFIGURATION")
    print("=" * 60)
    print(f"Max budget (R): {max_budget}")
    print(f"Reduction factor (η): {eta}")
    print(f"Number of brackets (s_max + 1): {structure['s_max'] + 1}")
    print(f"Min budget: {structure['min_budget']:.2f}")
    print("")
    print(f"Total configs per iteration: {structure['total_configs']}")
    print(f"Total budget per iteration: {structure['total_budget']:.1f}")
    print("=" * 60)

    for bracket in structure["brackets"]:
        print(f"Bracket s={bracket['s']}: n={bracket['n']}, r={bracket['r']:.1f}")
        print(f"{'Round':<8} {'Configs':<10} {'Budget':<12} {'Total':<10}")
        print("-" * 40)
        for round_info in bracket["rounds"]:
            print(f"{round_info['round']:<8} {round_info['configs']:<10} "
                  f"{round_info['budget']:<12.1f} {round_info['total']:<10.1f}")
        print(f"Bracket total: {bracket['total_budget']:.1f}")


# Example
if __name__ == "__main__":
    print_hyperband_plan(81, 3)

    print("\nRecommendation for budget=1000, min_meaningful=3:")
    R, eta, details = recommend_parameters(
        target_budget=1000,
        min_meaningful_budget=3
    )
    print(f"  R={R}, η={eta}")
    if details:
        print(f"  Configs per iteration: {details['configs_per_iter']}")
        print(f"  Expected iterations: {details['iterations']:.1f}")
```

Despite Hyperband's robustness, practitioners commonly encounter issues that can be avoided with proper configuration and understanding.
Hyperband assumes budgets correspond to training progress. If using learning rate schedules (cosine annealing, warmup, step decay), ensure schedules are relative to total epochs, not absolute. A configuration at budget 10 should behave as if it's at epoch 10 of a 10-epoch schedule, not partially through a 100-epoch schedule.
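One way to keep schedules budget-relative is to parameterize the learning rate by the fraction of the *allotted* budget consumed, not by a fixed total epoch count. A hedged sketch with cosine annealing (function name and values are illustrative):

```python
import math

def cosine_lr(base_lr, epoch, allotted_epochs, min_lr=0.0):
    """Cosine annealing over the budget actually allotted to this trial."""
    progress = min(epoch / allotted_epochs, 1.0)   # fraction of THIS trial's budget
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# A trial given budget 10 completes its full anneal by epoch 10 ...
lr_short = [cosine_lr(0.1, e, allotted_epochs=10) for e in range(10)]
# ... rather than sitting on the early, flat part of a 100-epoch schedule.
lr_wrong = [cosine_lr(0.1, e, allotted_epochs=100) for e in range(10)]
```

With a fixed 100-epoch schedule, a budget-10 trial never leaves the high-learning-rate regime, so its validation loss misrepresents how the configuration would perform if trained properly.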
Hyperband has spawned numerous extensions that address specific limitations or add capabilities:
| Variant | Key Improvement | Use Case |
|---|---|---|
| ASHA | Asynchronous execution | Parallel/distributed settings |
| BOHB | Bayesian configuration sampling | When prior knowledge helps |
| MOBSTER | Multi-fidelity BO + transfer | Related task sequence |
| SHA-UCB | Upper confidence bound promotion | Theoretical guarantees |
| Population Based Training | Hyperparameter schedules | Longer training runs |
| DEHB | Differential evolution sampling | Expensive black-box optimization |
BOHB (Bayesian Optimization and Hyperband):
BOHB, covered in the next page, replaces Hyperband's random sampling with Bayesian optimization. Instead of sampling configurations uniformly at random, BOHB fits kernel density estimators to past evaluations: one density models configurations that performed well, another models those that performed poorly, and new configurations are drawn where the ratio of the good density to the bad density is highest.
This dramatically improves sample efficiency when the search space has exploitable structure.
Population Based Training (PBT):
PBT differs fundamentally from Hyperband: rather than eliminating poor configurations, it mutates them. Poor-performing configurations copy hyperparameters from good performers and add random perturbations. This enables hyperparameters to change over the course of training (discovering schedules rather than fixed values) without discarding partially trained models.
PBT is particularly effective for very long training runs where hyperparameter needs may change over time.
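A minimal sketch of PBT's exploit-and-explore step, assuming truncation selection with illustrative quantile and perturbation values (not the canonical ones):

```python
import random

def pbt_step(population, quantile=0.25, perturb=(0.8, 1.2)):
    """One exploit/explore step: the bottom quantile copies a top performer
    and perturbs its hyperparameters. Each member is a dict with 'score'
    (lower is better) and 'hyperparameters'."""
    ranked = sorted(population, key=lambda m: m["score"])
    k = max(1, int(len(ranked) * quantile))
    top, bottom = ranked[:k], ranked[-k:]
    for member in bottom:
        donor = random.choice(top)
        # Exploit: copy hyperparameters (real PBT also copies model weights)
        member["hyperparameters"] = dict(donor["hyperparameters"])
        # Explore: multiplicative perturbation of each hyperparameter
        for name, value in member["hyperparameters"].items():
            member["hyperparameters"][name] = value * random.uniform(*perturb)
    return population
```

Because training continues after each step, the sequence of perturbations a surviving member accumulates amounts to a learned hyperparameter schedule.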
• Standard Hyperband: Good default for new problems with unknown structure
• ASHA: When parallelization is critical and you have many workers
• BOHB: When search space is continuous and you suspect structure exists
• PBT: For very long training runs (days/weeks) where schedules matter
• DEHB: For expensive evaluations (hours per config) where sample efficiency is paramount
Hyperband extends Successive Halving to handle the fundamental n-vs-B/n tradeoff without requiring prior knowledge of problem characteristics.
Looking Ahead:
The next page covers BOHB (Bayesian Optimization and Hyperband)—a powerful combination that adds intelligent configuration sampling to Hyperband's multi-fidelity search. BOHB leverages completed evaluations to focus sampling on promising regions of the search space, dramatically improving sample efficiency for structured problems.
You now understand Hyperband's algorithm, theoretical guarantees, bracket structure, asynchronous variants, and practical configuration. You can implement Hyperband for your optimization problems and choose appropriate parameters. Next, we'll see how BOHB combines Hyperband with Bayesian optimization for even greater efficiency.