Multi-fidelity optimization methods like Hyperband and BOHB provide frameworks for within-run budget allocation—how to distribute resources across configurations during a single optimization run. But practitioners also face broader strategic decisions: how much total budget to commit, when to stop a campaign, and how to divide resources across workers, phases, and related projects.
This page addresses these strategic considerations, providing frameworks for making principled budget allocation decisions across the full hyperparameter optimization lifecycle.
By the end of this page, you will understand:
• Theoretical foundations of budget allocation in HPO
• Parallel and distributed budget management
• Adaptive budget strategies for changing constraints
• Cross-campaign budget allocation and transfer
• Practical budget planning for production systems
Budget allocation in hyperparameter optimization can be analyzed through the lens of optimal stopping theory and resource allocation under uncertainty.
The Fundamental Tradeoff:
Given a total budget $B$, we must decide how to divide it between exploring new configurations, exploiting promising ones, and fully training the final selection.
Let $B = B_e + B_x + B_f$, where the terms denote the exploration, exploitation, and final-training budgets respectively.
Optimal Allocation Theory:
Under simplifying assumptions (configurations sampled from known distribution, performance improves monotonically with budget), optimal allocation satisfies:
$$\frac{\partial \mathbb{E}[\text{Improvement}]}{\partial B_e} = \frac{\partial \mathbb{E}[\text{Improvement}]}{\partial B_x}$$
At the optimum, marginal improvement from exploration equals marginal improvement from exploitation. Early in optimization, exploration dominates; as good configurations are found, exploitation becomes more valuable.
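This equal-marginal-value condition can be illustrated with a toy greedy allocator. The logarithmic improvement curves and their scales below are illustrative assumptions, not measurements: the greedy rule spends each budget unit wherever the marginal gain is currently higher, and it stops where the two marginal gains roughly coincide.

```python
"""Toy greedy split of a budget between exploration and exploitation.

Assumes hypothetical diminishing-returns curves of the form
scale * log(1 + B); a sketch of the equal-marginal-value condition,
not a model of any real optimizer.
"""

def marginal_gain(curve_scale: float, spent: float) -> float:
    # Derivative of curve_scale * log(1 + B) with respect to B
    return curve_scale / (1.0 + spent)

def allocate(total_budget: int, explore_scale: float, exploit_scale: float):
    b_e = b_x = 0
    for _ in range(total_budget):
        # Spend the next unit where the marginal improvement is larger
        if marginal_gain(explore_scale, b_e) >= marginal_gain(exploit_scale, b_x):
            b_e += 1
        else:
            b_x += 1
    return b_e, b_x

b_e, b_x = allocate(100, explore_scale=2.0, exploit_scale=1.0)
```

When exploration's improvement curve is steeper (here, twice the scale), the greedy rule ends with roughly two-thirds of the budget in exploration, and the two marginal gains are nearly equal at the stopping point—the discrete analogue of the condition above.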
Practical Implications:
| Phase | Exploration % | Exploitation % | Reserve % | Strategy |
|---|---|---|---|---|
| Early (0-25%) | 60-70% | 20-30% | 10% | Aggressive random search |
| Middle (25-60%) | 40-50% | 40-50% | 10% | Balanced Hyperband |
| Late (60-90%) | 20-30% | 50-60% | 20% | Focused BOHB |
| Final (90-100%) | 0% | 20-30% | 70-80% | Best config full training |
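As a rough sketch, the phase table can be encoded as a lookup keyed on campaign progress. The fractions below are mid-points of the ranges above—one reading of the table, not prescriptions:

```python
def phase_allocation(progress: float) -> dict:
    """Mid-point reading of the phase table; `progress` is the fraction
    of total budget already spent (0.0 to 1.0)."""
    if progress < 0.25:
        return {"explore": 0.65, "exploit": 0.25, "reserve": 0.10,
                "strategy": "aggressive random search"}
    elif progress < 0.60:
        return {"explore": 0.45, "exploit": 0.45, "reserve": 0.10,
                "strategy": "balanced Hyperband"}
    elif progress < 0.90:
        return {"explore": 0.25, "exploit": 0.55, "reserve": 0.20,
                "strategy": "focused BOHB"}
    else:
        return {"explore": 0.0, "exploit": 0.25, "reserve": 0.75,
                "strategy": "best config full training"}

early = phase_allocation(0.10)
```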
In distributed settings with multiple workers, budget allocation becomes more complex. Key considerations include worker heterogeneity, communication overhead, and load balancing.
Worker Heterogeneity:
Real clusters often contain heterogeneous hardware—different GPU types, varying memory, different network latencies. Naive allocation ignores this, leading to poor utilization.
```python
"""Parallel budget management with heterogeneous workers."""
import heapq
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

import numpy as np


@dataclass
class Worker:
    worker_id: str
    gpu_type: str
    throughput: float  # Budget units per hour
    current_job: Optional[str] = None
    available_at: float = 0.0


@dataclass
class Job:
    job_id: str
    config_id: str
    budget: float
    priority: float = 0.0  # Higher = more important


class ParallelBudgetManager:
    """Manages budget allocation across heterogeneous workers."""

    def __init__(self, workers: List[Worker]):
        self.workers = {w.worker_id: w for w in workers}
        self.pending_jobs: list = []
        self.completed_jobs: List[Job] = []
        self.current_time = 0.0
        self._counter = 0  # Tiebreaker so the heap never compares Job objects
        # Track throughput by GPU type for estimation
        self.throughput_history: Dict[str, List[float]] = defaultdict(list)

    def submit_job(self, job: Job):
        """Submit a job to the pending queue."""
        heapq.heappush(self.pending_jobs, (-job.priority, job.budget, self._counter, job))
        self._counter += 1

    def get_assignment(self, worker_id: str) -> Optional[Job]:
        """Get next job assignment for a worker, using capability-aware scheduling."""
        if not self.pending_jobs:
            return None
        worker = self.workers[worker_id]
        # Find the best job for this worker:
        # prefer jobs whose budget matches the worker's capability
        best_idx = 0
        best_score = float('-inf')
        for i, (neg_priority, budget, _, job) in enumerate(self.pending_jobs):
            # Score based on priority and capability match
            efficiency = worker.throughput / self._avg_throughput()
            # High-budget jobs go to high-throughput workers
            capability_match = efficiency if budget > self._median_budget() else 1 / efficiency
            score = -neg_priority + 0.5 * capability_match
            if score > best_score:
                best_score = score
                best_idx = i
        # Remove and return the best job, then restore the heap invariant
        _, _, _, job = self.pending_jobs.pop(best_idx)
        heapq.heapify(self.pending_jobs)
        worker.current_job = job.job_id
        worker.available_at = self.current_time + job.budget / worker.throughput
        return job

    def complete_job(self, worker_id: str, job: Job, actual_time: float):
        """Record job completion and update throughput estimates."""
        worker = self.workers[worker_id]
        worker.current_job = None
        # Update throughput estimate via exponential moving average
        actual_throughput = job.budget / actual_time
        self.throughput_history[worker.gpu_type].append(actual_throughput)
        alpha = 0.3
        worker.throughput = alpha * actual_throughput + (1 - alpha) * worker.throughput
        self.completed_jobs.append(job)

    def _avg_throughput(self) -> float:
        return float(np.mean([w.throughput for w in self.workers.values()]))

    def _median_budget(self) -> float:
        if not self.pending_jobs:
            return 1.0
        budgets = [job.budget for _, _, _, job in self.pending_jobs]
        return float(np.median(budgets))

    def estimate_completion_time(self) -> float:
        """Estimate time to complete all pending jobs."""
        if not self.pending_jobs:
            return 0.0
        total_budget = sum(job.budget for _, _, _, job in self.pending_jobs)
        total_throughput = sum(w.throughput for w in self.workers.values())
        return total_budget / total_throughput

    def get_utilization(self) -> Dict[str, float]:
        """Get current worker utilization."""
        busy = sum(1 for w in self.workers.values() if w.current_job)
        return {
            "busy_workers": busy,
            "total_workers": len(self.workers),
            "utilization": busy / len(self.workers),
        }
```

Real-world optimization often faces changing budget constraints—cloud costs may limit resources, deadlines may shift, or new information may change priorities. Adaptive strategies adjust allocation dynamically.
Budget Reallocation Triggers:
```python
"""Adaptive budget allocation with dynamic reallocation."""
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BudgetPhase(Enum):
    EXPLORATION = "exploration"
    BALANCED = "balanced"
    EXPLOITATION = "exploitation"
    FINALIZATION = "finalization"


@dataclass
class BudgetState:
    total_budget: float
    spent_budget: float
    best_performance: float
    iterations_without_improvement: int
    deadline_pressure: float  # 0-1, higher = more pressure

    @property
    def remaining_budget(self) -> float:
        return self.total_budget - self.spent_budget

    @property
    def progress(self) -> float:
        return self.spent_budget / self.total_budget


class AdaptiveBudgetAllocator:
    """Dynamically adjusts budget allocation based on optimization state."""

    def __init__(
        self,
        total_budget: float,
        convergence_patience: int = 10,
        phase_thresholds: Optional[Dict[str, float]] = None,
    ):
        self.total_budget = total_budget
        self.convergence_patience = convergence_patience
        self.phase_thresholds = phase_thresholds or {
            "exploration_end": 0.25,
            "balanced_end": 0.60,
            "exploitation_end": 0.90,
        }
        self.state = BudgetState(
            total_budget=total_budget,
            spent_budget=0.0,
            best_performance=float('inf'),
            iterations_without_improvement=0,
            deadline_pressure=0.0,
        )
        self.current_phase = BudgetPhase.EXPLORATION

    def update_state(
        self,
        budget_spent: float,
        best_performance: float,
        deadline_pressure: float = 0.0,
    ):
        """Update internal state after each iteration."""
        self.state.spent_budget += budget_spent
        self.state.deadline_pressure = deadline_pressure
        if best_performance < self.state.best_performance:
            self.state.best_performance = best_performance
            self.state.iterations_without_improvement = 0
        else:
            self.state.iterations_without_improvement += 1
        self._update_phase()

    def _update_phase(self):
        """Determine current optimization phase."""
        progress = self.state.progress
        converging = self.state.iterations_without_improvement > self.convergence_patience
        if progress < self.phase_thresholds["exploration_end"] and not converging:
            self.current_phase = BudgetPhase.EXPLORATION
        elif progress < self.phase_thresholds["balanced_end"] and not converging:
            self.current_phase = BudgetPhase.BALANCED
        elif progress < self.phase_thresholds["exploitation_end"]:
            self.current_phase = BudgetPhase.EXPLOITATION
        else:
            self.current_phase = BudgetPhase.FINALIZATION
        # Override for deadline pressure
        if self.state.deadline_pressure > 0.8:
            self.current_phase = BudgetPhase.FINALIZATION

    def get_allocation(self) -> Dict[str, object]:
        """Get recommended budget allocation for the current phase."""
        allocations = {
            BudgetPhase.EXPLORATION: {
                "n_configs": "high",
                "max_budget_fraction": 0.3,
                "random_fraction": 0.7,
                "aggressive_brackets": True,
            },
            BudgetPhase.BALANCED: {
                "n_configs": "medium",
                "max_budget_fraction": 0.5,
                "random_fraction": 0.4,
                "aggressive_brackets": True,
            },
            BudgetPhase.EXPLOITATION: {
                "n_configs": "low",
                "max_budget_fraction": 0.8,
                "random_fraction": 0.2,
                "aggressive_brackets": False,
            },
            BudgetPhase.FINALIZATION: {
                "n_configs": "minimal",
                "max_budget_fraction": 1.0,
                "random_fraction": 0.0,
                "aggressive_brackets": False,
            },
        }
        return allocations[self.current_phase]

    def should_early_stop_campaign(self) -> bool:
        """Check if the optimization campaign should terminate early."""
        # Stop if converged and sufficient budget spent
        if (self.state.iterations_without_improvement > 2 * self.convergence_patience
                and self.state.progress > 0.5):
            return True
        # Stop if deadline pressure is extreme
        if self.state.deadline_pressure > 0.95:
            return True
        return False
```

Organizations often run multiple optimization campaigns across different models, datasets, or problem domains. Strategic budget allocation across campaigns can dramatically improve overall efficiency.
Transfer Learning for HPO:
When optimizing hyperparameters for related problems, observations from previous campaigns can inform new ones:
Multi-Task Budget Allocation:
Given $k$ tasks and total budget $B$, how should budget be distributed?
One practical approach:
Maintain a "configuration portfolio"—a set of hyperparameter configurations that collectively perform well across past tasks. When starting a new campaign, evaluate portfolio configurations first at low fidelity. This often identifies good starting points within 5-10% of total budget, leaving 90%+ for refinement.
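A minimal sketch of this warm-start pattern, assuming a hypothetical `evaluate(config, budget)` callable that stands in for a real low-fidelity training run:

```python
"""Portfolio warm-start sketch: screen past-winning configs cheaply first.

`evaluate` is a stand-in for a real low-fidelity training run; the
function and its signature are assumptions for illustration.
"""
from typing import Callable, Dict, List, Tuple


def warm_start(
    portfolio: List[Dict],
    evaluate: Callable[[Dict, float], float],  # (config, budget) -> loss
    total_budget: float,
    screen_fraction: float = 0.1,
) -> Tuple[Dict, float]:
    """Spend ~screen_fraction of the budget screening the portfolio at low
    fidelity; return the best config and the budget left for refinement."""
    screen_budget = total_budget * screen_fraction
    per_config = screen_budget / max(len(portfolio), 1)
    scored = [(evaluate(cfg, per_config), cfg) for cfg in portfolio]
    _, best_cfg = min(scored, key=lambda t: t[0])
    return best_cfg, total_budget - screen_budget


# Toy usage: a fake loss that prefers learning rates near 0.01
portfolio = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 1.0}]
toy_eval = lambda cfg, budget: abs(cfg["lr"] - 0.01)
best, remaining = warm_start(portfolio, toy_eval, total_budget=100.0)
```

With the default `screen_fraction=0.1`, the screening pass consumes 10% of the budget and the remaining 90% is available for refining the selected configuration, matching the 5-10% figure above.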
Translating theoretical frameworks into practical budget plans requires consideration of real-world constraints.
| Component | Typical % | Considerations |
|---|---|---|
| Initial exploration | 15-25% | Random/grid search to understand landscape |
| Main optimization | 40-50% | Hyperband/BOHB with progressive refinement |
| Validation | 10-15% | Cross-validation of top candidates |
| Final training | 20-30% | Full training of selected configuration |
| Buffer | 5-10% | Unexpected reruns, debugging, verification |
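One way to turn this table into a concrete plan is a small helper that splits a total GPU-hour budget by component. The default fractions below are mid-points of the ranges above—assumptions to adjust against your own constraints:

```python
from typing import Dict, Optional


def plan_budget(total_gpu_hours: float,
                fractions: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Split a total GPU-hour budget using mid-points of the component table.
    The default fractions are illustrative assumptions, not prescriptions."""
    fractions = fractions or {
        "initial_exploration": 0.20,
        "main_optimization": 0.45,
        "validation": 0.10,
        "final_training": 0.20,
        "buffer": 0.05,
    }
    # Fractions must account for the whole budget
    assert abs(sum(fractions.values()) - 1.0) < 1e-9
    return {k: total_gpu_hours * v for k, v in fractions.items()}


plan = plan_budget(1000.0)
```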
• Underestimating final training cost (often 2-5x the max HPO budget level)
• No buffer for infrastructure failures and debugging
• Ignoring cross-validation costs for final model selection
• Not accounting for hyperparameter-dependent training time (larger models train slower)
Production HPO systems require monitoring to detect issues and adjustment mechanisms to respond dynamically.
Key Metrics to Monitor:
Intervention Triggers:
| Signal | Possible Issue | Response |
|---|---|---|
| No improvement for many iterations | Convergence or stuck | Increase exploration, restart |
| High failure rate | Configuration issues | Narrow search space bounds |
| Low utilization | Scheduling inefficiency | Increase parallelism |
| Budget overrun | Underestimated costs | Reduce max budget, prune brackets |
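The intervention table can be wired into a simple periodic check. The metric names and thresholds below are illustrative assumptions:

```python
from typing import Dict, List


def check_triggers(metrics: Dict[str, float]) -> List[str]:
    """Map the monitoring signals from the table above to suggested responses.
    Metric keys and thresholds are illustrative assumptions."""
    actions = []
    # No improvement for many iterations -> convergence or stuck
    if metrics.get("iters_without_improvement", 0) > 20:
        actions.append("increase exploration or restart")
    # High failure rate -> configuration issues
    if metrics.get("failure_rate", 0.0) > 0.2:
        actions.append("narrow search space bounds")
    # Low utilization -> scheduling inefficiency
    if metrics.get("utilization", 1.0) < 0.6:
        actions.append("increase parallelism")
    # Spending ahead of schedule -> budget overrun risk
    if metrics.get("spend_fraction", 0.0) > metrics.get("time_fraction", 1.0):
        actions.append("reduce max budget and prune brackets")
    return actions


acts = check_triggers({"failure_rate": 0.5, "utilization": 0.9,
                       "spend_fraction": 0.3, "time_fraction": 0.5})
```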
Module Complete:
You have now completed the Multi-Fidelity Optimization module, covering early stopping approaches, Successive Halving, Hyperband, BOHB, and budget allocation strategies. These techniques form the foundation of efficient hyperparameter optimization at scale.
With this understanding of multi-fidelity optimization—from theoretical foundations through practical implementation—you can design and deploy HPO systems that find strong configurations at a small fraction of the cost of exhaustive search.