AutoML systems fundamentally trade compute resources for human expertise. That trade carries real economics: compute costs money, takes time, and consumes energy. Understanding how to allocate resources efficiently is essential for extracting maximum value from AutoML investments.
Resource budgeting in AutoML is not simply a matter of 'run until resources are exhausted.' Sophisticated budget management involves understanding the diminishing returns curve, implementing intelligent stopping criteria, allocating resources across search phases, and balancing the exploration-exploitation tradeoff. This page provides a comprehensive framework for mastering AutoML resource economics.
By the end of this page, you will understand how to allocate compute budgets strategically, set effective time limits, implement early stopping mechanisms, optimize cost-performance tradeoffs, and apply progressive resource allocation strategies that maximize model quality within real-world constraints.
The relationship between resources invested and model performance follows a characteristic pattern that every AutoML practitioner must understand. This pattern exhibits diminishing returns—initial resource investment yields substantial gains, but additional resources provide progressively smaller improvements.
The Four Phases of AutoML Search:
Phase 1: Rapid Improvement (0-20% of budget)
Phase 2: Refinement (20-60% of budget)
Phase 3: Diminishing Returns (60-90% of budget)
Phase 4: Plateau/Overfitting Risk (>90% of budget)
In practice, most AutoML problems exhibit an 80/20 pattern: 80% of achievable performance gain occurs in the first 20% of the resource budget. The remaining 80% of resources contribute only 20% of performance improvement. Understanding this curve is critical for efficient resource allocation.
| Budget % | Cumulative Performance % | Marginal ROI | Recommended Action |
|---|---|---|---|
| 0-10% | 50-60% | Very High | Always invest |
| 10-25% | 70-80% | High | Standard investment |
| 25-50% | 85-92% | Medium | Continue for production quality |
| 50-75% | 94-97% | Low | Only for high-stakes applications |
| 75-100% | 97-99% | Very Low | Rarely justified |
| 100% | 99-100% | Near Zero | Wasteful; risk of overfitting |
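The 80/20 pattern above can be captured with a simple saturating-exponential model. This is an illustrative sketch, not a fitted law; the rate constant `k` is an assumption chosen so that roughly 80% of the achievable gain arrives in the first 20% of the budget:

```python
import math

def expected_performance(budget_fraction: float, k: float = 8.0) -> float:
    """Illustrative diminishing-returns curve: fraction of the achievable
    performance gain realized after spending `budget_fraction` of the budget.
    k controls how front-loaded the gains are (assumed value, not measured)."""
    return 1 - math.exp(-k * budget_fraction)

def marginal_roi(budget_fraction: float, k: float = 8.0) -> float:
    """Instantaneous performance gain per unit of additional budget
    (the derivative of the curve above)."""
    return k * math.exp(-k * budget_fraction)

# Gains are heavily front-loaded under this model:
print(round(expected_performance(0.20), 2))  # 0.8 — ~80% of gain at 20% of budget
print(marginal_roi(0.10) > 100 * marginal_roi(0.90))  # True — early spend dominates
```

The same shape explains the "Recommended Action" column in the table: marginal ROI collapses long before the budget is exhausted.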
Compute budget encompasses the computational resources allocated to AutoML search, measured in CPU/GPU hours, wall-clock time, or cloud cost. Effective allocation requires understanding both the total budget available and how to distribute that budget across search components.
Budget Sizing Guidelines:
Determining the right total budget depends on problem characteristics:
| Dataset Size | Problem Complexity | Minimum Budget | Recommended Budget | Extended Budget |
|---|---|---|---|---|
| Small (<10K rows) | Simple (tabular, standard features) | 1 CPU-hour | 4-8 CPU-hours | 24 CPU-hours |
| Medium (10K-1M rows) | Moderate (engineered features, imbalanced) | 4 CPU-hours | 24-48 CPU-hours | 100 CPU-hours |
| Large (1M-100M rows) | Complex (many features, high cardinality) | 24 CPU-hours | 100-500 CPU-hours | 1000+ CPU-hours |
| Very Large (>100M rows) | Very Complex (deep learning, NAS) | 100 GPU-hours | 500-2000 GPU-hours | 10000+ GPU-hours |
```python
def estimate_automl_budget(
    dataset_rows: int,
    num_features: int,
    problem_type: str,
    cv_folds: int = 5,
    algorithms_to_search: int = 20,
    hyperparameter_trials_per_algo: int = 50,
) -> dict:
    """
    Estimate AutoML compute budget based on problem characteristics.

    Returns estimates in CPU-hours for tabular problems. For deep
    learning/NAS, multiply by ~10-100x and consider GPU requirements.
    """
    # Base training time estimates (seconds per 1000 rows)
    BASE_TRAINING_TIMES = {
        'linear': 0.1,    # LogisticRegression, Ridge, Lasso
        'tree': 0.5,      # DecisionTree
        'ensemble': 2.0,  # RandomForest, XGBoost, LightGBM
        'svm': 5.0,       # SVMs (scales poorly with data size)
        'neural': 10.0,   # Small MLPs
    }

    # Estimate base training time per model
    avg_training_time_per_1k = sum(BASE_TRAINING_TIMES.values()) / len(BASE_TRAINING_TIMES)
    base_training_seconds = (dataset_rows / 1000) * avg_training_time_per_1k

    # Scale by features (roughly sqrt relationship)
    feature_multiplier = (num_features / 50) ** 0.5
    scaled_training_seconds = base_training_seconds * feature_multiplier

    # Total evaluations
    total_evaluations = algorithms_to_search * hyperparameter_trials_per_algo * cv_folds

    # Total compute in seconds
    total_seconds = scaled_training_seconds * total_evaluations

    # Convert to hours
    cpu_hours = total_seconds / 3600

    return {
        'estimated_cpu_hours': round(cpu_hours, 1),
        'minimum_budget': round(cpu_hours * 0.25, 1),
        'recommended_budget': round(cpu_hours, 1),
        'extended_budget': round(cpu_hours * 3, 1),
        'total_evaluations': total_evaluations,
        'avg_seconds_per_eval': round(scaled_training_seconds, 2),
    }


# Example usage
budget = estimate_automl_budget(
    dataset_rows=100_000,
    num_features=100,
    problem_type='classification',
    cv_folds=5,
    algorithms_to_search=15,
    hyperparameter_trials_per_algo=40,
)

print(f"Estimated CPU-hours: {budget['estimated_cpu_hours']}")
print(f"Recommended budget: {budget['recommended_budget']} CPU-hours")
print(f"Total evaluations planned: {budget['total_evaluations']}")
```

To convert CPU-hours to cloud costs:
AWS EC2 c5.xlarge (4 vCPU) costs ~$0.17/hour. A 100 CPU-hour budget ≈ 25 instance-hours ≈ $4.25 on-demand or ~$1.50 with spot instances. GPU instances (p3.2xlarge) cost ~$3/hour. Budget accordingly.
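The conversion above is simple arithmetic; a small helper makes it explicit. The default rates mirror the c5.xlarge example (illustrative only; actual prices vary by region and over time):

```python
def cpu_hours_to_cost(cpu_hours: float,
                      vcpus_per_instance: int = 4,
                      hourly_rate: float = 0.17) -> float:
    """Convert a CPU-hour budget to an instance-cost estimate.

    Defaults mirror the example above: a 4-vCPU instance at ~$0.17/hour
    on-demand (spot rates are typically 60-70% lower)."""
    instance_hours = cpu_hours / vcpus_per_instance
    return instance_hours * hourly_rate

print(cpu_hours_to_cost(100))  # 4.25 — 100 CPU-hours ≈ 25 instance-hours ≈ $4.25
```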
Time budgets constrain AutoML by wall-clock duration rather than compute consumption. This is often the more practical constraint in real-world projects where deadlines matter more than compute costs. Effective time budgeting requires understanding how time maps to search progress.
```python
# Auto-sklearn: Time budget configuration
from autosklearn.classification import AutoSklearnClassifier

# Strategy 1: Fixed total time limit
clf_fixed = AutoSklearnClassifier(
    time_left_for_this_task=3600,  # Total time: 1 hour
    per_run_time_limit=300,        # Max 5 minutes per model
    ensemble_size=20,
    ensemble_nbest=10,
    memory_limit=8192,             # 8GB RAM limit
)

# Strategy 2: Aggressive early stopping for quick iteration
clf_quick = AutoSklearnClassifier(
    time_left_for_this_task=600,  # 10 minutes total
    per_run_time_limit=60,        # Max 1 minute per model
    initial_configurations_via_metalearning=10,  # Warm start
    ensemble_size=5,
)

# Strategy 3: Extended search for production quality
clf_production = AutoSklearnClassifier(
    time_left_for_this_task=14400,  # 4 hours total
    per_run_time_limit=600,         # 10 minutes per model
    ensemble_size=50,
    ensemble_nbest=25,
    resampling_strategy='cv',
    resampling_strategy_arguments={'folds': 5},
)

# AutoGluon: Preset-based time allocation
from autogluon.tabular import TabularPredictor

# Quick prototyping
predictor_quick = TabularPredictor(label='target').fit(
    train_data,
    time_limit=300,  # 5 minutes
    presets='medium_quality',
)

# Production-ready
predictor_prod = TabularPredictor(label='target').fit(
    train_data,
    time_limit=3600,  # 1 hour
    presets='best_quality',
    auto_stack=True,
)

# Custom per-model limits with AutoGluon
predictor_custom = TabularPredictor(label='target').fit(
    train_data,
    time_limit=7200,  # 2 hours total
    hyperparameters={
        'GBM': {'num_boost_round': 10000, 'ag_args_fit': {'max_time_limit': 600}},
        'NN_TORCH': {'epochs': 100, 'ag_args_fit': {'max_time_limit': 1200}},
        'CAT': {'iterations': 5000, 'ag_args_fit': {'max_time_limit': 600}},
    },
)
```

Time Allocation Heuristics:
When setting time budgets, consider these empirically-validated heuristics:
| Dataset Size | Minimum Meaningful Budget | Standard Budget | Extended Budget |
|---|---|---|---|
| <10K rows | 5-10 minutes | 30-60 minutes | 2-4 hours |
| 10K-100K rows | 30-60 minutes | 2-4 hours | 8-24 hours |
| 100K-1M rows | 2-4 hours | 8-24 hours | 2-7 days |
| >1M rows | 8-24 hours | 2-7 days | 2-4 weeks |
These budgets assume multi-core CPUs are available. GPU acceleration can cut training time for deep learning components by 10-50x but doesn't affect tree-based methods.
Always set per-model time limits in addition to total time limits. Without per-model limits, a single pathological configuration (e.g., SVM with RBF kernel on large data) can consume your entire budget, leaving most algorithms unexplored.
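A per-model limit can be sketched independently of any AutoML framework by running each fit under a wall-clock timeout. The helper below is a simplified illustration using a thread pool (a runaway thread is abandoned rather than killed; production systems typically run each trial in a separate process and terminate it on timeout):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FitTimeout

def fit_with_time_limit(fit_fn, time_limit_s: float):
    """Run fit_fn(); return (result, timed_out).

    If the limit is exceeded, the result is abandoned so the search can
    move on to the next candidate instead of stalling on one pathological
    configuration."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fit_fn)
    try:
        return future.result(timeout=time_limit_s), False
    except FitTimeout:
        return None, True
    finally:
        pool.shutdown(wait=False)  # don't block the search on a runaway fit

# A fast model finishes; a slow one would be cut off
result, timed_out = fit_with_time_limit(lambda: "model", time_limit_s=5.0)
print(result, timed_out)  # model False
```

In a real search loop, a `(None, True)` result is simply logged as a failed trial and the budget moves to the next configuration.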
Early stopping is the process of terminating AutoML search before the allocated budget is exhausted when further search is unlikely to improve results. Effective early stopping dramatically improves resource efficiency without sacrificing model quality.
Successive Halving and Hyperband:
Successive Halving is one of the most effective early stopping strategies for hyperparameter optimization. The approach is elegantly simple: evaluate every candidate configuration on a small budget, discard the worst-performing fraction, increase the per-configuration budget for the survivors, and repeat until a single configuration remains.
Mathematical Formulation:
Given total budget B, n initial configurations, and reduction factor η (typically 3): run ⌈log_η(n)⌉ + 1 rounds, allocate each round roughly B / (⌈log_η(n)⌉ + 1) of the total budget, keep the top 1/η of configurations after every round, and multiply the survivors' per-configuration budget by η.
This achieves a near-logarithmic reduction in total cost compared with giving every configuration the full budget, while maintaining theoretical quality guarantees.
```python
import numpy as np
from typing import List, Callable


class SuccessiveHalving:
    """
    Successive Halving for hyperparameter optimization.

    Efficiently allocates budget by progressively eliminating
    poorly performing configurations.
    """

    def __init__(
        self,
        total_budget: int,
        num_configs: int,
        reduction_factor: int = 3,
        min_budget_per_config: int = 1,
    ):
        """
        Args:
            total_budget: Total compute budget (e.g., epochs, iterations)
            num_configs: Number of initial configurations to evaluate
            reduction_factor: Keep 1/reduction_factor of configs each round (default: 3)
            min_budget_per_config: Minimum budget for first round
        """
        self.total_budget = total_budget
        self.num_configs = num_configs
        self.reduction_factor = reduction_factor
        self.min_budget = min_budget_per_config

        # Calculate schedule
        self.schedule = self._compute_schedule()

    def _compute_schedule(self) -> List[dict]:
        """Compute the successive halving schedule."""
        schedule = []
        n_configs = self.num_configs
        budget = self.min_budget

        while n_configs >= 1:
            schedule.append({
                'num_configs': int(n_configs),
                'budget_per_config': int(budget),
            })
            n_configs = n_configs / self.reduction_factor
            budget = budget * self.reduction_factor

        return schedule

    def run(
        self,
        configs: List[dict],
        evaluate_fn: Callable[[dict, int], float],
    ) -> dict:
        """
        Run successive halving.

        Args:
            configs: List of configurations to evaluate
            evaluate_fn: Function(config, budget) -> score (higher is better)

        Returns:
            Best configuration found
        """
        active_configs = configs[:self.num_configs]

        for round_idx, round_info in enumerate(self.schedule):
            budget = round_info['budget_per_config']
            # Keep only as many configs as the *next* round will evaluate
            if round_idx + 1 < len(self.schedule):
                n_to_keep = self.schedule[round_idx + 1]['num_configs']
            else:
                n_to_keep = 1

            print(f"Round {round_idx}: Evaluating {len(active_configs)} configs "
                  f"with budget {budget}")

            # Evaluate all active configurations
            scores = []
            for config in active_configs:
                score = evaluate_fn(config, budget)
                scores.append((score, config))

            # Sort by score (descending) and keep top performers
            scores.sort(key=lambda x: x[0], reverse=True)
            active_configs = [cfg for _, cfg in scores[:n_to_keep]]

            print(f"  Best score this round: {scores[0][0]:.4f}")

        return active_configs[0] if active_configs else None


class SearchLevelEarlyStopping:
    """
    Early stopping for the overall AutoML search based on
    improvement history.
    """

    def __init__(
        self,
        patience: int = 20,
        min_improvement: float = 0.001,
        min_evaluations: int = 50,
    ):
        """
        Args:
            patience: Number of evaluations without improvement before stopping
            min_improvement: Minimum relative improvement to count as progress
            min_evaluations: Minimum evaluations before early stopping is considered
        """
        self.patience = patience
        self.min_improvement = min_improvement
        self.min_evaluations = min_evaluations

        self.best_score = float('-inf')
        self.evaluations_since_improvement = 0
        self.total_evaluations = 0
        self.history = []

    def update(self, score: float) -> bool:
        """
        Update with new evaluation result.

        Returns:
            True if search should continue, False if it should stop
        """
        self.total_evaluations += 1
        self.history.append(score)

        # Check for improvement (the first score always counts)
        if self.best_score == float('-inf'):
            relative_improvement = float('inf')
        else:
            relative_improvement = (score - self.best_score) / max(abs(self.best_score), 1e-10)

        if relative_improvement > self.min_improvement:
            self.best_score = score
            self.evaluations_since_improvement = 0
        else:
            self.evaluations_since_improvement += 1

        # Decide whether to continue
        if self.total_evaluations < self.min_evaluations:
            return True  # Keep searching

        if self.evaluations_since_improvement >= self.patience:
            print(f"Early stopping: No improvement for {self.patience} evaluations")
            return False  # Stop searching

        return True  # Keep searching

    def get_statistics(self) -> dict:
        """Get search statistics."""
        return {
            'total_evaluations': self.total_evaluations,
            'best_score': self.best_score,
            'evaluations_since_improvement': self.evaluations_since_improvement,
            'improvement_rate': len([h for h in self.history if h >= self.best_score * 0.99])
                                / max(len(self.history), 1),
        }


# Usage example
def example_usage():
    # Successive Halving example
    sh = SuccessiveHalving(
        total_budget=1000,
        num_configs=81,
        reduction_factor=3,
        min_budget_per_config=10,
    )

    print("Successive Halving Schedule:")
    for i, round_info in enumerate(sh.schedule):
        print(f"  Round {i}: {round_info['num_configs']} configs, "
              f"{round_info['budget_per_config']} budget each")

    # Early stopping example
    stopper = SearchLevelEarlyStopping(patience=10, min_evaluations=30)

    # Simulate a search with diminishing returns
    for i in range(100):
        # Simulate a score that improves initially, then plateaus
        score = 0.5 + 0.4 * (1 - np.exp(-i / 10)) + np.random.normal(0, 0.01)
        if not stopper.update(score):
            print(f"Stopped at evaluation {i}")
            break

    print(f"Final stats: {stopper.get_statistics()}")
```

A practical rule of thumb: if the top-5 models haven't improved by more than 0.1% for 20 consecutive evaluations after at least 50 total evaluations, further search is typically wasteful.
This heuristic works well across diverse tabular problems.
Multi-fidelity optimization exploits the insight that cheap approximations can identify promising configurations before expensive full evaluations. By evaluating on subsets of data, fewer epochs, or simpler proxies, multi-fidelity methods dramatically accelerate AutoML search.
Key Fidelity Dimensions:
1. Data Subsampling
2. Epoch/Iteration Reduction
3. Model Simplification
4. Cross-Validation Reduction
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from typing import Dict, Any, Tuple


class MultiFidelityOptimizer:
    """
    Multi-fidelity hyperparameter optimization using data subsampling.

    Evaluates configurations on progressively larger data subsets,
    eliminating poorly performing configurations early.
    """

    def __init__(
        self,
        fidelity_levels: list = [0.1, 0.3, 0.5, 1.0],
        survival_rate: float = 0.5,
        cv_folds: int = 3,
    ):
        """
        Args:
            fidelity_levels: Data fraction at each stage (increasing)
            survival_rate: Fraction of configs that survive each round
            cv_folds: Cross-validation folds for evaluation
        """
        self.fidelity_levels = fidelity_levels
        self.survival_rate = survival_rate
        self.cv_folds = cv_folds

    def _subsample(self, X, y, fraction: float):
        """Subsample data to the given fraction."""
        n_samples = int(len(X) * fraction)
        indices = np.random.choice(len(X), n_samples, replace=False)
        return X[indices], y[indices]

    def _evaluate_config(
        self,
        config: Dict[str, Any],
        X, y,
        fidelity: float,
    ) -> float:
        """Evaluate a configuration at the given fidelity level."""
        X_sub, y_sub = self._subsample(X, y, fidelity)
        model = RandomForestClassifier(**config, random_state=42)
        scores = cross_val_score(model, X_sub, y_sub, cv=self.cv_folds)
        return scores.mean()

    def optimize(
        self,
        configs: list,
        X, y,
    ) -> Tuple[Dict[str, Any], float]:
        """
        Run multi-fidelity optimization.

        Args:
            configs: List of configuration dicts
            X: Feature matrix
            y: Target vector

        Returns:
            Tuple of (best_config, best_score)
        """
        active_configs = configs.copy()

        for fidelity in self.fidelity_levels:
            n_survive = max(1, int(len(active_configs) * self.survival_rate))
            print(f"Fidelity {fidelity:.0%}: Evaluating {len(active_configs)} configs")

            # Evaluate all active configs
            results = []
            for config in active_configs:
                score = self._evaluate_config(config, X, y, fidelity)
                results.append((score, config))

            # Sort and keep top performers
            results.sort(key=lambda x: x[0], reverse=True)

            # Report top score at this fidelity
            print(f"  Top score at {fidelity:.0%} fidelity: {results[0][0]:.4f}")

            # Keep survivors for next round
            active_configs = [cfg for _, cfg in results[:n_survive]]

        # Final result is the best config at full fidelity
        return active_configs[0], results[0][0]


# Example: Hyperband implementation sketch
class Hyperband:
    """
    Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.

    Combines successive halving with multiple brackets to handle the
    exploration-exploitation tradeoff in budget allocation.
    """

    def __init__(
        self,
        max_budget: int,
        reduction_factor: float = 3,
    ):
        self.max_budget = max_budget  # R in the paper
        self.eta = reduction_factor   # η in the paper

        # Number of successive halving brackets
        self.s_max = int(np.log(max_budget) / np.log(reduction_factor))
        self.B = (self.s_max + 1) * max_budget  # Total budget

    def get_brackets(self) -> list:
        """
        Generate Hyperband brackets.

        Each bracket represents a successive halving run with
        different initial configurations and budgets.
        """
        brackets = []
        for s in range(self.s_max, -1, -1):
            # Number of configurations in this bracket
            n = int(np.ceil((self.B / self.max_budget) * (self.eta ** s) / (s + 1)))
            # Minimum budget per config in this bracket
            r = self.max_budget * (self.eta ** (-s))

            brackets.append({
                'bracket_id': self.s_max - s,
                'initial_configs': n,
                'min_budget': r,
                'num_rounds': s + 1,
            })
        return brackets

    def print_schedule(self):
        """Print the Hyperband schedule for visualization."""
        print(f"Hyperband Schedule (max_budget={self.max_budget}, η={self.eta})")
        print(f"Total budget per run: {self.B}")
        print()
        for bracket in self.get_brackets():
            print(f"Bracket {bracket['bracket_id']}:")
            print(f"  Initial configs: {bracket['initial_configs']}")
            print(f"  Minimum budget: {bracket['min_budget']:.1f}")
            print(f"  Rounds: {bracket['num_rounds']}")

            # Show successive halving within the bracket
            n = bracket['initial_configs']
            r = bracket['min_budget']
            for round_idx in range(bracket['num_rounds']):
                print(f"    Round {round_idx}: {int(n)} configs × {int(r)} budget")
                n = n / self.eta
                r = r * self.eta
            print()


# Demo
if __name__ == "__main__":
    hb = Hyperband(max_budget=81, reduction_factor=3)
    hb.print_schedule()
```

Multi-fidelity optimization works best when low-fidelity evaluations are predictive of high-fidelity performance. This assumption holds for most neural networks and gradient boosting but may fail for models where performance is non-monotonic in resources (e.g., some regularization settings).
For cloud-based AutoML, monetary cost becomes a primary concern. Cost optimization requires understanding cloud pricing models and tailoring AutoML strategies accordingly.
| Strategy | Hourly Cost (Example) | Reliability | Best For |
|---|---|---|---|
| On-Demand Instances | $1.00/CPU-hour | High | Critical production runs |
| Spot/Preemptible | $0.30/CPU-hour | Medium (interruptions) | Experimentation, long runs |
| Reserved (1-year) | $0.60/CPU-hour | High | Regular pipeline runs |
| Managed AutoML (Vertex, SageMaker) | $0.50-2.00/model-hour | Very High | Low operational overhead |
| Hybrid (Spot + On-Demand fallback) | $0.40/CPU-hour average | High | Cost-sensitive production |
"""Cost-Aware AutoML Configuration Examples""" # AWS SageMaker Autopilot with cost controlsimport sagemakerfrom sagemaker.automl.automl import AutoML automl = AutoML( role='SageMakerRole', target_attribute_name='target', output_path='s3://bucket/output', # Cost control: limit job duration max_runtime_per_training_job_in_seconds=600, total_job_runtime_in_seconds=7200, # Cost control: limit candidates max_candidates=50, # Use spot instances for training problem_type='BinaryClassification', job_objective={'MetricName': 'AUC'},) # Google Cloud Vertex AI with budget constraintsfrom google.cloud import aiplatform aiplatform.init(project='my-project', location='us-central1') # Create AutoML job with budget awarenessjob = aiplatform.AutoMLTabularTrainingJob( display_name='cost-aware-automl', optimization_prediction_type='classification', optimization_objective='maximize-au-roc',) model = job.run( dataset=dataset, target_column='target', # Budget constraints budget_milli_node_hours=1000, # 1 node-hour maximum # Disable neural architecture search to reduce cost disable_early_stopping=False,) # H2O AutoML with resource limitsimport h2ofrom h2o.automl import H2OAutoML h2o.init(max_mem_size='8G') # Limit memory aml = H2OAutoML( max_runtime_secs=3600, # 1 hour limit max_models=20, # Model limit # Exclude expensive algorithm families to control cost exclude_algos=['DeepLearning'], # Skip neural networks # Use less cross-validation nfolds=3, # 3-fold instead of default 5 # Early stopping stopping_metric='AUC', stopping_rounds=5, stopping_tolerance=0.001,) aml.train(x=features, y=target, training_frame=train) def estimate_cloud_cost( cpu_hours: float, gpu_hours: float = 0, storage_gb_months: float = 10, instance_type: str = 'spot') -> dict: """ Estimate cloud costs for AutoML run. Based on approximate AWS pricing (varies by region/time). 
""" PRICES = { 'cpu_ondemand': 0.17, # per vCPU-hour 'cpu_spot': 0.05, # per vCPU-hour 'cpu_reserved': 0.10, # per vCPU-hour 'gpu_ondemand': 3.06, # per GPU-hour (V100) 'gpu_spot': 0.90, # per GPU-hour 'storage': 0.023, # per GB-month (S3) } if instance_type == 'spot': cpu_cost = cpu_hours * PRICES['cpu_spot'] gpu_cost = gpu_hours * PRICES['gpu_spot'] elif instance_type == 'reserved': cpu_cost = cpu_hours * PRICES['cpu_reserved'] gpu_cost = gpu_hours * PRICES['gpu_ondemand'] * 0.6 # ~40% discount else: # on-demand cpu_cost = cpu_hours * PRICES['cpu_ondemand'] gpu_cost = gpu_hours * PRICES['gpu_ondemand'] storage_cost = storage_gb_months * PRICES['storage'] total = cpu_cost + gpu_cost + storage_cost return { 'cpu_cost': round(cpu_cost, 2), 'gpu_cost': round(gpu_cost, 2), 'storage_cost': round(storage_cost, 2), 'total_cost': round(total, 2), 'instance_type': instance_type, } # Exampleprint(estimate_cloud_cost(cpu_hours=100, gpu_hours=10, instance_type='spot'))# {'cpu_cost': 5.0, 'gpu_cost': 9.0, 'storage_cost': 0.23, 'total_cost': 14.23, 'instance_type': 'spot'}Spot instances are ideal for AutoML because: (1) AutoML is naturally fault-tolerant—interrupted evaluations can restart, (2) most evaluations are independent, enabling easy rescheduling, (3) checkpointing allows resumption mid-training. Configure your AutoML to save checkpoints every 10-15 minutes to minimize waste from interruptions.
Organizations running multiple AutoML projects must allocate a finite budget across projects. This portfolio perspective requires balancing exploration (trying new approaches) with exploitation (refining known successes).
Portfolio Allocation Principles:
1. Prioritize by Business Impact Allocate more budget to projects with higher potential business value. A 1% improvement on a $100M revenue stream justifies more investment than a 10% improvement on a $100K stream.
2. Consider Marginal Returns Projects early in development have higher marginal returns from additional budget than mature projects near performance plateaus.
3. Diversify Risk Don't concentrate the entire budget on a single speculative project. Spread allocation so that at least some projects succeed even if others stall.
4. Reserve Contingency Maintain 10-20% budget reserve for unexpected opportunities or project extensions. AutoML often reveals promising directions that warrant further exploration.
| Project | Business Value | Current Stage | Marginal ROI | Budget % |
|---|---|---|---|---|
| Customer Churn | High ($5M/yr impact) | Development | High | 30% |
| Fraud Detection | Critical (risk mitigation) | Refinement | Medium | 25% |
| Pricing Optimization | High ($3M/yr impact) | Exploration | Very High | 20% |
| Inventory Forecasting | Medium ($500K/yr impact) | Production | Low | 10% |
| Customer Segmentation | Medium (strategic) | Development | High | 10% |
| Contingency Reserve | — | — | — | 5% |
A useful heuristic for AutoML portfolio allocation: 70% to proven, production-critical models with clear ROI; 20% to promising development projects with high potential; 10% to speculative exploration of new problem areas or techniques. This balance ensures both reliability and innovation.
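The 70/20/10 heuristic is easy to mechanize. Below is an illustrative helper; the `stage` labels and the even within-stage split are simplifying assumptions, not a standard API:

```python
def allocate_portfolio(total_budget_hours: float, projects: dict) -> dict:
    """Split a total compute budget across projects grouped by stage using
    the 70/20/10 heuristic: production 70%, development 20%, exploration 10%.
    Within a stage the budget is split evenly (a simplifying assumption)."""
    STAGE_SHARE = {'production': 0.70, 'development': 0.20, 'exploration': 0.10}
    allocation = {}
    for stage, share in STAGE_SHARE.items():
        names = [name for name, s in projects.items() if s == stage]
        if not names:
            continue  # an unused share could be redistributed or held in reserve
        per_project = total_budget_hours * share / len(names)
        for name in names:
            allocation[name] = round(per_project, 1)
    return allocation

projects = {
    'fraud_detection': 'production',
    'churn_model': 'production',
    'pricing': 'development',
    'new_segment_idea': 'exploration',
}
print(allocate_portfolio(1000, projects))
# {'fraud_detection': 350.0, 'churn_model': 350.0, 'pricing': 200.0, 'new_segment_idea': 100.0}
```

A weighted within-stage split (e.g., by expected business value, as in the table above) is a natural refinement.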
We've covered the comprehensive landscape of AutoML resource budgeting: the diminishing-returns curve, compute and time budget sizing, per-model limits, early stopping, multi-fidelity optimization, cloud cost optimization, and portfolio allocation across projects.
What's Next:
With resource budgeting mastered, we turn to a critical operational challenge: Constraint Handling. The next page examines how to incorporate real-world constraints—latency requirements, memory limits, fairness criteria, regulatory requirements—into AutoML search to produce models that are not just accurate but deployable.
You now have a comprehensive framework for AutoML resource budgeting—from individual search runs to organizational portfolio allocation. This knowledge enables efficient investment of compute resources, avoiding both under-exploration and wasteful over-search.