In the previous page, we explored algorithm selection: choosing the right learning algorithm for a dataset. But this tells only half the story. Every algorithm comes with hyperparameters—learning rates, regularization strengths, tree depths, network architectures—that dramatically affect performance.
Traditionally, practitioners first select an algorithm, then tune its hyperparameters. But this sequential approach has a fundamental flaw: the best algorithm with default hyperparameters may not be the best algorithm with optimized hyperparameters. A random forest with defaults might beat an SVM with defaults, but a well-tuned SVM might outperform both.
This realization led to the formulation of the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem. CASH treats algorithm selection and hyperparameter tuning as a single, unified optimization problem. It's the mathematical foundation of modern AutoML systems like Auto-sklearn, Auto-WEKA, and TPOT.
Understanding CASH is essential for anyone working with AutoML—it explains why these systems work, their computational challenges, and how to design better automated machine learning pipelines.
By the end of this page, you will understand the formal CASH problem definition, hierarchical configuration spaces, search strategies (Bayesian optimization, evolutionary algorithms, random search), the theoretical complexity of CASH, and practical implementations in production AutoML systems.
The CASH problem was formally defined in the foundational Auto-WEKA paper (Thornton et al., 2013). It unifies algorithm selection and hyperparameter optimization into a single search problem.
Formal Definition:
Given:
- a set of candidate algorithms 𝒜 = {A⁽¹⁾, ..., A⁽ᵏ⁾},
- for each algorithm A⁽ⁱ⁾, a hyperparameter space Λ⁽ⁱ⁾,
- a dataset split into D_train and D_valid, and
- a loss function L (e.g., validation error).
The CASH problem is to find:
A*, λ* = argmin_{A⁽ⁱ⁾ ∈ 𝒜, λ ∈ Λ⁽ⁱ⁾} L(A⁽ⁱ⁾_λ, D_train, D_valid)
In words: jointly find the algorithm A* and its hyperparameters λ* that minimize the validation loss when trained on D_train and evaluated on D_valid.
Key Insights:
Unified search space: The combined space is Λ = Λ⁽¹⁾ ∪ Λ⁽²⁾ ∪ ... ∪ Λ⁽ᵏ⁾, with algorithm choice as a categorical hyperparameter.
Hierarchical/conditional structure: Hyperparameters are conditional—SVM's kernel hyperparameters only matter if SVM is selected.
Black-box optimization: We can evaluate configurations but don't have gradients with respect to algorithm/hyperparameter choices.
Expensive evaluations: Each configuration requires training a model and measuring validation performance.
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from typing import Dict, Any, List, Tuple


class CASHProblem:
    """
    Formal representation of the CASH problem.

    Encapsulates the combined algorithm selection and hyperparameter
    optimization as a black-box optimization problem.
    """

    def __init__(self, algorithms: Dict[str, Any],
                 hyperparameter_spaces: Dict[str, Dict],
                 X, y, cv=5, scoring='accuracy'):
        """
        Parameters:
            algorithms: Dict mapping algorithm names to (estimator_class, default_params)
            hyperparameter_spaces: Dict mapping algorithm names to their param distributions
            X: Feature matrix
            y: Target vector
            cv: Cross-validation folds
            scoring: Scoring metric
        """
        self.algorithms = algorithms
        self.hyperparameter_spaces = hyperparameter_spaces
        self.X = X
        self.y = y
        self.cv = cv
        self.scoring = scoring

        # Track evaluations
        self.evaluation_history = []

    def evaluate_configuration(self, algorithm_name: str,
                               hyperparameters: Dict[str, Any]) -> float:
        """
        Evaluate a single configuration (algorithm + hyperparameters).

        This is the black-box objective function that CASH optimizers
        query repeatedly.

        Parameters:
            algorithm_name: Which algorithm to use
            hyperparameters: Hyperparameter settings for that algorithm

        Returns:
            Negative cross-validation score (we minimize)
        """
        if algorithm_name not in self.algorithms:
            raise ValueError(f"Unknown algorithm: {algorithm_name}")

        estimator_class, default_params = self.algorithms[algorithm_name]

        # Merge default params with provided hyperparameters
        params = {**default_params, **hyperparameters}

        try:
            # Instantiate and evaluate
            estimator = estimator_class(**params)
            scores = cross_val_score(estimator, self.X, self.y,
                                     cv=self.cv, scoring=self.scoring)
            score = scores.mean()

            # Track evaluation
            self.evaluation_history.append({
                'algorithm': algorithm_name,
                'hyperparameters': hyperparameters,
                'score': score,
                'std': scores.std()
            })

            return -score  # Negative because we minimize
        except Exception:
            # Invalid configuration - return worst possible score (0 accuracy)
            return 0.0

    def get_configuration_space_size(self) -> Dict[str, int]:
        """
        Estimate the size of the configuration space.

        Returns approximate number of configurations per algorithm.
        """
        sizes = {}
        for algo_name, param_space in self.hyperparameter_spaces.items():
            # Very rough estimate: product of parameter cardinalities
            size = 1
            for param_name, param_dist in param_space.items():
                if hasattr(param_dist, '__len__'):
                    size *= len(param_dist)
                else:
                    size *= 100  # Continuous parameter - discretize
            sizes[algo_name] = size
        return sizes

    def get_best_configuration(self) -> Tuple[str, Dict, float]:
        """
        Get the best configuration found so far.
        """
        if not self.evaluation_history:
            return None, None, None

        best = max(self.evaluation_history, key=lambda x: x['score'])
        return best['algorithm'], best['hyperparameters'], best['score']
```

CASH is a mixed-categorical, hierarchical, black-box optimization problem with expensive evaluations. Standard optimization techniques (gradient descent, convex optimization) don't apply. We must use specialized algorithms designed for this structure.
A crucial aspect of CASH is the hierarchical or conditional structure of the configuration space. Different algorithms have different hyperparameters, and some hyperparameters only matter when certain choices are made.
Example: The Conditional Structure
Consider three algorithms: Random Forest, SVM, and Neural Network.
Configuration Space:
├── algorithm: {RandomForest, SVM, NeuralNetwork}
│
├── [If algorithm = RandomForest]
│ ├── n_estimators: [10, 500]
│ ├── max_depth: [1, 50]
│ └── min_samples_split: [2, 20]
│
├── [If algorithm = SVM]
│ ├── kernel: {linear, rbf, poly}
│ ├── C: [1e-3, 1e3] (log-scale)
│ ├── [If kernel = rbf]
│ │ └── gamma: [1e-4, 1e1]
│ └── [If kernel = poly]
│ ├── degree: [2, 5]
│ └── coef0: [0, 10]
│
└── [If algorithm = NeuralNetwork]
├── hidden_layers: [1, 5]
├── hidden_size: [16, 512]
├── activation: {relu, tanh, sigmoid}
└── learning_rate: [1e-5, 1e-1]
Notice how:
- gamma only matters if kernel = rbf
- degree only matters if kernel = poly
- none of the SVM hyperparameters matter if algorithm ≠ SVM

This conditional structure is essential—ignoring it leads to wasted search effort on irrelevant hyperparameters.
```python
from ConfigSpace import ConfigurationSpace, Configuration
from ConfigSpace import CategoricalHyperparameter, UniformIntegerHyperparameter
from ConfigSpace import UniformFloatHyperparameter, EqualsCondition, InCondition


def create_cash_configuration_space():
    """
    Create a hierarchical configuration space for CASH.

    Uses ConfigSpace library (used by Auto-sklearn) to properly
    represent conditional hyperparameters.
    """
    cs = ConfigurationSpace()

    # Root: Algorithm choice
    algorithm = CategoricalHyperparameter(
        "algorithm",
        ["RandomForest", "SVM", "NeuralNetwork", "GradientBoosting"]
    )
    cs.add_hyperparameter(algorithm)

    # ================= Random Forest =================
    rf_n_estimators = UniformIntegerHyperparameter(
        "rf_n_estimators", lower=10, upper=500, default_value=100
    )
    rf_max_depth = UniformIntegerHyperparameter(
        "rf_max_depth", lower=1, upper=50, default_value=10
    )
    rf_min_samples_split = UniformIntegerHyperparameter(
        "rf_min_samples_split", lower=2, upper=20, default_value=2
    )
    cs.add_hyperparameters([rf_n_estimators, rf_max_depth, rf_min_samples_split])

    # Conditionals: These only matter if algorithm = RandomForest
    cs.add_condition(EqualsCondition(rf_n_estimators, algorithm, "RandomForest"))
    cs.add_condition(EqualsCondition(rf_max_depth, algorithm, "RandomForest"))
    cs.add_condition(EqualsCondition(rf_min_samples_split, algorithm, "RandomForest"))

    # ================= SVM =================
    svm_kernel = CategoricalHyperparameter(
        "svm_kernel", ["linear", "rbf", "poly"], default_value="rbf"
    )
    svm_C = UniformFloatHyperparameter(
        "svm_C", lower=1e-3, upper=1e3, log=True, default_value=1.0
    )
    svm_gamma = UniformFloatHyperparameter(
        "svm_gamma", lower=1e-4, upper=1e1, log=True, default_value=0.1
    )
    svm_degree = UniformIntegerHyperparameter(
        "svm_degree", lower=2, upper=5, default_value=3
    )
    svm_coef0 = UniformFloatHyperparameter(
        "svm_coef0", lower=0.0, upper=10.0, default_value=0.0
    )
    cs.add_hyperparameters([svm_kernel, svm_C, svm_gamma, svm_degree, svm_coef0])

    # SVM hyperparams conditional on algorithm = SVM
    cs.add_condition(EqualsCondition(svm_kernel, algorithm, "SVM"))
    cs.add_condition(EqualsCondition(svm_C, algorithm, "SVM"))

    # gamma only matters for RBF kernel
    cs.add_condition(EqualsCondition(svm_gamma, svm_kernel, "rbf"))

    # degree and coef0 only matter for polynomial kernel
    cs.add_condition(EqualsCondition(svm_degree, svm_kernel, "poly"))
    cs.add_condition(EqualsCondition(svm_coef0, svm_kernel, "poly"))

    # ================= Neural Network =================
    nn_hidden_layers = UniformIntegerHyperparameter(
        "nn_hidden_layers", lower=1, upper=5, default_value=2
    )
    nn_hidden_size = UniformIntegerHyperparameter(
        "nn_hidden_size", lower=16, upper=512, default_value=64
    )
    nn_activation = CategoricalHyperparameter(
        "nn_activation", ["relu", "tanh", "sigmoid"], default_value="relu"
    )
    nn_learning_rate = UniformFloatHyperparameter(
        "nn_learning_rate", lower=1e-5, upper=1e-1, log=True, default_value=1e-3
    )
    cs.add_hyperparameters([nn_hidden_layers, nn_hidden_size,
                            nn_activation, nn_learning_rate])

    # Neural network params conditional on algorithm = NeuralNetwork
    for hp in [nn_hidden_layers, nn_hidden_size, nn_activation, nn_learning_rate]:
        cs.add_condition(EqualsCondition(hp, algorithm, "NeuralNetwork"))

    # ================= Gradient Boosting =================
    gb_n_estimators = UniformIntegerHyperparameter(
        "gb_n_estimators", lower=50, upper=500, default_value=100
    )
    gb_learning_rate = UniformFloatHyperparameter(
        "gb_learning_rate", lower=0.01, upper=1.0, log=True, default_value=0.1
    )
    gb_max_depth = UniformIntegerHyperparameter(
        "gb_max_depth", lower=1, upper=15, default_value=3
    )
    cs.add_hyperparameters([gb_n_estimators, gb_learning_rate, gb_max_depth])
    for hp in [gb_n_estimators, gb_learning_rate, gb_max_depth]:
        cs.add_condition(EqualsCondition(hp, algorithm, "GradientBoosting"))

    return cs


# Usage
cs = create_cash_configuration_space()
print(f"Configuration space has {len(cs)} hyperparameters")
print("Conditionals ensure only relevant params are active")

# Sample valid configurations
for i in range(3):
    config = cs.sample_configuration()
    print(f"Sampled: {dict(config)}")
```

Why Hierarchical Structure Matters:
Reduced effective dimensionality: Instead of searching N = Σᵢ|Λ⁽ⁱ⁾| hyperparameters simultaneously, we only search over the relevant subset for each algorithm.
Valid configurations: Without conditionals, we could generate meaningless configurations (e.g., setting SVM's gamma when using Random Forest).
Efficient surrogate modeling: Bayesian optimization can build better surrogate models when hierarchical structure is respected.
Interpretable results: Final configurations only show relevant hyperparameters.
The ConfigSpace Library:
ConfigSpace is the de facto standard for defining hierarchical configuration spaces in Python AutoML. It's used by Auto-sklearn, SMAC, and other major frameworks.
Given the CASH problem's complexity—hierarchical, mixed-type, expensive-to-evaluate—specialized search strategies are needed. Let's examine the major approaches:
1. Random Search
Surprisingly effective baseline:
Random search has a simple theoretical guarantee: with probability at least 1−δ, it finds a configuration in the top ε fraction of the space after O((1/ε)·log(1/δ)) evaluations. It's embarrassingly parallel and doesn't suffer from poorly fit surrogate models.
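The guarantee follows directly from the miss probability (1−ε)ⁿ ≤ δ for n independent uniform samples. This small helper (illustrative only, not from any library) computes the required budget:

```python
import math

def random_search_budget(epsilon: float, delta: float) -> int:
    """Smallest n such that P(no sample lands in the top-epsilon fraction)
    is at most delta, i.e. (1 - epsilon)^n <= delta for uniform sampling."""
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))

# Reaching the top 5% of configurations with 95% confidence:
print(random_search_budget(0.05, 0.05))  # 59 evaluations
```

Note how weakly the budget depends on confidence: halving δ adds only a handful of evaluations, while halving ε roughly doubles the budget.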
When random search works well:
2. Grid Search (Don't Use for CASH)
Grid search is exponential in dimensionality and fails catastrophically in high-dimensional conditional spaces. It's mentioned only to emphasize: don't use grid search for CASH.
```python
import numpy as np
from typing import Callable, Dict, Any, List, Tuple
from concurrent.futures import ProcessPoolExecutor


class RandomSearchCASH:
    """
    Random search for the CASH problem.

    Despite its simplicity, random search is a strong baseline that's
    hard to beat with small evaluation budgets.
    """

    def __init__(self, config_space, n_workers=1, random_state=42):
        """
        Parameters:
            config_space: ConfigSpace object defining the search space
            n_workers: Number of parallel workers
            random_state: Random seed for reproducibility
        """
        self.config_space = config_space
        self.n_workers = n_workers
        self.rng = np.random.RandomState(random_state)

    def optimize(self, objective_func: Callable,
                 n_evaluations: int) -> Tuple[Dict, float]:
        """
        Run random search.

        Parameters:
            objective_func: Function that takes a config dict, returns loss.
                Must be picklable (module-level, not a lambda) when n_workers > 1.
            n_evaluations: Number of configurations to try

        Returns:
            (best_config, best_loss)
        """
        # Sample all configurations upfront
        configs = [self.config_space.sample_configuration()
                   for _ in range(n_evaluations)]
        config_dicts = [dict(config) for config in configs]

        # Evaluate (potentially in parallel)
        if self.n_workers == 1:
            results = [objective_func(cd) for cd in config_dicts]
        else:
            # Map the objective directly: lambdas can't be pickled across processes
            with ProcessPoolExecutor(max_workers=self.n_workers) as executor:
                results = list(executor.map(objective_func, config_dicts))

        # Find best
        best_idx = int(np.argmin(results))
        return config_dicts[best_idx], results[best_idx]

    def optimize_with_budget(self, objective_func: Callable,
                             time_budget_seconds: float,
                             min_evaluations: int = 10) -> Tuple[Dict, float]:
        """
        Run random search with a time budget instead of fixed evaluations.

        Useful for fair comparison with adaptive methods.
        """
        import time

        best_config = None
        best_loss = float('inf')
        n_evaluated = 0
        start_time = time.time()

        while True:
            # Check budget (but always do at least min_evaluations)
            elapsed = time.time() - start_time
            if elapsed >= time_budget_seconds and n_evaluated >= min_evaluations:
                break

            # Sample and evaluate
            config = self.config_space.sample_configuration()
            loss = objective_func(dict(config))
            n_evaluated += 1

            if loss < best_loss:
                best_loss = loss
                best_config = dict(config)

        print(f"Random search: {n_evaluated} evaluations in {elapsed:.1f}s")
        return best_config, best_loss
```

3. Bayesian Optimization (SMAC)
Bayesian optimization is the dominant approach for CASH. It builds a surrogate model of the objective function and uses it to guide search:
SMAC (Sequential Model-based Algorithm Configuration) is the state-of-the-art Bayesian optimizer for CASH. Key features:
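The surrogate-plus-acquisition loop can be sketched in a few lines. This is not SMAC's implementation; it's a toy model-based search on a 1-D numeric space, using a random-forest surrogate (as SMAC does) with tree-variance as the uncertainty estimate and expected improvement as the acquisition function:

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def bo_minimize(objective, sample_x, n_init=5, n_iter=20, n_candidates=256, seed=0):
    """Toy model-based search: fit a random-forest surrogate to (x, loss)
    pairs, then evaluate the candidate with the highest expected improvement."""
    rng = np.random.default_rng(seed)
    X = [sample_x(rng) for _ in range(n_init)]   # initial random design
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(np.array(X).reshape(-1, 1), y)
        cands = np.array([sample_x(rng) for _ in range(n_candidates)])
        # Per-tree predictions give a cheap mean/uncertainty estimate
        preds = np.stack([t.predict(cands.reshape(-1, 1))
                          for t in model.estimators_])
        mu, sigma = preds.mean(axis=0), preds.std(axis=0) + 1e-9
        best = min(y)
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        x_next = float(cands[int(np.argmax(ei))])
        X.append(x_next)
        y.append(objective(x_next))
    return X[int(np.argmin(y))], min(y)

# Minimize (x - 3.7)^2 over [0, 10]
x_best, loss_best = bo_minimize(lambda x: (x - 3.7) ** 2,
                                lambda rng: float(rng.uniform(0, 10)))
print(x_best, loss_best)
```

SMAC extends this idea to hierarchical, mixed-type spaces, intensification over noisy evaluations, and restarts; the core loop of fit-surrogate, maximize-acquisition, evaluate is the same.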
```python
from smac import HyperparameterOptimizationFacade, Scenario
from ConfigSpace import ConfigurationSpace
import numpy as np


def run_smac_optimization(config_space: ConfigurationSpace,
                          objective_func,
                          n_trials: int = 100,
                          n_workers: int = 1,
                          seed: int = 42):
    """
    Run SMAC (Sequential Model-based Algorithm Configuration) for CASH.

    SMAC is the state-of-the-art Bayesian optimizer for hierarchical,
    conditional configuration spaces.

    Parameters:
        config_space: ConfigSpace defining algorithms and hyperparameters
        objective_func: Function mapping config dict to loss (minimize)
        n_trials: Number of configurations to evaluate
        n_workers: Parallel workers
        seed: Random seed

    Returns:
        Best configuration found
    """
    # Define the scenario
    scenario = Scenario(
        configspace=config_space,
        n_trials=n_trials,      # Total evaluations
        n_workers=n_workers,    # Parallelism
        seed=seed,
        deterministic=True
    )

    # Wrap objective for SMAC interface
    def smac_objective(config, seed=None):
        """SMAC-compatible objective wrapper."""
        return objective_func(dict(config))

    # Create SMAC optimizer
    smac = HyperparameterOptimizationFacade(
        scenario=scenario,
        target_function=smac_objective,
        overwrite=True  # Overwrite previous runs
    )

    # Run optimization
    incumbent = smac.optimize()
    return dict(incumbent)


# Example usage with Auto-sklearn style CASH problem
def example_cash_with_smac():
    """Demonstrate SMAC on a CASH problem."""
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.svm import SVC

    # Generate dataset
    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=10, random_state=42)

    # Build configuration space (simplified)
    cs = create_cash_configuration_space()  # From previous example

    # Define objective
    def objective(config):
        algo = config['algorithm']
        if algo == 'RandomForest':
            clf = RandomForestClassifier(
                n_estimators=config.get('rf_n_estimators', 100),
                max_depth=config.get('rf_max_depth', None),
                random_state=42
            )
        elif algo == 'GradientBoosting':
            clf = GradientBoostingClassifier(
                n_estimators=config.get('gb_n_estimators', 100),
                learning_rate=config.get('gb_learning_rate', 0.1),
                max_depth=config.get('gb_max_depth', 3),
                random_state=42
            )
        elif algo == 'SVM':
            clf = SVC(
                kernel=config.get('svm_kernel', 'rbf'),
                C=config.get('svm_C', 1.0),
                gamma=config.get('svm_gamma', 'scale'),
                random_state=42
            )
        else:
            raise ValueError(f"Unknown algorithm: {algo}")

        # Return negative accuracy (since we minimize)
        scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
        return -scores.mean()

    # Run SMAC
    best_config = run_smac_optimization(cs, objective, n_trials=50)
    print(f"Best configuration: {best_config}")
    print(f"Best score: {-objective(best_config):.4f}")
```

4. Evolutionary Algorithms (TPOT)
Evolutionary algorithms treat configurations as individuals in a population:
TPOT uses genetic programming to evolve not just hyperparameters but entire ML pipelines (feature selection → preprocessing → model).
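The evolutionary loop itself is simple. The following is a toy sketch (not TPOT's implementation) over a miniature CASH-like space with two hypothetical algorithms "A" and "B", showing selection, elitism, and mutation of both the algorithm choice and a numeric hyperparameter:

```python
import random

def evolve(objective, sample_config, mutate, n_gen=40, pop_size=20, seed=0):
    """Toy evolutionary search: keep the best half (elitism), fill the rest
    with mutated copies of elite members. Minimizes objective(config)."""
    rng = random.Random(seed)
    pop = [sample_config(rng) for _ in range(pop_size)]
    for _ in range(n_gen):
        elite = sorted(pop, key=objective)[: pop_size // 2]
        children = [mutate(rng.choice(elite), rng) for _ in elite]
        pop = elite + children
    return min(pop, key=objective)

def sample_config(rng):
    # Algorithm choice plus one numeric hyperparameter
    return {"algorithm": rng.choice(["A", "B"]), "x": rng.uniform(0, 10)}

def mutate(cfg, rng):
    cfg = dict(cfg)
    if rng.random() < 0.2:  # occasionally switch algorithm
        cfg["algorithm"] = rng.choice(["A", "B"])
    cfg["x"] = min(10.0, max(0.0, cfg["x"] + rng.gauss(0, 0.5)))
    return cfg

def objective(cfg):
    # Invented landscape: "A" is best at x=3 (loss 1); "B" is best at x=7 (loss 0)
    if cfg["algorithm"] == "A":
        return (cfg["x"] - 3) ** 2 + 1.0
    return (cfg["x"] - 7) ** 2

best = evolve(objective, sample_config, mutate)
print(best)
```

With enough generations the population typically migrates to algorithm "B" near x = 7, the joint optimum. TPOT replaces flat config dicts with tree-structured pipelines and adds crossover between them.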
5. Hyperband and Successive Halving
These are multi-fidelity approaches that evaluate configurations at multiple resource levels:
This is especially effective when early performance correlates with final performance—common in deep learning.
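A minimal successive-halving sketch illustrates the mechanism. The `evaluate` cost model below is invented for illustration (each config has an asymptotic loss approached as budget grows):

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Evaluate all configs at a small budget, keep the best 1/eta,
    and re-evaluate survivors with eta times the budget."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = scored[: max(1, len(configs) // eta)]  # survivors
        budget *= eta                                    # more resource per round
    return configs[0]

# Toy loss model: observed loss decays toward a config-specific final loss
rng = random.Random(0)
configs = [{"final_loss": rng.uniform(0, 1), "rate": rng.uniform(0.5, 2.0)}
           for _ in range(27)]

def evaluate(cfg, budget):
    return cfg["final_loss"] + 1.0 / (cfg["rate"] * budget)

best = successive_halving(configs, evaluate)
print(best)
```

With 27 configs and eta = 3, the rounds run 27 → 9 → 3 → 1 at budgets 1, 3, 9: most of the compute goes to the few survivors. Hyperband wraps several such brackets with different min_budget values to hedge against misleading low-budget scores.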
Comparison of Strategies:
| Strategy | Mechanism | Strengths | Weaknesses |
|---|---|---|---|
| Random Search | Uniform sampling | Simple, parallelizable, robust | No learning, inefficient for peaked landscapes |
| SMAC (Bayesian) | Surrogate + acquisition | Sample efficient, handles conditionals | Sequential, complex implementation |
| TPE | Density ratio modeling | Handles conditionals, simpler than GP | Less sample efficient than SMAC |
| Evolutionary | Population evolution | Diverse solutions, flexible pipelines | Many hyperparameters, slow convergence |
| Hyperband | Multi-fidelity elimination | Fast, efficient resource use | Requires meaningful fidelity proxy |
Understanding the theoretical complexity of CASH helps set realistic expectations and guides design decisions.
Configuration Space Size
For a CASH problem with k algorithms, where algorithm i has nᵢ hyperparameters and hyperparameter j takes mᵢⱼ discrete values, the total configuration space size is:
|Λ| = Σᵢ₌₁ᵏ Πⱼ₌₁ⁿⁱ mᵢⱼ
For example, with 5 algorithms, each contributing roughly 10⁵ discretized configurations:

|Λ| = 5 × 10⁵ = 500,000 configurations
With continuous hyperparameters, the space is infinite. Even with discretization, it's astronomically large.
Computational Complexity
Each configuration evaluation requires training a model. If training takes time T(n, d) for n samples and d features, a budget of B evaluations costs roughly B · c · T(n, d) with c-fold cross-validation. This motivates sample-efficient search strategies and multi-fidelity evaluation.
A single configuration evaluation might take seconds to hours depending on dataset size and algorithm. With 1000 configurations and 1 minute per evaluation, CASH takes ~17 hours. This is why sample efficiency is crucial—every evaluation counts.
Regret Bounds and Sample Complexity
For Bayesian optimization in CASH, theoretical results show sublinear regret under suitable regularity conditions on the objective. However, these bounds rest on assumptions (smooth objectives, well-specified surrogate models, well-behaved evaluation noise) that rarely hold exactly for real CASH landscapes.
Practically, convergence is problem-dependent. Meta-learning and warm-starting significantly accelerate convergence on structured problem distributions.
Hardness Results
CASH includes algorithm selection as a special case. Algorithm selection is at least as hard as zero-one integer programming in the worst case (NP-hard). No polynomial-time algorithm can solve general CASH problems optimally.
The practical implication: exact CASH optimization is off the table, so real systems aim for good anytime performance within a compute budget rather than provably optimal solutions.
```python
import numpy as np
from typing import Dict


def analyze_cash_complexity(algorithms: Dict,
                            hyperparameter_ranges: Dict,
                            dataset_size: int,
                            cv_folds: int = 5):
    """
    Analyze the computational complexity of a CASH problem.

    Provides estimates of:
    - Configuration space size
    - Estimated time per evaluation
    - Total search time for different budgets
    """
    # Configuration space size
    total_configs = 0
    space_details = {}

    for algo_name, param_ranges in hyperparameter_ranges.items():
        algo_configs = 1
        for param_name, param_range in param_ranges.items():
            if isinstance(param_range, list):
                algo_configs *= len(param_range)
            elif isinstance(param_range, tuple) and len(param_range) == 2:
                # Continuous: estimate 100 effective values
                algo_configs *= 100
        space_details[algo_name] = algo_configs
        total_configs += algo_configs

    print("=== CASH Complexity Analysis ===")
    print(f"Number of algorithms: {len(algorithms)}")
    print("Configurations per algorithm:")
    for algo, count in space_details.items():
        print(f"  {algo}: ~{count:,} configurations")
    print(f"Total configuration space: ~{total_configs:,} configurations")

    # Estimate time per evaluation
    # Very rough: assume O(n * d) training, where d ~ n/10
    d_approx = max(10, dataset_size // 10)
    time_estimates = {
        'RandomForest': dataset_size * d_approx * 100 / 1e8,    # ~100 trees
        'SVM': (dataset_size ** 2) * d_approx / 1e9,            # Kernel computation
        'GradientBoosting': dataset_size * d_approx * 100 / 1e8,
        'NeuralNetwork': dataset_size * d_approx * 1000 / 1e8,  # ~1000 iterations
    }
    avg_time = np.mean([time_estimates.get(algo, 1.0) for algo in algorithms.keys()])

    print(f"=== Time Estimates (dataset size={dataset_size}) ===")
    print(f"Estimated time per evaluation: ~{avg_time * cv_folds:.1f} seconds")
    print(f"  (with {cv_folds}-fold CV)")

    # Total time for different evaluation budgets
    budgets = [50, 100, 500, 1000]
    print("=== Total Search Time Estimates ===")
    for budget in budgets:
        total_time = budget * avg_time * cv_folds
        hours = total_time / 3600
        print(f"  {budget} evaluations: ~{hours:.1f} hours")

    # Coverage
    print("=== Search Coverage ===")
    for budget in budgets:
        coverage = budget / total_configs * 100
        print(f"  {budget} evaluations: {coverage:.4f}% of space explored")

    return {
        'total_configs': total_configs,
        'avg_time_per_eval': avg_time * cv_folds,
        'space_details': space_details
    }


# Example usage
algorithms = {
    'RandomForest': None,
    'SVM': None,
    'GradientBoosting': None,
    'NeuralNetwork': None
}

hyperparameter_ranges = {
    'RandomForest': {
        'n_estimators': (10, 500),
        'max_depth': (1, 50),
        'min_samples_split': (2, 20)
    },
    'SVM': {
        'kernel': ['linear', 'rbf', 'poly'],
        'C': (1e-3, 1e3),
        'gamma': (1e-4, 1e1)
    },
    'GradientBoosting': {
        'n_estimators': (50, 500),
        'learning_rate': (0.01, 1.0),
        'max_depth': (1, 15)
    },
    'NeuralNetwork': {
        'hidden_layers': (1, 5),
        'hidden_size': (16, 512),
        'learning_rate': (1e-5, 1e-1),
        'activation': ['relu', 'tanh', 'sigmoid']
    }
}

analysis = analyze_cash_complexity(algorithms, hyperparameter_ranges,
                                   dataset_size=10000)
```

Let's examine how leading AutoML systems implement CASH:
Auto-sklearn 2.0
Auto-sklearn is the reference implementation for CASH-based AutoML:
Key architectural decisions:
Auto-WEKA
The original CASH paper's implementation:
Auto-WEKA demonstrated CASH's viability but is less maintained than Auto-sklearn.
```python
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import numpy as np


def run_autosklearn_cash(time_budget_minutes=10):
    """
    Run Auto-sklearn, demonstrating CASH in action.

    Auto-sklearn internally solves the CASH problem using SMAC
    with meta-learning warm-starting.
    """
    # Load dataset
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    print(f"Dataset: {X_train.shape[0]} training samples, "
          f"{X_train.shape[1]} features")
    print(f"Time budget: {time_budget_minutes} minutes")

    # Initialize Auto-sklearn
    # This sets up the CASH problem internally
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget_minutes * 60,  # Total seconds
        per_run_time_limit=60,  # Max seconds per configuration
        memory_limit=4096,      # MB
        # CASH settings
        include={
            'classifier': [
                'random_forest',
                'gradient_boosting',
                'adaboost',
                'extra_trees',
                'libsvm_svc',
                'liblinear_svc',
                'k_nearest_neighbors',
                'decision_tree',
            ]
        },
        # Meta-learning for warm-starting
        initial_configurations_via_metalearning=25,
        # Ensembling
        ensemble_kwargs={'ensemble_size': 50},
        # Reproducibility
        seed=42,
        n_jobs=1
    )

    # Fit - this runs SMAC to solve the CASH problem
    print("Starting Auto-sklearn (CASH optimization)...")
    automl.fit(X_train, y_train)

    # Results
    print("=== CASH Results ===")
    print(f"Test accuracy: {automl.score(X_test, y_test):.4f}")

    # Show examined configurations
    print(f"Configurations examined: {len(automl.cv_results_['mean_test_score'])}")

    # Show best configuration (the CASH solution)
    print("Best configuration found:")
    print(automl.show_models())

    # Leaderboard
    print("Top 5 configurations:")
    leaderboard = automl.leaderboard(detailed=True,
                                     ensemble_size=False)[['rank', 'type', 'cost']]
    print(leaderboard.head())

    return automl


# Run CASH with Auto-sklearn
# automl = run_autosklearn_cash(time_budget_minutes=10)
```

TPOT (Genetic Programming CASH)
TPOT takes a different approach using evolutionary algorithms:
H2O AutoML
H2O emphasizes robustness over selection:
This is a portfolio approach that trades compute efficiency for reliability.
Design Patterns Across Systems:
When using or designing CASH systems, consider these practical guidelines:
Designing Configuration Spaces:
Budget Allocation:
Evaluation Strategies:
In practice, a well-tuned gradient boosting model (XGBoost, LightGBM, CatBoost) with 50-100 random hyperparameter trials often matches or beats elaborate CASH optimization for tabular data. CASH's biggest wins come on diverse problem portfolios where no single algorithm dominates.
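That baseline is easy to set up with scikit-learn's `RandomizedSearchCV`. The sketch below uses `GradientBoostingClassifier` on a synthetic dataset and a deliberately small trial count; in practice you'd use 50-100 trials and a faster library like XGBoost or LightGBM:

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=42)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(50, 150),
        "learning_rate": loguniform(1e-2, 1.0),   # log-uniform, as in CASH spaces
        "max_depth": randint(1, 6),
        "subsample": loguniform(0.5, 1.0),
    },
    n_iter=8,   # keep small here; 50-100 in practice
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)
print(f"best CV accuracy: {search.best_score_:.3f}")
print(search.best_params_)
```

This is a single-algorithm slice of CASH: the configuration space is flat (no conditionals), which is exactly why such a simple tool suffices once the algorithm choice is fixed.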
We've covered the Combined Algorithm Selection and Hyperparameter optimization problem—the mathematical foundation of modern AutoML. Let's consolidate the key takeaways:
What's Next:
Meta-learning is a crucial accelerator for CASH—it allows systems to transfer knowledge from past experiments to new datasets. The next page explores meta-learning for model selection in depth, showing how systems learn to predict algorithm performance and warm-start optimization effectively.
You now understand the CASH problem—how algorithm selection and hyperparameter optimization are unified, the structure of hierarchical configuration spaces, and how leading AutoML systems implement CASH in practice. This knowledge equips you to use, evaluate, and extend AutoML tools effectively.