Not all hyperparameters are created equal—some only matter when others take specific values. Consider choosing between Adam and SGD as your optimizer. If you choose Adam, its specific $\beta_1, \beta_2$ parameters become relevant. If you choose SGD, those parameters don't exist, but momentum and Nesterov acceleration become relevant instead.
These are conditional hyperparameters (also called hierarchical or nested hyperparameters). They create tree-like or DAG-like structures in the search space, where different branches have different hyperparameters. Properly handling conditional dependencies is critical for efficient HPO—ignoring them wastes budget on invalid or meaningless configurations.
By the end of this page, you will:

- Understand the formal structure of conditional hyperparameter spaces
- Recognize common patterns of conditional dependencies
- Know how to encode conditional spaces for different HPO algorithms
- Implement conditional logic in major HPO frameworks
A conditional hyperparameter is one whose existence or relevance depends on the value of one or more parent hyperparameters.
Formal Definition:
Let $\lambda_p$ be a parent hyperparameter and $\lambda_c$ be a child hyperparameter. We say $\lambda_c$ is conditional on $\lambda_p$ if:
$$\lambda_c \in \Lambda_c \text{ is defined } \iff \lambda_p \in P_c \subseteq \Lambda_p$$
where $P_c$ is the activation set—the values of the parent that activate the child.
Example: for an SVM, the kernel choice is the parent hyperparameter; the RBF width $\gamma$ and the polynomial degree exist only for the kernels that use them.
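In the activation-set notation above, one common SVM parameterization reads (the available kernels vary by library):

$$\text{kernel} \in \{\text{linear}, \text{rbf}, \text{poly}\}, \qquad \gamma \text{ defined} \iff \text{kernel} \in P_\gamma = \{\text{rbf}, \text{poly}\}, \qquad \text{degree defined} \iff \text{kernel} \in P_{\text{degree}} = \{\text{poly}\}$$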
This creates a hierarchical structure where the parent choice branches into different subspaces.
Configuration Space as a DAG
Conditional hyperparameters form a Directed Acyclic Graph (DAG): each node is a hyperparameter, and an edge from $\lambda_p$ to $\lambda_c$ means that $\lambda_c$ is active only when $\lambda_p$ takes a value in its activation set.
The DAG structure ensures no circular dependencies: a hyperparameter cannot be conditional on itself or on a hyperparameter that depends on it.
Valid Configurations:
A configuration $\lambda = (\lambda_1, ..., \lambda_d)$ is valid if and only if each hyperparameter is assigned a value exactly when it is active, and every assigned value lies in its domain:

$$\big(\lambda_i \text{ assigned} \iff \lambda_p \in P_i \text{ for every parent } \lambda_p \text{ of } \lambda_i\big) \quad \text{and} \quad \big(\lambda_i \text{ assigned} \implies \lambda_i \in \Lambda_i\big)$$
Conditional dependencies aren't just about what's valid; they're about what's meaningful. Setting $\gamma$ for a linear SVM is technically possible but completely meaningless. An HPO algorithm should exploit this structure to avoid wasting evaluations on configurations that differ only in inactive hyperparameters.
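To see this concretely, here is a minimal sketch using scikit-learn; the dataset and gamma values are illustrative. With a linear kernel, `gamma` is accepted but never read, so sweeping it changes nothing:

```python
# Minimal sketch: gamma is inactive for a linear-kernel SVM, so sweeping it wastes budget.
# Dataset and gamma values are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
scores = [SVC(kernel='linear', gamma=g).fit(X, y).score(X, y)
          for g in (1e-3, 1.0, 1e3)]
print(scores)  # identical scores: the linear kernel never reads gamma
```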
Conditional dependencies appear throughout machine learning. Here are the most common patterns:
Pattern: A categorical choice of algorithm, where each algorithm has its own hyperparameters.
Example: Optimizer Selection
```
optimizer ∈ {SGD, Adam, AdamW}
│
├── if SGD:
│   ├── momentum ∈ [0, 0.99]
│   ├── nesterov ∈ {True, False}
│   └── dampening ∈ [0, 1]
│
├── if Adam:
│   ├── beta1 ∈ [0.8, 0.99]
│   └── beta2 ∈ [0.9, 0.9999]
│
└── if AdamW:
    ├── beta1 ∈ [0.8, 0.99]
    ├── beta2 ∈ [0.9, 0.9999]
    └── weight_decay ∈ [1e-6, 0.1]   # Decoupled
```
Key observation: Different algorithms have completely different hyperparameters. Sharing hyperparameters across algorithms (e.g., 'momentum' for Adam) would be invalid.
Conditional hyperparameters create several technical challenges for HPO algorithms:
The Comparison Problem in Detail
Consider two configurations:
$$\lambda_1 = (\text{optimizer=SGD}, \text{momentum}=0.9, \text{lr}=0.01)$$ $$\lambda_2 = (\text{optimizer=Adam}, \beta_1=0.9, \text{lr}=0.001)$$
What's the 'distance' between $\lambda_1$ and $\lambda_2$? Standard approaches fail: Euclidean distance is undefined when the two configurations live in different active subspaces, and imputing placeholder values for the missing dimensions (momentum for Adam, $\beta_1$ for SGD) makes unrelated configurations look artificially close or far.
Solution approaches include imputing defaults for inactive hyperparameters, comparing only the shared active parameters, and using surrogate models that tolerate missing dimensions natively.
SMAC (Sequential Model-based Algorithm Configuration) was specifically designed for conditional spaces. It uses random forests that naturally handle missing values (inactive hyperparameters) and can split on parent hyperparameters before considering children.
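To make the declarative style concrete, here is a minimal sketch of the earlier SVM example using the ConfigSpace library that SMAC builds on. Class and method names follow ConfigSpace's long-standing 0.x API; newer releases expose a slightly different interface, so treat this as a sketch rather than a drop-in snippet.

```python
# Sketch: the SVM kernel example as a declarative ConfigSpace definition
# (ConfigSpace 0.x-style API; newer releases differ slightly).
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter, UniformFloatHyperparameter, UniformIntegerHyperparameter,
)
from ConfigSpace.conditions import EqualsCondition, InCondition

cs = ConfigurationSpace()

kernel = CategoricalHyperparameter('kernel', ['linear', 'rbf', 'poly'])
C = UniformFloatHyperparameter('C', 1e-3, 1e3, log=True)
gamma = UniformFloatHyperparameter('gamma', 1e-4, 10.0, log=True)
degree = UniformIntegerHyperparameter('degree', 2, 5)
cs.add_hyperparameters([kernel, C, gamma, degree])

# gamma is active only for kernels that use it; degree only for the polynomial kernel
cs.add_conditions([
    InCondition(gamma, kernel, ['rbf', 'poly']),
    EqualsCondition(degree, kernel, 'poly'),
])

print(cs.sample_configuration())  # inactive hyperparameters are simply omitted
```

Because the conditions are declared up front, sampled configurations are always valid and the surrogate model can read the dependency structure directly.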
Let's examine how conditional hyperparameters are implemented in practice, with working examples in major frameworks.
"""Conditional Hyperparameters: Implementation Examples This module demonstrates how to define and work with conditionalhyperparameter spaces across different scenarios."""import numpy as npfrom dataclasses import dataclass, fieldfrom typing import Dict, Any, List, Optional, Union, Callablefrom enum import Enum class ParameterStatus(Enum): """Status of a conditional hyperparameter.""" ACTIVE = "active" INACTIVE = "inactive" @dataclassclass ConditionalParameter: """ A hyperparameter with optional conditional activation. Attributes: name: Parameter identifier param_type: 'continuous', 'integer', 'categorical' bounds: For continuous/integer, (low, high). For categorical, list of choices. parent: Name of parent parameter (None if unconditional) parent_values: Values of parent that activate this parameter log_scale: Whether to use log scale (for continuous) default: Default value when active """ name: str param_type: str bounds: Union[tuple, List] parent: Optional[str] = None parent_values: Optional[List] = None log_scale: bool = False default: Any = None def is_conditional(self) -> bool: """Check if this parameter is conditional.""" return self.parent is not None def is_active(self, config: Dict[str, Any]) -> bool: """Check if this parameter is active given a (partial) configuration.""" if not self.is_conditional(): return True if self.parent not in config: return False # Parent not yet set return config[self.parent] in self.parent_values class ConditionalConfigurationSpace: """ A configuration space with conditional dependencies. Supports: - Defining unconditional and conditional parameters - Validating configurations - Sampling valid configurations - Computing distance between configurations """ def __init__(self): self.parameters: Dict[str, ConditionalParameter] = {} self._dependency_order: List[str] = [] # Topological order def add_parameter(self, param: ConditionalParameter): """Add a parameter to the space.""" # Validate parent exists if param.parent is not None and param.parent not in self.parameters: raise ValueError(f"Parent {param.parent} not found for {param.name}") self.parameters[param.name] = param self._update_dependency_order() def _update_dependency_order(self): """Compute topological order of parameters for sampling.""" # Simple implementation: unconditional first, then by depth unconditional = [p for p in self.parameters.values() if not p.is_conditional()] conditional = [p for p in self.parameters.values() if p.is_conditional()] self._dependency_order = [p.name for p in unconditional] # Add conditional params in dependency order remaining = conditional.copy() while remaining: for param in remaining.copy(): if param.parent in self._dependency_order: self._dependency_order.append(param.name) remaining.remove(param) def sample(self) -> Dict[str, Any]: """ Sample a valid configuration respecting conditional dependencies. Samples parameters in topological order, only sampling child parameters if their parent activates them. 
""" config = {} for param_name in self._dependency_order: param = self.parameters[param_name] # Check if this parameter should be active if not param.is_active(config): config[param_name] = None # Mark as inactive continue # Sample an active value config[param_name] = self._sample_parameter(param) return config def _sample_parameter(self, param: ConditionalParameter) -> Any: """Sample a single parameter value.""" if param.param_type == 'categorical': return np.random.choice(param.bounds) elif param.param_type == 'continuous': low, high = param.bounds if param.log_scale: return np.exp(np.random.uniform(np.log(low), np.log(high))) else: return np.random.uniform(low, high) elif param.param_type == 'integer': low, high = param.bounds if param.log_scale: log_val = np.random.uniform(np.log(low), np.log(high)) return int(np.round(np.exp(log_val))) else: return np.random.randint(low, high + 1) def validate(self, config: Dict[str, Any]) -> bool: """ Validate a configuration. Checks: 1. All unconditional parameters are present and valid 2. Active conditional parameters have valid values 3. Inactive parameters are None or absent """ for param_name, param in self.parameters.items(): is_active = param.is_active(config) value = config.get(param_name) if is_active: if value is None: return False # Active param must have value if not self._value_valid(param, value): return False else: # Inactive: value should be None or absent if value is not None: return False return True def _value_valid(self, param: ConditionalParameter, value: Any) -> bool: """Check if a value is valid for a parameter.""" if param.param_type == 'categorical': return value in param.bounds elif param.param_type in ('continuous', 'integer'): low, high = param.bounds return low <= value <= high return True def get_active_parameters(self, config: Dict[str, Any]) -> List[str]: """Get list of active parameter names for a configuration.""" return [name for name, param in self.parameters.items() if param.is_active(config)] def create_optimizer_space() -> ConditionalConfigurationSpace: """ Create a configuration space for optimizer selection. Demonstrates the algorithm-choice conditional pattern. 
""" space = ConditionalConfigurationSpace() # Root: optimizer choice space.add_parameter(ConditionalParameter( name='optimizer', param_type='categorical', bounds=['sgd', 'adam', 'adamw'], )) # Learning rate (shared across all optimizers) space.add_parameter(ConditionalParameter( name='learning_rate', param_type='continuous', bounds=(1e-5, 0.1), log_scale=True, )) # SGD-specific parameters space.add_parameter(ConditionalParameter( name='momentum', param_type='continuous', bounds=(0.0, 0.99), parent='optimizer', parent_values=['sgd'], )) space.add_parameter(ConditionalParameter( name='nesterov', param_type='categorical', bounds=[True, False], parent='optimizer', parent_values=['sgd'], )) # Adam-specific parameters space.add_parameter(ConditionalParameter( name='beta1', param_type='continuous', bounds=(0.8, 0.99), parent='optimizer', parent_values=['adam', 'adamw'], # Shared by both Adam variants )) space.add_parameter(ConditionalParameter( name='beta2', param_type='continuous', bounds=(0.9, 0.9999), parent='optimizer', parent_values=['adam', 'adamw'], )) # AdamW-specific: explicit weight decay space.add_parameter(ConditionalParameter( name='weight_decay', param_type='continuous', bounds=(1e-6, 0.1), log_scale=True, parent='optimizer', parent_values=['adamw'], )) return space def create_neural_network_space() -> ConditionalConfigurationSpace: """ Create a configuration space for neural network architecture. Demonstrates depth-dependent conditional parameters. """ space = ConditionalConfigurationSpace() # Number of hidden layers space.add_parameter(ConditionalParameter( name='num_layers', param_type='integer', bounds=(1, 4), )) # Layer 1 (always exists) space.add_parameter(ConditionalParameter( name='layer1_units', param_type='integer', bounds=(32, 512), log_scale=True, )) # Layer 2 (conditional on num_layers >= 2) space.add_parameter(ConditionalParameter( name='layer2_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[2, 3, 4], )) # Layer 3 (conditional on num_layers >= 3) space.add_parameter(ConditionalParameter( name='layer3_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[3, 4], )) # Layer 4 (conditional on num_layers == 4) space.add_parameter(ConditionalParameter( name='layer4_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[4], )) return space # Example usageif __name__ == "__main__": print("Optimizer Configuration Space") print("=" * 50) opt_space = create_optimizer_space() for _ in range(5): config = opt_space.sample() active = opt_space.get_active_parameters(config) print(f"\nOptimizer: {config['optimizer']}") print(f" Active params: {active}") print(f" Config: {config}") print(f" Valid: {opt_space.validate(config)}") print("\n" + "=" * 50) print("Neural Network Architecture Space") print("=" * 50) nn_space = create_neural_network_space() for _ in range(5): config = nn_space.sample() active = nn_space.get_active_parameters(config) print(f"\nNum layers: {config['num_layers']}") print(f" Active params: {active}") layer_units = [config.get(f'layer{i}_units') for i in range(1, 5) if config.get(f'layer{i}_units')] print(f" Architecture: {layer_units}")Different HPO frameworks have varying levels of support for conditional hyperparameters. Here's how the major tools handle them:
Optuna handles conditional hyperparameters via dynamic search spaces defined by the trial object. The suggest_* methods are called conditionally based on earlier choices.
```python
import optuna

# SGD, Adam, AdamW, and train_and_evaluate are placeholders for your own
# training code (e.g., torch.optim classes and a training loop).
def objective(trial):
    # Parent hyperparameter
    optimizer = trial.suggest_categorical('optimizer', ['sgd', 'adam', 'adamw'])
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)

    # Conditional: SGD-specific
    if optimizer == 'sgd':
        momentum = trial.suggest_float('momentum', 0.0, 0.99)
        nesterov = trial.suggest_categorical('nesterov', [True, False])
        opt = SGD(lr=lr, momentum=momentum, nesterov=nesterov)

    # Conditional: Adam variants
    elif optimizer in ['adam', 'adamw']:
        beta1 = trial.suggest_float('beta1', 0.8, 0.99)
        beta2 = trial.suggest_float('beta2', 0.9, 0.9999)
        if optimizer == 'adam':
            opt = Adam(lr=lr, betas=(beta1, beta2))
        else:
            wd = trial.suggest_float('weight_decay', 1e-6, 0.1, log=True)
            opt = AdamW(lr=lr, betas=(beta1, beta2), weight_decay=wd)

    return train_and_evaluate(opt)
```
Optuna's default TPE sampler handles conditionals naturally: each hyperparameter is modeled independently, using only the trials in which it was actually suggested.
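If you would rather have TPE model the hyperparameters within each conditional branch jointly, recent Optuna versions expose `multivariate` and `group` options on the sampler; availability and defaults depend on your Optuna version, so treat this as a sketch:

```python
import optuna

# Sketch: with group=True (requires multivariate=True), TPE builds a separate joint
# model for each set of hyperparameters that co-occur in trials, e.g. the SGD branch
# versus the Adam/AdamW branch. Both flags are marked experimental in some releases.
sampler = optuna.samplers.TPESampler(multivariate=True, group=True, seed=0)
study = optuna.create_study(direction='minimize', sampler=sampler)
study.optimize(objective, n_trials=50)  # `objective` from the example above
```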
| Framework | Approach | Conditional Support | Best For |
|---|---|---|---|
| Optuna | Dynamic/trial-based | ⭐⭐⭐ Native | Complex conditionals, flexibility |
| ConfigSpace/SMAC | Declarative Conditions | ⭐⭐⭐ Native | Validated configs, algorithm config |
| Hyperopt | Nested dictionaries | ⭐⭐ Supported | Tree-structured spaces |
| Ray Tune | Via integrations | ⭐⭐ Via Optuna/SMAC | Distribution, scheduling |
| sklearn GridSearchCV | None | ❌ Not supported | Simple, flat spaces only |
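For comparison, the nested-dictionary style that the table lists for Hyperopt looks roughly like this for the optimizer space (a sketch; ranges mirror the earlier example):

```python
import numpy as np
from hyperopt import hp

# Sketch: hp.choice picks a branch, and each branch dict carries only its own
# hyperparameters. Every hp.* label must be unique across the whole space, so
# beta1/beta2 cannot simply be shared between the adam and adamw branches.
space = hp.choice('optimizer', [
    {'type': 'sgd',
     'momentum': hp.uniform('momentum', 0.0, 0.99),
     'nesterov': hp.choice('nesterov', [True, False])},
    {'type': 'adam',
     'beta1': hp.uniform('adam_beta1', 0.8, 0.99),
     'beta2': hp.uniform('adam_beta2', 0.9, 0.9999)},
    {'type': 'adamw',
     'beta1': hp.uniform('adamw_beta1', 0.8, 0.99),
     'beta2': hp.uniform('adamw_beta2', 0.9, 0.9999),
     'weight_decay': hp.loguniform('weight_decay', np.log(1e-6), np.log(0.1))},
])
```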
Based on extensive experience with conditional HPO, here are recommended patterns for structuring your search spaces:
Share hyperparameters that mean the same thing across branches: if beta1 means the same for Adam and AdamW, use one beta1 rather than separate adam_beta1 and adamw_beta1, unless they genuinely should differ.

It's tempting to encode every possible architectural choice as a conditional hyperparameter. Resist this temptation. A space with 50 hyperparameters nested 5 levels deep is nearly impossible to optimize effectively. Start simple and add complexity only when simpler spaces are exhausted.
When to Flatten vs When to Use Conditionals
Flatten when: branches share most of their hyperparameters with the same meaning, the conditional structure is shallow, or your tooling (like grid search) only supports flat spaces; inactive values are then simply ignored at training time (see the flattened sketch after this list).

Use conditionals when: branches have genuinely disjoint hyperparameters (as in the optimizer example), the space is deep or wide enough that evaluating meaningless combinations would waste real budget, or your framework samples and validates conditional spaces natively.
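As a contrast to the conditional Optuna objective above, a flattened encoding of the same space might look like the sketch below; SGD, Adam, AdamW, and train_and_evaluate are the same placeholders as before. It is simpler to write, but the sampler now spends effort modeling dimensions that are inactive for most trials.

```python
def flat_objective(trial):
    # Every hyperparameter is always suggested; the ones the chosen optimizer
    # does not use are simply ignored.
    optimizer = trial.suggest_categorical('optimizer', ['sgd', 'adam', 'adamw'])
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    momentum = trial.suggest_float('momentum', 0.0, 0.99)                    # SGD only
    beta1 = trial.suggest_float('beta1', 0.8, 0.99)                          # Adam/AdamW only
    beta2 = trial.suggest_float('beta2', 0.9, 0.9999)
    weight_decay = trial.suggest_float('weight_decay', 1e-6, 0.1, log=True)  # AdamW only

    if optimizer == 'sgd':
        opt = SGD(lr=lr, momentum=momentum)
    elif optimizer == 'adam':
        opt = Adam(lr=lr, betas=(beta1, beta2))
    else:
        opt = AdamW(lr=lr, betas=(beta1, beta2), weight_decay=weight_decay)
    return train_and_evaluate(opt)
```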
Conditional hyperparameters add a layer of complexity that reflects the true structure of many ML problems. Handling them correctly leads to more efficient HPO.
What's Next
With all aspects of search space structure now covered—continuous, discrete, and conditional—we'll next explore hyperparameter importance: understanding which hyperparameters matter most, and how to use this knowledge to focus your optimization efforts.
You now understand how to define and work with conditional hyperparameter spaces, and how different HPO frameworks handle hierarchical dependencies. This knowledge enables you to model complex algorithm configuration problems correctly.