Not all hyperparameters are created equal—some only matter when others take specific values. Consider choosing between Adam and SGD as your optimizer. If you choose Adam, its specific $\beta_1, \beta_2$ parameters become relevant. If you choose SGD, those parameters don't exist, but momentum and Nesterov acceleration become relevant instead.
These are conditional hyperparameters (also called hierarchical or nested hyperparameters). They create tree-like or DAG-like structures in the search space, where different branches have different hyperparameters. Properly handling conditional dependencies is critical for efficient HPO—ignoring them wastes budget on invalid or meaningless configurations.
By the end of this page, you will:

- Understand the formal structure of conditional hyperparameter spaces
- Recognize common patterns of conditional dependencies
- Know how to encode conditional spaces for different HPO algorithms
- Implement conditional logic in major HPO frameworks
A conditional hyperparameter is one whose existence or relevance depends on the value of one or more parent hyperparameters.
Formal Definition:
Let $\lambda_p$ be a parent hyperparameter and $\lambda_c$ be a child hyperparameter. We say $\lambda_c$ is conditional on $\lambda_p$ if:
$$\lambda_c \in \Lambda_c \text{ is defined } \iff \lambda_p \in P_c \subseteq \Lambda_p$$
where $P_c$ is the activation set—the values of the parent that activate the child.
Example: for an SVM, the kernel choice is the parent hyperparameter; the RBF width $\gamma$ and the polynomial degree exist only for the kernels that use them.
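In the activation-set notation above, one common SVM parameterization reads (the available kernels vary by library):

$$\text{kernel} \in \{\text{linear}, \text{rbf}, \text{poly}\}, \qquad \gamma \text{ defined} \iff \text{kernel} \in P_\gamma = \{\text{rbf}, \text{poly}\}, \qquad \text{degree defined} \iff \text{kernel} \in P_{\text{degree}} = \{\text{poly}\}$$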
This creates a hierarchical structure where the parent choice branches into different subspaces.
Configuration Space as a DAG
Conditional hyperparameters form a Directed Acyclic Graph (DAG): each node is a hyperparameter, and an edge from $\lambda_p$ to $\lambda_c$ means that $\lambda_c$ is active only when $\lambda_p$ takes a value in its activation set.
The DAG structure ensures no circular dependencies: a hyperparameter cannot be conditional on itself or on a hyperparameter that depends on it.
Valid Configurations:
A configuration $\lambda = (\lambda_1, ..., \lambda_d)$ is valid if and only if each hyperparameter is assigned a value exactly when it is active, and every assigned value lies in its domain:

$$\big(\lambda_i \text{ assigned} \iff \lambda_p \in P_i \text{ for every parent } \lambda_p \text{ of } \lambda_i\big) \quad \text{and} \quad \big(\lambda_i \text{ assigned} \implies \lambda_i \in \Lambda_i\big)$$
Conditional dependencies aren't just about what's valid; they're about what's meaningful. Setting $\gamma$ for a linear SVM is technically possible but completely meaningless. An HPO algorithm should exploit this structure to avoid wasting evaluations on configurations that differ only in inactive hyperparameters.
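To see this concretely, here is a minimal sketch using scikit-learn; the dataset and gamma values are illustrative. With a linear kernel, `gamma` is accepted but never read, so sweeping it changes nothing:

```python
# Minimal sketch: gamma is inactive for a linear-kernel SVM, so sweeping it wastes budget.
# Dataset and gamma values are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
scores = [SVC(kernel='linear', gamma=g).fit(X, y).score(X, y)
          for g in (1e-3, 1.0, 1e3)]
print(scores)  # identical scores: the linear kernel never reads gamma
```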
Conditional dependencies appear throughout machine learning. Here are the most common patterns:
Pattern: A categorical choice of algorithm, where each algorithm has its own hyperparameters.
Example: Optimizer Selection
```
optimizer ∈ {SGD, Adam, AdamW}
│
├── if SGD:
│   ├── momentum ∈ [0, 0.99]
│   ├── nesterov ∈ {True, False}
│   └── dampening ∈ [0, 1]
│
├── if Adam:
│   ├── beta1 ∈ [0.8, 0.99]
│   └── beta2 ∈ [0.9, 0.9999]
│
└── if AdamW:
    ├── beta1 ∈ [0.8, 0.99]
    ├── beta2 ∈ [0.9, 0.9999]
    └── weight_decay ∈ [1e-6, 0.1]   # Decoupled
```
Key observation: Different algorithms have completely different hyperparameters. Sharing hyperparameters across algorithms (e.g., 'momentum' for Adam) would be invalid.
Conditional hyperparameters create several technical challenges for HPO algorithms:
The Comparison Problem in Detail
Consider two configurations:
$$\lambda_1 = (\text{optimizer=SGD}, \text{momentum}=0.9, \text{lr}=0.01)$$ $$\lambda_2 = (\text{optimizer=Adam}, \beta_1=0.9, \text{lr}=0.001)$$
What's the 'distance' between $\lambda_1$ and $\lambda_2$? Standard approaches fail: Euclidean distance is undefined when the two configurations live in different active subspaces, and imputing placeholder values for the missing dimensions (momentum for Adam, $\beta_1$ for SGD) makes unrelated configurations look artificially close or far.
Solution approaches include imputing defaults for inactive hyperparameters, comparing only the shared active parameters, and using surrogate models that tolerate missing dimensions natively.
SMAC (Sequential Model-based Algorithm Configuration) was specifically designed for conditional spaces. It uses random forests that naturally handle missing values (inactive hyperparameters) and can split on parent hyperparameters before considering children.
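To make the declarative style concrete, here is a minimal sketch of the earlier SVM example using the ConfigSpace library that SMAC builds on. Class and method names follow ConfigSpace's long-standing 0.x API; newer releases expose a slightly different interface, so treat this as a sketch rather than a drop-in snippet.

```python
# Sketch: the SVM kernel example as a declarative ConfigSpace definition
# (ConfigSpace 0.x-style API; newer releases differ slightly).
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter, UniformFloatHyperparameter, UniformIntegerHyperparameter,
)
from ConfigSpace.conditions import EqualsCondition, InCondition

cs = ConfigurationSpace()

kernel = CategoricalHyperparameter('kernel', ['linear', 'rbf', 'poly'])
C = UniformFloatHyperparameter('C', 1e-3, 1e3, log=True)
gamma = UniformFloatHyperparameter('gamma', 1e-4, 10.0, log=True)
degree = UniformIntegerHyperparameter('degree', 2, 5)
cs.add_hyperparameters([kernel, C, gamma, degree])

# gamma is active only for kernels that use it; degree only for the polynomial kernel
cs.add_conditions([
    InCondition(gamma, kernel, ['rbf', 'poly']),
    EqualsCondition(degree, kernel, 'poly'),
])

print(cs.sample_configuration())  # inactive hyperparameters are simply omitted
```

Because the conditions are declared up front, sampled configurations are always valid and the surrogate model can read the dependency structure directly.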
Let's examine how conditional hyperparameters are implemented in practice, with working examples in major frameworks.
"""Conditional Hyperparameters: Implementation Examples This module demonstrates how to define and work with conditionalhyperparameter spaces across different scenarios."""import numpy as npfrom dataclasses import dataclass, fieldfrom typing import Dict, Any, List, Optional, Union, Callablefrom enum import Enum class ParameterStatus(Enum): """Status of a conditional hyperparameter.""" ACTIVE = "active" INACTIVE = "inactive" @dataclassclass ConditionalParameter: """ A hyperparameter with optional conditional activation. Attributes: name: Parameter identifier param_type: 'continuous', 'integer', 'categorical' bounds: For continuous/integer, (low, high). For categorical, list of choices. parent: Name of parent parameter (None if unconditional) parent_values: Values of parent that activate this parameter log_scale: Whether to use log scale (for continuous) default: Default value when active """ name: str param_type: str bounds: Union[tuple, List] parent: Optional[str] = None parent_values: Optional[List] = None log_scale: bool = False default: Any = None def is_conditional(self) -> bool: """Check if this parameter is conditional.""" return self.parent is not None def is_active(self, config: Dict[str, Any]) -> bool: """Check if this parameter is active given a (partial) configuration.""" if not self.is_conditional(): return True if self.parent not in config: return False # Parent not yet set return config[self.parent] in self.parent_values class ConditionalConfigurationSpace: """ A configuration space with conditional dependencies. Supports: - Defining unconditional and conditional parameters - Validating configurations - Sampling valid configurations - Computing distance between configurations """ def __init__(self): self.parameters: Dict[str, ConditionalParameter] = {} self._dependency_order: List[str] = [] # Topological order def add_parameter(self, param: ConditionalParameter): """Add a parameter to the space.""" # Validate parent exists if param.parent is not None and param.parent not in self.parameters: raise ValueError(f"Parent {param.parent} not found for {param.name}") self.parameters[param.name] = param self._update_dependency_order() def _update_dependency_order(self): """Compute topological order of parameters for sampling.""" # Simple implementation: unconditional first, then by depth unconditional = [p for p in self.parameters.values() if not p.is_conditional()] conditional = [p for p in self.parameters.values() if p.is_conditional()] self._dependency_order = [p.name for p in unconditional] # Add conditional params in dependency order remaining = conditional.copy() while remaining: for param in remaining.copy(): if param.parent in self._dependency_order: self._dependency_order.append(param.name) remaining.remove(param) def sample(self) -> Dict[str, Any]: """ Sample a valid configuration respecting conditional dependencies. Samples parameters in topological order, only sampling child parameters if their parent activates them. 
""" config = {} for param_name in self._dependency_order: param = self.parameters[param_name] # Check if this parameter should be active if not param.is_active(config): config[param_name] = None # Mark as inactive continue # Sample an active value config[param_name] = self._sample_parameter(param) return config def _sample_parameter(self, param: ConditionalParameter) -> Any: """Sample a single parameter value.""" if param.param_type == 'categorical': return np.random.choice(param.bounds) elif param.param_type == 'continuous': low, high = param.bounds if param.log_scale: return np.exp(np.random.uniform(np.log(low), np.log(high))) else: return np.random.uniform(low, high) elif param.param_type == 'integer': low, high = param.bounds if param.log_scale: log_val = np.random.uniform(np.log(low), np.log(high)) return int(np.round(np.exp(log_val))) else: return np.random.randint(low, high + 1) def validate(self, config: Dict[str, Any]) -> bool: """ Validate a configuration. Checks: 1. All unconditional parameters are present and valid 2. Active conditional parameters have valid values 3. Inactive parameters are None or absent """ for param_name, param in self.parameters.items(): is_active = param.is_active(config) value = config.get(param_name) if is_active: if value is None: return False # Active param must have value if not self._value_valid(param, value): return False else: # Inactive: value should be None or absent if value is not None: return False return True def _value_valid(self, param: ConditionalParameter, value: Any) -> bool: """Check if a value is valid for a parameter.""" if param.param_type == 'categorical': return value in param.bounds elif param.param_type in ('continuous', 'integer'): low, high = param.bounds return low <= value <= high return True def get_active_parameters(self, config: Dict[str, Any]) -> List[str]: """Get list of active parameter names for a configuration.""" return [name for name, param in self.parameters.items() if param.is_active(config)] def create_optimizer_space() -> ConditionalConfigurationSpace: """ Create a configuration space for optimizer selection. Demonstrates the algorithm-choice conditional pattern. 
""" space = ConditionalConfigurationSpace() # Root: optimizer choice space.add_parameter(ConditionalParameter( name='optimizer', param_type='categorical', bounds=['sgd', 'adam', 'adamw'], )) # Learning rate (shared across all optimizers) space.add_parameter(ConditionalParameter( name='learning_rate', param_type='continuous', bounds=(1e-5, 0.1), log_scale=True, )) # SGD-specific parameters space.add_parameter(ConditionalParameter( name='momentum', param_type='continuous', bounds=(0.0, 0.99), parent='optimizer', parent_values=['sgd'], )) space.add_parameter(ConditionalParameter( name='nesterov', param_type='categorical', bounds=[True, False], parent='optimizer', parent_values=['sgd'], )) # Adam-specific parameters space.add_parameter(ConditionalParameter( name='beta1', param_type='continuous', bounds=(0.8, 0.99), parent='optimizer', parent_values=['adam', 'adamw'], # Shared by both Adam variants )) space.add_parameter(ConditionalParameter( name='beta2', param_type='continuous', bounds=(0.9, 0.9999), parent='optimizer', parent_values=['adam', 'adamw'], )) # AdamW-specific: explicit weight decay space.add_parameter(ConditionalParameter( name='weight_decay', param_type='continuous', bounds=(1e-6, 0.1), log_scale=True, parent='optimizer', parent_values=['adamw'], )) return space def create_neural_network_space() -> ConditionalConfigurationSpace: """ Create a configuration space for neural network architecture. Demonstrates depth-dependent conditional parameters. """ space = ConditionalConfigurationSpace() # Number of hidden layers space.add_parameter(ConditionalParameter( name='num_layers', param_type='integer', bounds=(1, 4), )) # Layer 1 (always exists) space.add_parameter(ConditionalParameter( name='layer1_units', param_type='integer', bounds=(32, 512), log_scale=True, )) # Layer 2 (conditional on num_layers >= 2) space.add_parameter(ConditionalParameter( name='layer2_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[2, 3, 4], )) # Layer 3 (conditional on num_layers >= 3) space.add_parameter(ConditionalParameter( name='layer3_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[3, 4], )) # Layer 4 (conditional on num_layers == 4) space.add_parameter(ConditionalParameter( name='layer4_units', param_type='integer', bounds=(32, 512), log_scale=True, parent='num_layers', parent_values=[4], )) return space # Example usageif __name__ == "__main__": print("Optimizer Configuration Space") print("=" * 50) opt_space = create_optimizer_space() for _ in range(5): config = opt_space.sample() active = opt_space.get_active_parameters(config) print(f"\nOptimizer: {config['optimizer']}") print(f" Active params: {active}") print(f" Config: {config}") print(f" Valid: {opt_space.validate(config)}") print("\n" + "=" * 50) print("Neural Network Architecture Space") print("=" * 50) nn_space = create_neural_network_space() for _ in range(5): config = nn_space.sample() active = nn_space.get_active_parameters(config) print(f"\nNum layers: {config['num_layers']}") print(f" Active params: {active}") layer_units = [config.get(f'layer{i}_units') for i in range(1, 5) if config.get(f'layer{i}_units')] print(f" Architecture: {layer_units}")Different HPO frameworks have varying levels of support for conditional hyperparameters. Here's how the major tools handle them:
Optuna handles conditional hyperparameters via dynamic search spaces defined by the trial object. The suggest_* methods are called conditionally based on earlier choices.
```python
import optuna

# SGD, Adam, AdamW, and train_and_evaluate are placeholders for your own
# training code (e.g., torch.optim classes and a training loop).
def objective(trial):
    # Parent hyperparameter
    optimizer = trial.suggest_categorical('optimizer', ['sgd', 'adam', 'adamw'])
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)

    # Conditional: SGD-specific
    if optimizer == 'sgd':
        momentum = trial.suggest_float('momentum', 0.0, 0.99)
        nesterov = trial.suggest_categorical('nesterov', [True, False])
        opt = SGD(lr=lr, momentum=momentum, nesterov=nesterov)

    # Conditional: Adam variants
    elif optimizer in ['adam', 'adamw']:
        beta1 = trial.suggest_float('beta1', 0.8, 0.99)
        beta2 = trial.suggest_float('beta2', 0.9, 0.9999)
        if optimizer == 'adam':
            opt = Adam(lr=lr, betas=(beta1, beta2))
        else:
            wd = trial.suggest_float('weight_decay', 1e-6, 0.1, log=True)
            opt = AdamW(lr=lr, betas=(beta1, beta2), weight_decay=wd)

    return train_and_evaluate(opt)
```
Optuna's default TPE sampler handles conditionals naturally: each hyperparameter is modeled independently, using only the trials in which it was actually suggested.
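If you would rather have TPE model the hyperparameters within each conditional branch jointly, recent Optuna versions expose `multivariate` and `group` options on the sampler; availability and defaults depend on your Optuna version, so treat this as a sketch:

```python
import optuna

# Sketch: with group=True (requires multivariate=True), TPE builds a separate joint
# model for each set of hyperparameters that co-occur in trials, e.g. the SGD branch
# versus the Adam/AdamW branch. Both flags are marked experimental in some releases.
sampler = optuna.samplers.TPESampler(multivariate=True, group=True, seed=0)
study = optuna.create_study(direction='minimize', sampler=sampler)
study.optimize(objective, n_trials=50)  # `objective` from the example above
```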
| Framework | Approach | Conditional Support | Best For |
|---|---|---|---|
| Optuna | Dynamic/trial-based | ⭐⭐⭐ Native | Complex conditionals, flexibility |
| ConfigSpace/SMAC | Declarative Conditions | ⭐⭐⭐ Native | Validated configs, algorithm config |
| Hyperopt | Nested dictionaries | ⭐⭐ Supported | Tree-structured spaces |
| Ray Tune | Via integrations | ⭐⭐ Via Optuna/SMAC | Distribution, scheduling |
| sklearn GridSearchCV | None | ❌ Not supported | Simple, flat spaces only |
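For comparison, the nested-dictionary style that the table lists for Hyperopt looks roughly like this for the optimizer space (a sketch; ranges mirror the earlier example):

```python
import numpy as np
from hyperopt import hp

# Sketch: hp.choice picks a branch, and each branch dict carries only its own
# hyperparameters. Every hp.* label must be unique across the whole space, so
# beta1/beta2 cannot simply be shared between the adam and adamw branches.
space = hp.choice('optimizer', [
    {'type': 'sgd',
     'momentum': hp.uniform('momentum', 0.0, 0.99),
     'nesterov': hp.choice('nesterov', [True, False])},
    {'type': 'adam',
     'beta1': hp.uniform('adam_beta1', 0.8, 0.99),
     'beta2': hp.uniform('adam_beta2', 0.9, 0.9999)},
    {'type': 'adamw',
     'beta1': hp.uniform('adamw_beta1', 0.8, 0.99),
     'beta2': hp.uniform('adamw_beta2', 0.9, 0.9999),
     'weight_decay': hp.loguniform('weight_decay', np.log(1e-6), np.log(0.1))},
])
```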
Based on extensive experience with conditional HPO, here are recommended patterns for structuring your search spaces:
Share hyperparameters that mean the same thing across branches: if beta1 means the same for Adam and AdamW, use one beta1 rather than separate adam_beta1 and adamw_beta1, unless they genuinely should differ.

It's tempting to encode every possible architectural choice as a conditional hyperparameter. Resist this temptation. A space with 50 hyperparameters nested 5 levels deep is nearly impossible to optimize effectively. Start simple and add complexity only when simpler spaces are exhausted.
When to Flatten vs When to Use Conditionals
Flatten when: branches share most of their hyperparameters with the same meaning, the conditional structure is shallow, or your tooling (like grid search) only supports flat spaces; inactive values are then simply ignored at training time (see the flattened sketch after this list).

Use conditionals when: branches have genuinely disjoint hyperparameters (as in the optimizer example), the space is deep or wide enough that evaluating meaningless combinations would waste real budget, or your framework samples and validates conditional spaces natively.
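As a contrast to the conditional Optuna objective above, a flattened encoding of the same space might look like the sketch below; SGD, Adam, AdamW, and train_and_evaluate are the same placeholders as before. It is simpler to write, but the sampler now spends effort modeling dimensions that are inactive for most trials.

```python
def flat_objective(trial):
    # Every hyperparameter is always suggested; the ones the chosen optimizer
    # does not use are simply ignored.
    optimizer = trial.suggest_categorical('optimizer', ['sgd', 'adam', 'adamw'])
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    momentum = trial.suggest_float('momentum', 0.0, 0.99)                    # SGD only
    beta1 = trial.suggest_float('beta1', 0.8, 0.99)                          # Adam/AdamW only
    beta2 = trial.suggest_float('beta2', 0.9, 0.9999)
    weight_decay = trial.suggest_float('weight_decay', 1e-6, 0.1, log=True)  # AdamW only

    if optimizer == 'sgd':
        opt = SGD(lr=lr, momentum=momentum)
    elif optimizer == 'adam':
        opt = Adam(lr=lr, betas=(beta1, beta2))
    else:
        opt = AdamW(lr=lr, betas=(beta1, beta2), weight_decay=weight_decay)
    return train_and_evaluate(opt)
```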
Conditional hyperparameters add a layer of complexity that reflects the true structure of many ML problems. Handling them correctly leads to more efficient HPO.
What's Next
With all aspects of search space structure now covered—continuous, discrete, and conditional—we'll next explore hyperparameter importance: understanding which hyperparameters matter most, and how to use this knowledge to focus your optimization efforts.
You now understand how to define and work with conditional hyperparameter spaces, and how different HPO frameworks handle hierarchical dependencies. This knowledge enables you to model complex algorithm configuration problems correctly.