The search space is the universe of configurations that an AutoML system can explore. It defines the boundaries of what's achievable—configurations outside the search space can never be discovered, no matter how effective the search strategy.
Search space design is a critical skill that separates effective AutoML from wasted computation. Too narrow a space may exclude the optimal solution; too broad a space wastes resources exploring irrelevant configurations.
This page covers the theory and practice of search space design: parameter types, space structure, conditioning, and the art of balancing expressiveness with tractability.
You'll master continuous, discrete, categorical, and conditional parameter spaces. You'll understand hierarchical structures, learn to handle constraints, and develop intuition for designing spaces that are both expressive and searchable.
A search space $\mathcal{X}$ is a set of possible configurations. Each configuration $x \in \mathcal{X}$ specifies values for all hyperparameters:
$$x = (x_1, x_2, ..., x_n)$$
where each $x_i$ is drawn from its corresponding domain $\mathcal{X}_i$.
Parameter Types:
Different parameters have fundamentally different characteristics that affect how they should be searched:
| Type | Domain | Examples | Search Considerations |
|---|---|---|---|
| Continuous | Real interval [a, b] | Learning rate, regularization strength | Gradient-based or model-based optimization |
| Integer | Discrete range {a, ..., b} | Hidden layer size, n_estimators | Can quantize continuous or use integer-aware methods |
| Categorical | Unordered set {a, b, c} | Optimizer type, kernel type | No metric; requires enumeration or encoding |
| Ordinal | Ordered set {low, med, high} | Model complexity levels | Has order but not necessarily metric |
| Boolean | Binary {True, False} | Use dropout?, early stopping? | Special case of categorical with 2 values |
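As a concrete sketch, the parameter types above can be declared with the ConfigSpace library (mentioned again below). The class names follow the commonly used ConfigSpace API; the parameter names and ranges here are purely illustrative.

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    UniformFloatHyperparameter, UniformIntegerHyperparameter,
    CategoricalHyperparameter, OrdinalHyperparameter)

cs = ConfigurationSpace(seed=0)
cs.add_hyperparameters([
    # Continuous, searched on a log scale (see "Scale Matters" below)
    UniformFloatHyperparameter("learning_rate", 1e-5, 1e-1, log=True),
    # Integer
    UniformIntegerHyperparameter("hidden_size", 16, 512, log=True),
    # Categorical (unordered)
    CategoricalHyperparameter("optimizer", ["sgd", "adam", "rmsprop"]),
    # Ordinal (ordered, but no metric)
    OrdinalHyperparameter("complexity", ["low", "medium", "high"]),
    # Boolean as a two-valued categorical (encoded as strings here)
    CategoricalHyperparameter("use_dropout", ["true", "false"]),
])
print(cs.sample_configuration())
```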
Scale Matters:
Many parameters are best searched on a logarithmic scale. Learning rates, regularization strengths, and other multiplicative factors vary over orders of magnitude.
Searching uniformly on a linear scale concentrates most samples in the upper decade of the range; log-scale sampling distributes samples evenly across magnitudes.
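A quick numeric sketch of the difference in plain NumPy, assuming a learning-rate range of 1e-5 to 1e-1:

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-5, 1e-1

linear = rng.uniform(low, high, size=1000)                       # linear scale
log = 10 ** rng.uniform(np.log10(low), np.log10(high), 1000)     # log scale

# Roughly 90% of the linear samples land in the top decade [1e-2, 1e-1],
# while the log-uniform samples spread evenly across all four decades.
print((linear > 1e-2).mean(), (log > 1e-2).mean())
```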
Use log scale when: (1) the parameter spans multiple orders of magnitude, (2) the effect is multiplicative rather than additive, (3) small values need fine granularity. Common log-scale parameters: learning rate, regularization, dropout, weight decay, kernel bandwidth.
Real ML configurations have conditional structure: some parameters only matter when others take specific values. This creates a hierarchical, tree-structured search space.
Example: Neural Network Configuration
Consider configuring a neural network optimizer: the top-level choice optimizer ∈ {SGD, Adam} determines which other hyperparameters are active. Momentum applies only when SGD is selected, while the β₁ and β₂ decay rates apply only to Adam.
Naively treating all parameters as independent wastes search effort on meaningless combinations, such as tuning momentum for a configuration that uses Adam.
Why Conditional Spaces Matter:
Reduced Effective Dimensionality: A 20-parameter space with conditions may have only 8-10 active parameters for any given configuration.
Meaningful Configurations Only: Prevents wasting evaluations on nonsensical combinations.
Better Surrogate Models: Optimization algorithms can learn structure rather than treating inactive parameters as noise.
Hardware/Memory Constraints: Can encode constraints like "if model_size = large, then batch_size ≤ 32".
When a conditional parameter is inactive, search algorithms must handle it carefully. Common approaches: (1) treat as missing/imputed value, (2) use special 'inactive' marker, (3) use separate surrogate models per parent value. ConfigSpace and similar libraries handle this automatically.
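Continuing the optimizer example, here is a minimal ConfigSpace sketch of a conditional space (names and ranges are illustrative; the exact API may vary slightly across ConfigSpace versions):

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter, UniformFloatHyperparameter)
from ConfigSpace.conditions import EqualsCondition

cs = ConfigurationSpace(seed=0)
optimizer = CategoricalHyperparameter("optimizer", ["sgd", "adam"])
lr = UniformFloatHyperparameter("learning_rate", 1e-5, 1e-1, log=True)
momentum = UniformFloatHyperparameter("momentum", 0.0, 0.99)   # SGD only
beta1 = UniformFloatHyperparameter("beta1", 0.85, 0.999)       # Adam only
cs.add_hyperparameters([optimizer, lr, momentum, beta1])

# momentum is active only when optimizer == "sgd"; beta1 only for "adam"
cs.add_condition(EqualsCondition(momentum, optimizer, "sgd"))
cs.add_condition(EqualsCondition(beta1, optimizer, "adam"))

# Sampled configurations contain only the active parameters
print(cs.sample_configuration())
```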
Combined Algorithm Selection and Hyperparameter optimization (CASH) unifies algorithm choice with parameter tuning into a single hierarchical search space.
Structure:
Root: algorithm ∈ {RandomForest, GradientBoosting, SVM, NeuralNet, ...}
│
├── If RandomForest:
│   ├── n_estimators ∈ [50, 500]
│   ├── max_depth ∈ {None, 5, 10, 20, 50}
│   └── min_samples_split ∈ [2, 20]
│
├── If GradientBoosting:
│   ├── n_estimators ∈ [50, 500]
│   ├── learning_rate ∈ [0.001, 0.3] (log)
│   └── max_depth ∈ [3, 10]
│
├── If SVM:
│   ├── C ∈ [0.001, 1000] (log)
│   ├── kernel ∈ {linear, rbf, poly}
│   └── If kernel = rbf: gamma ∈ [1e-5, 10] (log)
│
└── If NeuralNet:
    ├── n_layers ∈ [1, 5]
    ├── hidden_size ∈ [16, 512]
    └── activation ∈ {relu, tanh, elu}
This hierarchical structure naturally encodes that different algorithms have different hyperparameters.
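To make this concrete, here is a hedged sketch of how a sampled configuration from such a space could be turned into a scikit-learn estimator. The dictionary keys mirror the tree above, and only the active branch's keys are assumed to be present:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def build_model(config):
    """Instantiate the estimator selected by a sampled CASH configuration."""
    if config["algorithm"] == "RandomForest":
        return RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            min_samples_split=config["min_samples_split"])
    if config["algorithm"] == "SVM":
        # gamma is only present when kernel == "rbf"
        return SVC(C=config["C"], kernel=config["kernel"],
                   gamma=config.get("gamma", "scale"))
    raise ValueError(f"unknown algorithm: {config['algorithm']}")

model = build_model({"algorithm": "SVM", "C": 10.0,
                     "kernel": "rbf", "gamma": 0.01})
```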
CASH spaces can be enormous. With 5 algorithms averaging 100,000 hyperparameter combinations each, plus preprocessing options (120 combinations), the total space exceeds 50 million configurations. Efficient search strategies are essential—exhaustive evaluation is impossible.
NAS search spaces define the universe of possible neural network architectures. The design of this space profoundly affects both the quality of discovered architectures and the computational cost of search.
Cell-Based Search Spaces:
Modern NAS typically searches for cells—small repeating units—rather than entire networks. A cell contains nodes connected by operations. The network is constructed by stacking cells.
Search Space Size Calculation:
For a cell with B nodes, each choosing 2 distinct inputs from its available predecessors (the 2 cell inputs plus all earlier nodes) and one of K operations per incoming edge:
$$|\mathcal{X}| = \prod_{i=2}^{B+1} \binom{i}{2}\, K^2 = K^{2B} \cdot \frac{B!\,(B+1)!}{2^B}$$
With B=4 nodes and K=8 operations: $$|\mathcal{X}| = 8^8 \cdot 180 \approx 3 \times 10^9 \text{ architectures}$$
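The arithmetic is easy to verify with a few lines of Python implementing the count above:

```python
from math import comb, prod

def cell_space_size(B, K):
    """Cells with B nodes; node i chooses 2 distinct inputs from its i
    available predecessors (2 cell inputs + earlier nodes) and one of K
    operations per incoming edge."""
    return prod(K**2 * comb(i, 2) for i in range(2, B + 2))

print(f"{cell_space_size(4, 8):.2e}")  # B=4, K=8 -> ~3.02e+09 architectures
```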
Exhaustive search is clearly impossible.
Constraining the Space:
Effective NAS requires balancing expressiveness with tractability.
The search space encodes architectural priors. Cell-based spaces assume repeating structure. Operation sets encode inductive biases (conv for vision, attention for sequences). Good search space design incorporates domain knowledge to narrow search to promising regions.
Real-world AutoML must optimize not just accuracy but multiple objectives subject to constraints.
Common Objectives: predictive accuracy, inference latency, model size, memory footprint, energy consumption.
Common Constraints: maximum latency or memory on the deployment target, a training-time budget, a minimum acceptable accuracy.
Several strategies exist for folding multiple objectives and constraints into the search:
| Approach | Mechanism | Pros | Cons |
|---|---|---|---|
| Constraint as Penalty | Add constraint violation to objective | Simple, works with single-objective solvers | Requires tuning penalty weight |
| Feasibility Filtering | Reject infeasible configurations | Clean search space | May reject good approximate solutions |
| Pareto Optimization | Find Pareto front of non-dominated solutions | Returns full tradeoff surface | More complex, harder to automate selection |
| Scalarization | Weighted sum of objectives | Reduces to single-objective | May miss concave Pareto regions |
| Lexicographic | Optimize objectives in priority order | Clear priorities | Ignores tradeoffs between lower-priority objectives |
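As an illustration of the first two rows, here is a sketch of a scalarized objective with a latency constraint folded in as a penalty; all weights are hypothetical and would need tuning for a real task:

```python
def scalarized_objective(accuracy, latency_ms,
                         latency_budget_ms=50.0,
                         tradeoff=0.001, penalty_weight=10.0):
    """Combine accuracy (maximize) and latency (minimize) into one score.

    Hypothetical weights: `tradeoff` trades accuracy points against
    milliseconds, and `penalty_weight` punishes budget violations.
    """
    score = accuracy - tradeoff * latency_ms            # scalarization
    if latency_ms > latency_budget_ms:                  # constraint as penalty
        score -= penalty_weight * (latency_ms - latency_budget_ms)
    return score

print(scalarized_objective(accuracy=0.91, latency_ms=40.0))  # within budget
print(scalarized_objective(accuracy=0.93, latency_ms=80.0))  # heavily penalized
```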
Modern NAS increasingly incorporates hardware constraints directly. Systems like Once-For-All train a supernet once, then extract subnets meeting specific constraints (latency on iPhone, memory on IoT device). This amortizes search cost across many deployment targets.
Effective search space design balances competing concerns. These principles guide the process:
Match Scale to Parameter: sample multiplicative parameters (learning rate, regularization) log-uniformly.
Model Conditional Structure: encode parameter dependencies explicitly instead of flattening the space.
Encode Domain Knowledge: restrict ranges and operation sets to regions and choices known to be plausible.
Control Dimensionality: include only parameters that measurably affect performance.
Iterate: start broad, analyze which regions perform well, then narrow.
Common Mistakes:
| Mistake | Consequence | Fix |
|---|---|---|
| Too narrow ranges | Miss optimal regions | Analyze sensitivity, expand |
| Linear scale for log-scale params | Waste samples in one region | Use log-uniform |
| Ignoring conditionals | Evaluate meaningless configs | Model hierarchical structure |
| Including irrelevant params | Curse of dimensionality | Remove parameters with no effect |
| Too many categorical choices | Combinatorial explosion | Group similar, limit options |
Search space design is iterative. Run initial search with broad ranges, analyze which regions perform well, then define focused space for deeper search. This 'zoom-in' pattern efficiently allocates budget.
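Here is a sketch of the zoom-in step, using a hypothetical helper `zoomed_bounds` that narrows one parameter's range around the best results of the broad initial search:

```python
import numpy as np

def zoomed_bounds(results, param, top_frac=0.2, pad=0.1):
    """Narrow [low, high] for `param` around the top-performing configs.

    `results` is a list of (config_dict, score) pairs from the broad search;
    the top fraction and padding are arbitrary illustrative choices.
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_frac))]
    values = np.array([cfg[param] for cfg, _ in top])
    span = values.max() - values.min()
    return float(values.min() - pad * span), float(values.max() + pad * span)
```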
What's Next:
With the search space defined, we need search strategies to explore it efficiently. The next page covers methods from random search through Bayesian optimization to evolutionary algorithms—the engines that power AutoML exploration.
You now understand how to define AutoML search spaces: parameter types, scaling, conditional structure, CASH formulation, NAS spaces, and design principles. Next, we'll explore the search strategies that navigate these spaces efficiently.