The search space is arguably the most critical component of any Neural Architecture Search system. It determines which architectures can possibly be discovered—no search algorithm, however sophisticated, can find an architecture outside its search space.
The fundamental tension:
Designing an effective search space requires balancing expressiveness (can we represent good architectures?) with tractability (can we search efficiently?). This page provides a deep understanding of search space design principles and common paradigms.
Master the principles of search space design: cell-based vs macro search, operation selection, connectivity constraints, and how design choices impact search efficiency and discovered architecture quality.
A search space $\mathcal{A}$ is formally the set of all architectures that can be expressed under a given parameterization. Every architecture $a \in \mathcal{A}$ corresponds to a specific instantiation of architectural choices.
Dimensionality of Search Spaces:
The size of a search space grows combinatorially with the number of choices:
$$|\mathcal{A}| = \prod_{i=1}^{n} |C_i|$$
Where $C_i$ is the set of choices for the $i$-th architectural decision. With $n$ decisions and $k$ options each, the space contains $k^n$ architectures.
Example: A Simple Cell Space
Consider a cell with 4 intermediate nodes, where each node selects two inputs (from the two cell inputs plus the outputs of all earlier nodes) and applies one of 7 candidate operations to each selected input.
For node $i$ (0-indexed), there are $(2+i)^2$ ordered input combinations and $7^2 = 49$ operation combinations.
The total space size: $$|\mathcal{A}| = \prod_{i=0}^{3} \left[(2+i)^2 \times 7^2\right] = 2^2 \times 3^2 \times 4^2 \times 5^2 \times 49^4 \approx 8.3 \times 10^{10}$$
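As a quick sanity check, the product above can be computed directly (a minimal sketch using the worked example's 7 candidate operations):

```python
from math import prod

# 4 intermediate nodes, 7 candidate operations, as in the worked example:
# each node contributes (2 + i)^2 input choices times 7^2 operation choices
ops = 7
total = prod((2 + i) ** 2 * ops ** 2 for i in range(4))
print(f"{total:,}")  # 83,013,134,400, i.e. about 8.3e10
```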
Even this "simple" cell contains tens of billions of possible architectures.
| Search Space | Approximate Size | Notable Architectures |
|---|---|---|
| NAS-Bench-101 | 423,624 | Limited but fully evaluated |
| NAS-Bench-201 | 15,625 | Multiple datasets, smaller space |
| NASNet Cell Space | $\sim 10^{18}$ | Original NAS; huge but structured |
| DARTS Space | $\sim 10^{18}$ | Continuous relaxation makes this tractable |
| Unconstrained DAG (10 nodes) | $> 10^{30}$ | Impractical without constraints |
The cell-based paradigm revolutionized NAS by dramatically reducing search space complexity while maintaining expressiveness. Instead of searching for the entire network, we search for a small repeating unit (cell) that is stacked to form the full architecture.
Key insight: Many successful architectures (ResNet, Inception, etc.) consist of repeated blocks. Searching for the block rather than the full network reduces complexity from $O(|C|^L)$ to $O(|C|^B)$, where $L$ is network depth and $B$ is block size ($B \ll L$).
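To make the $O(|C|^L)$ vs $O(|C|^B)$ gap concrete, here is a toy computation (the counts $|C| = 8$, $L = 20$, $B = 5$ are illustrative assumptions, not values from any particular system):

```python
# Illustrative only: 8 candidate ops per decision, a 20-layer network,
# and a cell described by 5 decisions
C, L, B = 8, 20, 5
print(f"full-network space: {C ** L:.1e}")  # full-network space: 1.2e+18
print(f"cell space:         {C ** B}")      # cell space:         32768
```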
NASNet Cell Structure:
The NASNet cell design introduced key concepts still used today:
```python
"""Cell-Based Search Space Implementation"""
from dataclasses import dataclass
from typing import List, Tuple

# Standard operations in NASNet/DARTS-style spaces
OPERATIONS = [
    'none',          # Zero operation (drops the edge)
    'skip_connect',  # Identity connection
    'sep_conv_3x3',  # Separable convolution 3x3
    'sep_conv_5x5',  # Separable convolution 5x5
    'dil_conv_3x3',  # Dilated convolution 3x3
    'dil_conv_5x5',  # Dilated convolution 5x5
    'avg_pool_3x3',  # Average pooling 3x3
    'max_pool_3x3',  # Max pooling 3x3
]

@dataclass
class CellGenotype:
    """Represents a cell architecture"""
    normal: List[Tuple[str, int, str, int]]  # [(op1, input1, op2, input2), ...]
    reduce: List[Tuple[str, int, str, int]]  # Same for the reduction cell

def enumerate_cell_space(num_nodes: int = 4) -> int:
    """Calculate the size of the cell search space"""
    total = 1
    for i in range(num_nodes):
        num_inputs = 2 + i  # 2 cell inputs + previous nodes
        # Each node: 2 input selections × 2 operation selections
        node_choices = (num_inputs ** 2) * (len(OPERATIONS) ** 2)
        total *= node_choices
    return total

print(f"Cell space size (4 nodes, 8 ops): {enumerate_cell_space():,}")
# Output: Cell space size (4 nodes, 8 ops): 241,591,910,400
# (larger than the 7-op worked example above because OPERATIONS includes 'none')
```

Cells searched on small proxy tasks (CIFAR-10) often transfer to larger tasks (ImageNet) by stacking more cells. This enables searching in a cheap setting and deploying at scale, a key efficiency advantage.
Macro search directly searches for the entire network architecture rather than repeating cells. This provides maximum flexibility but at significantly higher search cost.
Chain-Structured Macro Search:
The simplest macro space defines a sequence of layers, choosing the operation at each position:
$$a = (o_1, o_2, ..., o_L)$$
where $o_i \in \mathcal{O}$ is the operation at layer $i$.
Multi-Branch Macro Search:
More expressive spaces allow branches and skip connections: each layer may draw input from any earlier layer, so the space contains general DAG-structured networks rather than a single chain.
When to Use Macro vs Cell Search:
Cell-based search suits limited compute budgets and tasks where a repeated-block prior is reasonable, and the searched cell transfers across datasets by restacking. Macro search is worth its higher cost when layer-wise heterogeneity matters, for example in hardware-aware design where early and late layers benefit from different operations.
The choice of candidate operations fundamentally shapes what architectures can be discovered. Operations must be expressive enough to compose into strong architectures, cheap enough to evaluate across many candidates, and well supported on the target hardware.
| Domain | Typical Operations | Notes |
|---|---|---|
| Image Classification | 3×3/5×5/7×7 conv, separable conv, dilated conv, pooling, skip | Separable convs reduce params |
| Object Detection | Above + deformable conv, FPN connections | Multi-scale is critical |
| Transformers | Self-attention, FFN variants, different head counts | Attention is computationally heavy |
| Mobile | MBConv (inverted residual), SE blocks, small kernels | Efficiency-focused operations |
If operations differ greatly in parameter count or computation, search may favor simpler options regardless of performance. Normalizing operation costs or using multi-objective search helps.
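One simple mitigation can be sketched as a cost-penalized objective. The relative-cost figures and the penalty weight below are illustrative assumptions, not measured values:

```python
# Illustrative relative compute costs per operation (placeholders, not benchmarks)
OP_COST = {
    'skip_connect': 0.0,
    'max_pool_3x3': 0.01,
    'sep_conv_3x3': 1.0,
    'sep_conv_5x5': 2.2,
}

def cost_penalized_score(accuracy: float, op_names: list, lam: float = 0.02) -> float:
    """Multi-objective score: validation accuracy minus a weighted compute penalty."""
    cost = sum(OP_COST[o] for o in op_names)
    return accuracy - lam * cost

# A cheap architecture no longer wins automatically: its penalty is smaller,
# but its accuracy term still has to compete.
print(round(cost_penalized_score(0.94, ['sep_conv_5x5', 'sep_conv_3x3']), 3))  # 0.876
```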
Connectivity constraints define how nodes can connect. Unconstrained DAGs are too expensive to search; practical spaces use structural constraints.
Common Connectivity Patterns:
```python
def valid_connections_cell(node_idx: int, num_cell_inputs: int = 2):
    """
    For cell-based NAS, node i can connect to:
    - the 2 cell input nodes (indices 0, 1)
    - previous intermediate nodes (indices 2 to 2 + i - 1)
    """
    return list(range(num_cell_inputs + node_idx))

# Node 0: can use inputs [0, 1] (the 2 cell inputs)
# Node 1: can use inputs [0, 1, 2] (2 cell inputs + node 0's output)
# Node 2: can use inputs [0, 1, 2, 3] (+ node 1's output)
# etc.
```

DARTS (Differentiable Architecture Search) introduced a search space that enables gradient-based optimization through continuous relaxation.
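Combining this connectivity rule with a 4-node cell fixes how many edges a search method must parameterize, which is exactly the 14 mixed operations a DARTS cell instantiates:

```python
# Each node i (of 4) may draw from 2 + i predecessors, so the edge count is:
num_edges = sum(2 + i for i in range(4))  # 2 + 3 + 4 + 5
print(num_edges)  # 14
```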
Key Design Principles:
Mixed Operations: Instead of selecting one operation per edge, DARTS places a weighted mixture of all operations. Architecture weights $\alpha$ determine the mixture.
Continuous Relaxation: The discrete choice is relaxed: $$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o'} \exp(\alpha_{o'}^{(i,j)})} \cdot o(x)$$
Discretization: After search, the architecture is discretized by taking the $\arg\max$ of the architecture weights on each edge (DARTS additionally keeps only the top-2 strongest incoming edges per node).
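The relaxation-then-argmax step can be illustrated with toy numbers (the $\alpha$ values below are made up for the example):

```python
from math import exp

# One edge with 4 candidate operations and made-up architecture weights
ops = ['none', 'skip_connect', 'sep_conv_3x3', 'max_pool_3x3']
alpha = [0.1, 2.0, -1.0, 0.5]

z = sum(exp(a) for a in alpha)
weights = [exp(a) / z for a in alpha]  # softmax: coefficients of the op mixture
chosen = ops[max(range(len(alpha)), key=alpha.__getitem__)]  # argmax discretization
print(chosen)  # skip_connect
```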
```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS-style mixed operation for differentiable NAS"""
    def __init__(self, channels, operations):
        super().__init__()
        self.ops = nn.ModuleList([
            self._build_op(op, channels) for op in operations
        ])

    def forward(self, x, weights):
        """
        Args:
            x: Input tensor
            weights: Architecture weights (softmax already applied)
        Returns:
            Weighted sum of all operation outputs
        """
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def _build_op(self, op_name, channels):
        # Build operation by name (Zero and SepConv are assumed defined elsewhere)
        if op_name == 'none':
            return Zero(channels)
        elif op_name == 'skip_connect':
            return nn.Identity()
        elif op_name == 'sep_conv_3x3':
            return SepConv(channels, channels, 3)
        # ... other operations

class DARTSCell(nn.Module):
    """A DARTS cell with mixed operations on all edges"""
    def __init__(self, channels, num_nodes=4):
        super().__init__()
        self.num_nodes = num_nodes
        # Create mixed ops for all edges
        self.edges = nn.ModuleDict()
        for i in range(num_nodes):
            for j in range(2 + i):  # Can connect to cell inputs + prev nodes
                self.edges[f'{j}->{2+i}'] = MixedOp(channels, OPERATIONS)

    def forward(self, s0, s1, arch_weights):
        """Forward with architecture weights determining op mixtures"""
        states = [s0, s1]  # Cell inputs
        for i in range(self.num_nodes):
            node_input = sum(
                self.edges[f'{j}->{2+i}'](states[j], arch_weights[f'{j}->{2+i}'])
                for j in range(2 + i)
            )
            states.append(node_input)
        # Concatenate intermediate node outputs
        return torch.cat(states[2:], dim=1)
```

The architecture trained with soft mixtures may perform differently when discretized. This "discretization gap" is a known DARTS limitation: the searched continuous architecture doesn't perfectly match the final discrete one.
Hierarchical search spaces combine the benefits of cell-based and macro search by operating at multiple levels of abstraction.
Two-Level Hierarchy (Common): an outer level fixes the macro layout (how many cells, where resolutions change, how cells connect), while an inner level searches the cell structure (operations and wiring).
Example: Auto-DeepLab
Auto-DeepLab searches for semantic segmentation architectures at two levels: the cell level (which operations and connections form each cell) and the network level (the sequence of spatial resolutions the features pass through).
This discovers both optimal local computations AND optimal multi-scale structure.
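A two-level genotype can be represented with a simple container. This sketch is hypothetical (the field names and toy values are assumptions, not Auto-DeepLab's actual encoding), but it shows the separation of inner cell from outer resolution path:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TwoLevelGenotype:
    """Hypothetical two-level encoding: inner cell ops + outer resolution path."""
    cell: List[Tuple[str, int]]  # inner level: (operation, input index) per edge
    path: List[int]              # outer level: downsampling factor at each layer

g = TwoLevelGenotype(
    cell=[('sep_conv_3x3', 0), ('skip_connect', 1)],
    path=[4, 8, 8, 16, 16, 32],  # toy path through spatial resolutions
)
print(len(g.path))  # 6
```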
You now understand how to design effective NAS search spaces. Next, we'll explore the search strategies that navigate these spaces to find optimal architectures.