Hand-designing augmentation pipelines is laborious, error-prone, and domain-specific. A strategy that works for ImageNet may fail for medical imaging; what helps satellite imagery may hurt document analysis. Each new domain requires extensive experimentation to find the right combination of transformations, magnitudes, and probabilities.
AutoAugment changes this paradigm fundamentally. Instead of manually designing augmentation policies, we formulate augmentation design as a search problem and let algorithms discover optimal strategies automatically. The result: policies that consistently outperform human intuition, often by discovering non-obvious augmentation combinations.
This page explores the landscape of learned augmentation—from the original reinforcement learning approach through efficient differentiable alternatives. Understanding these methods is essential for practitioners working across diverse domains where hand-designed augmentations fail to transfer.
By the end of this page, you will understand the AutoAugment search space formulation, implement RandAugment's simplified approach requiring only two hyperparameters, apply Fast AutoAugment and differentiable augmentation search, and know when learned policies provide meaningful improvements over manual design.
AutoAugment (Cubuk et al., 2019) was the first systematic approach to learning augmentation policies. It formulates augmentation design as a discrete search problem over a structured policy space.
An AutoAugment policy consists of 5 sub-policies, each containing 2 operations. Each operation specifies:

- **which transformation to apply** (one of 16 operations),
- **the probability of applying it** (discretized into 11 values), and
- **the magnitude of the transformation** (discretized into 10 values).
During training, a random sub-policy is selected for each image, and both operations within that sub-policy are applied sequentially (if their probability check passes).
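A minimal sketch of applying one sub-policy at training time (the two operations and their parameters below are illustrative stand-ins, not taken from a learned policy):

```python
import random
from PIL import Image, ImageOps

# One sub-policy: two (operation, probability, magnitude) triples.
sub_policy = [
    (lambda img, m: img.rotate(m), 0.7, 10),                   # Rotate, p=0.7, 10 degrees
    (lambda img, m: ImageOps.posterize(img, int(m)), 0.4, 5),  # Posterize, p=0.4, 5 bits
]

def apply_sub_policy(img: Image.Image, sub_policy) -> Image.Image:
    """Apply both operations in order; each fires only if its
    probability check passes."""
    for op, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = op(img, magnitude)
    return img
```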
The total number of possible policies is astronomical:
$$|\mathcal{S}| = (16 \times 10 \times 11)^{2 \times 5} = (1760)^{10} \approx 2.9 \times 10^{32}$$
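A quick sanity check of this count:

```python
# 16 operations, 10 magnitude bins, 11 probability bins;
# 2 operations per sub-policy, 5 sub-policies per policy
n_policies = (16 * 10 * 11) ** (2 * 5)
print(f"{n_policies:.1e}")  # 2.9e+32
```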
Exhaustive search is impossible. AutoAugment instead uses Proximal Policy Optimization (PPO), a reinforcement learning algorithm: a controller RNN samples candidate policies, a small "child" model is trained with each one, and the child's validation accuracy serves as the reward for updating the controller. The operation pool spans four categories:
| Category | Transformations | Magnitude Meaning |
|---|---|---|
| Geometric | ShearX, ShearY, TranslateX, TranslateY, Rotate | Degrees or pixel offset |
| Photometric | Brightness, Color, Contrast, Sharpness | Intensity factor [0.1, 1.9] |
| Distortion | AutoContrast, Equalize, Invert, Solarize, Posterize | Threshold or bit depth |
| Mixing | Cutout, SamplePairing | Region size or blend ratio |
This process is computationally expensive—the original AutoAugment required 15,000 GPU hours for ImageNet. However, learned policies transfer well: the ImageNet policy works on CIFAR-10, and the CIFAR-10 policy often helps other small-scale datasets.
AutoAugment often discovers non-intuitive strategies. For ImageNet, it heavily uses geometric transforms (rotation, shear) but avoids Cutout. For SVHN digit recognition, it emphasizes color inversion and shearing. These discoveries would be difficult to anticipate through manual design.
The complexity of AutoAugment's search motivated simpler alternatives. RandAugment (Cubuk et al., 2020) dramatically simplifies the approach while achieving comparable or better results.
AutoAugment searches for optimal (operation, probability, magnitude) tuples. RandAugment observes that:

- policies found on small proxy tasks do not necessarily transfer to the full task, because the optimal augmentation strength depends on model and dataset size;
- the learned per-operation probabilities add little: sampling every operation uniformly works about as well; and
- per-operation magnitudes can be collapsed into a single global value.

RandAugment therefore needs only two hyperparameters:

- **N**: the number of transformations applied sequentially to each image
- **M**: a single global magnitude shared by all transformations

For each image:

1. Sample N operations uniformly at random (with replacement) from the pool.
2. Apply each operation in sequence with magnitude M; there is no per-operation probability check.
This reduces the search space from $10^{32}$ possibilities to approximately $30 \times 3 = 90$ (N, M) pairs that can be searched via simple grid search.
```python
import random
from typing import Callable, Tuple

from PIL import Image, ImageEnhance, ImageOps


# Define augmentation operations
def shear_x(img: Image.Image, magnitude: float) -> Image.Image:
    """Shear image along x-axis."""
    return img.transform(
        img.size, Image.AFFINE, (1, magnitude, 0, 0, 1, 0),
        resample=Image.BILINEAR
    )


def shear_y(img: Image.Image, magnitude: float) -> Image.Image:
    """Shear image along y-axis."""
    return img.transform(
        img.size, Image.AFFINE, (1, 0, 0, magnitude, 1, 0),
        resample=Image.BILINEAR
    )


def translate_x(img: Image.Image, magnitude: float) -> Image.Image:
    """Translate image along x-axis by a fraction of its width."""
    pixels = int(magnitude * img.size[0])
    return img.transform(
        img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
        resample=Image.BILINEAR
    )


def translate_y(img: Image.Image, magnitude: float) -> Image.Image:
    """Translate image along y-axis by a fraction of its height."""
    pixels = int(magnitude * img.size[1])
    return img.transform(
        img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
        resample=Image.BILINEAR
    )


def rotate(img: Image.Image, magnitude: float) -> Image.Image:
    """Rotate image by the specified number of degrees."""
    return img.rotate(magnitude, resample=Image.BILINEAR,
                      fillcolor=(128, 128, 128))


def brightness(img: Image.Image, magnitude: float) -> Image.Image:
    """Adjust image brightness."""
    return ImageEnhance.Brightness(img).enhance(1 + magnitude)


def color(img: Image.Image, magnitude: float) -> Image.Image:
    """Adjust color saturation."""
    return ImageEnhance.Color(img).enhance(1 + magnitude)


def contrast(img: Image.Image, magnitude: float) -> Image.Image:
    """Adjust image contrast."""
    return ImageEnhance.Contrast(img).enhance(1 + magnitude)


def sharpness(img: Image.Image, magnitude: float) -> Image.Image:
    """Adjust image sharpness."""
    return ImageEnhance.Sharpness(img).enhance(1 + magnitude)


def auto_contrast(img: Image.Image, magnitude: float) -> Image.Image:
    """Apply auto contrast (magnitude unused)."""
    return ImageOps.autocontrast(img)


def equalize(img: Image.Image, magnitude: float) -> Image.Image:
    """Histogram equalization (magnitude unused)."""
    return ImageOps.equalize(img)


def invert(img: Image.Image, magnitude: float) -> Image.Image:
    """Invert colors (magnitude unused)."""
    return ImageOps.invert(img)


def solarize(img: Image.Image, magnitude: float) -> Image.Image:
    """Solarize (invert) pixels above a threshold."""
    threshold = int(magnitude * 255)
    return ImageOps.solarize(img, threshold)


def posterize(img: Image.Image, magnitude: float) -> Image.Image:
    """Reduce color bit depth."""
    bits = int(8 - magnitude * 4)  # map magnitude to 4-8 bits
    bits = max(1, min(8, bits))
    return ImageOps.posterize(img, bits)


class RandAugment:
    """
    RandAugment data augmentation as described in:
    'RandAugment: Practical automated data augmentation with a
    reduced search space' (Cubuk et al., 2020)

    Applies N random transformations, each with magnitude M.
    """

    # Operation pool: (name, function, (low, high) magnitude range).
    # A negative low marks a symmetric operation whose sign is
    # randomized. For Solarize, low > high so that M=0 keeps the
    # threshold at 255 (a no-op) and larger M solarizes more pixels.
    AUGMENTATION_POOL = [
        ("ShearX", shear_x, (-0.3, 0.3)),
        ("ShearY", shear_y, (-0.3, 0.3)),
        ("TranslateX", translate_x, (-0.45, 0.45)),
        ("TranslateY", translate_y, (-0.45, 0.45)),
        ("Rotate", rotate, (-30, 30)),
        ("Brightness", brightness, (-0.9, 0.9)),
        ("Color", color, (-0.9, 0.9)),
        ("Contrast", contrast, (-0.9, 0.9)),
        ("Sharpness", sharpness, (-0.9, 0.9)),
        ("AutoContrast", auto_contrast, (0, 0)),
        ("Equalize", equalize, (0, 0)),
        ("Invert", invert, (0, 0)),
        ("Solarize", solarize, (1, 0)),
        ("Posterize", posterize, (0, 1)),
    ]

    def __init__(self, n: int = 2, m: int = 10, max_magnitude: int = 30):
        """
        Parameters
        ----------
        n : int
            Number of transformations to apply per image
        m : int
            Global magnitude (0 to max_magnitude)
        max_magnitude : int
            Maximum possible magnitude value
        """
        self.n = n
        self.m = m
        self.max_magnitude = max_magnitude

    def _apply_op(
        self,
        img: Image.Image,
        op_name: str,
        op_fn: Callable,
        magnitude_range: Tuple[float, float],
    ) -> Image.Image:
        """Apply an operation with magnitude scaled into its range."""
        low, high = magnitude_range
        if low < 0:
            # Symmetric operation: M=0 means no distortion; flip the
            # sign half the time so both directions are covered.
            magnitude = (self.m / self.max_magnitude) * high
            if random.random() < 0.5:
                magnitude = -magnitude
        else:
            # One-sided operation: interpolate from the weak end (low)
            # at M=0 to the strong end (high) at M=max_magnitude.
            magnitude = (self.m / self.max_magnitude) * (high - low) + low
        return op_fn(img, magnitude)

    def __call__(self, img: Image.Image) -> Image.Image:
        """Apply N randomly chosen operations, each at magnitude M."""
        # Sample N operations uniformly, with replacement
        ops = random.choices(self.AUGMENTATION_POOL, k=self.n)
        for op_name, op_fn, magnitude_range in ops:
            img = self._apply_op(img, op_name, op_fn, magnitude_range)
        return img


# Example usage
def get_imagenet_train_transform(randaug_n: int = 2, randaug_m: int = 9):
    """Create an ImageNet training transform with RandAugment."""
    from torchvision import transforms

    return transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),
        transforms.RandomHorizontalFlip(),
        RandAugment(n=randaug_n, m=randaug_m),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        ),
    ])
```

A key insight of RandAugment is that optimal magnitude scales with model size and dataset size:
| Model | Dataset | Optimal N | Optimal M |
|---|---|---|---|
| ResNet-50 | ImageNet | 2 | 9 |
| EfficientNet-B7 | ImageNet | 2 | 17 |
| Wide-ResNet-28-10 | CIFAR-10 | 3 | 6 |
| ResNet-200 | ImageNet | 2 | 14 |
Larger models have more capacity and need stronger regularization (higher M). Smaller datasets also benefit from stronger augmentation to prevent overfitting.
Start with N=2, M=10 for most applications. If training shows underfitting (training accuracy low), reduce M. If validation accuracy lags training (overfitting), increase M. Grid search over M ∈ {5, 7, 9, 11, 13, 15} typically suffices.
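A minimal sketch of that grid search, assuming a hypothetical `train_and_evaluate(n, m)` helper that trains a model with `RandAugment(n=n, m=m)` and returns validation accuracy:

```python
def search_randaugment_m(train_and_evaluate, n: int = 2):
    """Grid search over M; N is usually left at 2."""
    best_m, best_acc = None, 0.0
    for m in [5, 7, 9, 11, 13, 15]:
        acc = train_and_evaluate(n=n, m=m)  # full (or shortened) training run
        print(f"N={n}, M={m}: val acc = {acc:.4f}")
        if acc > best_acc:
            best_m, best_acc = m, acc
    return best_m, best_acc
```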
The 15,000 GPU-hour cost of AutoAugment motivated more efficient search methods. Fast AutoAugment (Lim et al., 2019) reduces search time to under 5 GPU-hours while achieving comparable performance.
AutoAugment trains child models to completion for each candidate policy—the main computational bottleneck. Fast AutoAugment observes that effective augmentation policies should create augmented images that match the distribution of held-out validation data:
$$\pi^* = \arg\min_{\pi} \, D_{\mathrm{KL}}\!\left( p_{\text{aug}}(x \mid \pi) \,\|\, p_{\text{val}}(x) \right)$$
This can be estimated without full training through density matching proxies.
The search uses Bayesian optimization with Tree-structured Parzen Estimators (TPE) to efficiently explore the policy space.
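As one concrete possibility, a TPE search can be wired up with the `hyperopt` library. The sketch below assumes hypothetical `build_subpolicy` and `evaluate_policy` helpers that construct a sub-policy from sampled parameters and score it with the density-matching proxy:

```python
from hyperopt import fmin, hp, tpe

OPERATIONS = ["ShearX", "ShearY", "Rotate", "Solarize", "Posterize"]  # abbreviated pool

space = {
    "op1": hp.choice("op1", OPERATIONS),
    "p1": hp.uniform("p1", 0, 1),
    "m1": hp.uniform("m1", 0, 1),
    "op2": hp.choice("op2", OPERATIONS),
    "p2": hp.uniform("p2", 0, 1),
    "m2": hp.uniform("m2", 0, 1),
}

def objective(params):
    policy = build_subpolicy(params)       # hypothetical: params -> sub-policy
    return 1.0 - evaluate_policy(policy)   # hyperopt minimizes, so negate the score

best = fmin(objective, space, algo=tpe.suggest, max_evals=200)
```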
```python
from dataclasses import dataclass
from typing import List

import numpy as np
import torch
import torch.nn as nn


@dataclass
class AugmentOperation:
    """Represents a single augmentation operation."""
    name: str
    probability: float
    magnitude: float


@dataclass
class SubPolicy:
    """A sub-policy contains two operations."""
    op1: AugmentOperation
    op2: AugmentOperation


@dataclass
class Policy:
    """A full policy contains 5 sub-policies."""
    sub_policies: List[SubPolicy]


class FastAutoAugmentSearch:
    """
    Simplified Fast AutoAugment search implementation.

    Uses Bayesian optimization to find policies that maximize
    validation set predictability under augmentation.
    """

    def __init__(
        self,
        model: nn.Module,
        train_loader: torch.utils.data.DataLoader,
        val_loader: torch.utils.data.DataLoader,
        device: str = 'cuda',
        num_trials: int = 100
    ):
        self.model = model
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.device = device
        self.num_trials = num_trials

        # Available operations
        self.operations = [
            'ShearX', 'ShearY', 'TranslateX', 'TranslateY', 'Rotate',
            'AutoContrast', 'Invert', 'Equalize', 'Solarize',
            'Posterize', 'Contrast', 'Color', 'Brightness',
            'Sharpness', 'Cutout'
        ]

    def _sample_subpolicy(self) -> SubPolicy:
        """Sample a random sub-policy."""
        return SubPolicy(
            op1=AugmentOperation(
                name=np.random.choice(self.operations),
                probability=np.random.uniform(0, 1),
                magnitude=np.random.uniform(0, 1)
            ),
            op2=AugmentOperation(
                name=np.random.choice(self.operations),
                probability=np.random.uniform(0, 1),
                magnitude=np.random.uniform(0, 1)
            )
        )

    def _sample_policy(self) -> Policy:
        """Sample a random policy with 5 sub-policies."""
        return Policy(
            sub_policies=[self._sample_subpolicy() for _ in range(5)]
        )

    def _evaluate_policy(self, policy: Policy) -> float:
        """
        Evaluate a policy by measuring how well the model predicts
        augmented validation samples.

        Uses the insight that good augmentations create samples that
        the model can still correctly classify.
        """
        self.model.eval()
        correct = 0
        total = 0

        with torch.no_grad():
            for images, labels in self.val_loader:
                # Apply policy to validation images
                augmented = self._apply_policy(images, policy)
                augmented = augmented.to(self.device)
                labels = labels.to(self.device)

                outputs = self.model(augmented)
                _, predicted = outputs.max(1)
                correct += predicted.eq(labels).sum().item()
                total += labels.size(0)

        return correct / total

    def _apply_policy(
        self, images: torch.Tensor, policy: Policy
    ) -> torch.Tensor:
        """
        Apply policy to a batch of images.
        Randomly selects one sub-policy per image.
        """
        B = images.size(0)
        augmented = images.clone()

        for i in range(B):
            # Random sub-policy selection
            sub_policy = np.random.choice(policy.sub_policies)

            # Apply operations probabilistically
            if np.random.random() < sub_policy.op1.probability:
                augmented[i] = self._apply_operation(
                    augmented[i], sub_policy.op1.name, sub_policy.op1.magnitude
                )
            if np.random.random() < sub_policy.op2.probability:
                augmented[i] = self._apply_operation(
                    augmented[i], sub_policy.op2.name, sub_policy.op2.magnitude
                )

        return augmented

    def _apply_operation(
        self, img: torch.Tensor, op_name: str, magnitude: float
    ) -> torch.Tensor:
        """
        Apply a single operation to an image tensor.
        Implementation would map op_name to actual transforms.
        """
        # Placeholder - a real implementation would apply the transform
        return img

    def search(self) -> Policy:
        """
        Run the policy search.

        Uses random search as a simple baseline. A full implementation
        would use TPE Bayesian optimization.
        """
        best_policy = None
        best_score = 0.0

        for trial in range(self.num_trials):
            # Sample candidate policy
            policy = self._sample_policy()

            # Evaluate
            score = self._evaluate_policy(policy)

            if score > best_score:
                best_score = score
                best_policy = policy
                print(f"Trial {trial}: New best score = {score:.4f}")

        return best_policy
```

Population-Based Augmentation (PBA) takes a different approach: instead of searching for a fixed policy, it evolves policies during training:

1. Train a population of models in parallel, each with its own augmentation hyperparameters.
2. Periodically copy weights from the best performers to the worst (exploit).
3. Randomly perturb the copied augmentation hyperparameters (explore).
This discovers dynamic schedules where augmentation intensity changes through training—often starting mild and increasing.
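A compact sketch of one exploit-and-explore step in this style (illustrative only; the `score`, `weights`, and `aug_params` attributes on each worker are assumed for the example):

```python
import copy
import random

def pba_step(population, top_frac=0.25):
    """One population-based step: bottom workers clone a top
    performer's weights (exploit) and perturb its augmentation
    hyperparameters (explore)."""
    population.sort(key=lambda w: w.score, reverse=True)
    cutoff = max(1, int(len(population) * top_frac))
    for worker in population[cutoff:]:          # bottom performers
        parent = random.choice(population[:cutoff])
        worker.weights = copy.deepcopy(parent.weights)   # exploit
        worker.aug_params = {                            # explore
            k: min(1.0, max(0.0, v + random.uniform(-0.1, 0.1)))
            for k, v in parent.aug_params.items()
        }
```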
DADA (Li et al., 2020) makes the search fully differentiable by relaxing discrete choices:
$$\text{aug}(x) = \sum_{o \in \mathcal{O}} \alpha_o \cdot T_o(x, m_o)$$
where $\alpha_o$ are softmax-weighted probabilities over operations $o$, learned end-to-end with the model. This reduces search to a single training run.
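A minimal PyTorch sketch of this softmax relaxation (illustrative: the transforms must be differentiable tensor operations, and the actual DADA estimator is more sophisticated than a plain mixture):

```python
import torch
import torch.nn as nn

class SoftAugment(nn.Module):
    """Differentiable mixture over candidate transforms: the output is
    a softmax-weighted sum, so the operation logits receive gradients
    from the task loss. `transforms` is a list of callables x -> T(x)."""

    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms
        self.logits = nn.Parameter(torch.zeros(len(transforms)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.softmax(self.logits, dim=0)  # operation weights
        return sum(a * t(x) for a, t in zip(alpha, self.transforms))

# Usage: aug = SoftAugment([lambda x: x, lambda x: torch.flip(x, dims=[-1])])
```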
For most practitioners, RandAugment or the pretrained AutoAugment policies are sufficient. Fast AutoAugment or DADA are worthwhile only when: (1) working on a specialized domain where standard policies don't transfer, (2) training many models where amortized search cost is low, or (3) seeking the last 0.1-0.3% accuracy improvement.
Even RandAugment's two hyperparameters (N, M) require tuning. TrivialAugment (Müller & Hutter, 2021) takes simplification to its logical conclusion: zero hyperparameters.
For each image:

1. Uniformly sample one operation from the pool.
2. Uniformly sample a magnitude from that operation's range.
3. Apply the operation and return the result.
That's it. No N to tune, no M to set. Despite (or perhaps because of) this simplicity, TrivialAugment matches or exceeds RandAugment's performance.
The success of TrivialAugment challenges assumptions about augmentation design:
1. **Single operations suffice.** Applying multiple transforms (N > 1) doesn't consistently improve results. The key is consistent exposure to transformations, not stacking them.
2. **Random magnitude reduces overfitting.** A fixed magnitude can cause the model to memorize specific distortion levels; random magnitude forces learning across the full transform spectrum.
3. **Hyperparameter sensitivity harms generalization.** Models trained with "optimal" (N, M) for the validation set may overfit those specific settings. Random sampling is more robust.
```python
import random

from PIL import Image, ImageEnhance, ImageOps


class TrivialAugment:
    """
    TrivialAugment data augmentation.

    No hyperparameters: randomly samples one operation with random
    magnitude for each image.

    Reference: 'TrivialAugment: Tuning-free Yet State-of-the-Art
    Data Augmentation' (Müller & Hutter, 2021)
    """

    # Each operation: (name, function, magnitude_bins)
    # magnitude_bins[i] = magnitude value for bin i
    AUGMENTATION_SPACE = [
        ("Identity", lambda img, v: img, [0]),
        ("ShearX", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, v, 0, 0, 1, 0),
            resample=Image.BILINEAR
        ), [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3]),
        ("ShearY", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, 0, v, 1, 0),
            resample=Image.BILINEAR
        ), [-0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3]),
        ("TranslateX", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, int(v * img.size[0]), 0, 1, 0),
            resample=Image.BILINEAR
        ), [-0.45, -0.30, -0.15, 0, 0.15, 0.30, 0.45]),
        ("TranslateY", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, 0, 0, 1, int(v * img.size[1])),
            resample=Image.BILINEAR
        ), [-0.45, -0.30, -0.15, 0, 0.15, 0.30, 0.45]),
        ("Rotate", lambda img, v: img.rotate(
            v, resample=Image.BILINEAR, fillcolor=(128, 128, 128)
        ), [-30, -20, -10, 0, 10, 20, 30]),
        ("Brightness", lambda img, v: ImageEnhance.Brightness(img).enhance(1 + v),
         [-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9]),
        ("Color", lambda img, v: ImageEnhance.Color(img).enhance(1 + v),
         [-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9]),
        ("Contrast", lambda img, v: ImageEnhance.Contrast(img).enhance(1 + v),
         [-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9]),
        ("Sharpness", lambda img, v: ImageEnhance.Sharpness(img).enhance(1 + v),
         [-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9]),
        ("AutoContrast", lambda img, v: ImageOps.autocontrast(img), [0]),
        ("Equalize", lambda img, v: ImageOps.equalize(img), [0]),
        ("Solarize", lambda img, v: ImageOps.solarize(img, int(v)),
         [256, 200, 150, 100, 50, 0]),
        ("Posterize", lambda img, v: ImageOps.posterize(img, int(v)),
         [8, 7, 6, 5, 4, 3, 2, 1]),
    ]

    def __init__(self, exclude_identity: bool = False):
        """
        Parameters
        ----------
        exclude_identity : bool
            If True, never sample the Identity (no-op) transform
        """
        self.augmentations = [
            aug for aug in self.AUGMENTATION_SPACE
            if not (exclude_identity and aug[0] == "Identity")
        ]

    def __call__(self, img: Image.Image) -> Image.Image:
        """
        Apply TrivialAugment to a single image.

        1. Uniformly sample one operation
        2. Uniformly sample one magnitude from that operation's range
        3. Apply and return
        """
        # Sample random operation
        op_name, op_fn, magnitudes = random.choice(self.augmentations)

        # Sample random magnitude
        magnitude = random.choice(magnitudes)

        # Apply operation
        return op_fn(img, magnitude)


class TrivialAugmentWide(TrivialAugment):
    """
    TrivialAugment-Wide variant with continuous magnitude sampling.

    Instead of discrete magnitude bins, samples uniformly from the
    full continuous range for each operation.
    """

    # (name, function, (min_magnitude, max_magnitude))
    AUGMENTATION_SPACE_WIDE = [
        ("Identity", lambda img, v: img, (0, 0)),
        ("ShearX", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, v, 0, 0, 1, 0),
            resample=Image.BILINEAR
        ), (-0.99, 0.99)),
        ("ShearY", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, 0, v, 1, 0),
            resample=Image.BILINEAR
        ), (-0.99, 0.99)),
        ("TranslateX", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, int(v * img.size[0]), 0, 1, 0)
        ), (-0.5, 0.5)),
        ("TranslateY", lambda img, v: img.transform(
            img.size, Image.AFFINE, (1, 0, 0, 0, 1, int(v * img.size[1]))
        ), (-0.5, 0.5)),
        ("Rotate", lambda img, v: img.rotate(v, fillcolor=(128, 128, 128)),
         (-135, 135)),
        ("Brightness", lambda img, v: ImageEnhance.Brightness(img).enhance(1 + v),
         (-0.99, 0.99)),
        ("Color", lambda img, v: ImageEnhance.Color(img).enhance(1 + v),
         (-0.99, 0.99)),
        ("Contrast", lambda img, v: ImageEnhance.Contrast(img).enhance(1 + v),
         (-0.99, 0.99)),
        ("Sharpness", lambda img, v: ImageEnhance.Sharpness(img).enhance(1 + v),
         (-0.99, 0.99)),
        ("AutoContrast", lambda img, v: ImageOps.autocontrast(img), (0, 0)),
        ("Equalize", lambda img, v: ImageOps.equalize(img), (0, 0)),
    ]

    def __init__(self):
        self.augmentations = self.AUGMENTATION_SPACE_WIDE

    def __call__(self, img: Image.Image) -> Image.Image:
        """Apply TrivialAugment-Wide with continuous magnitude."""
        op_name, op_fn, (min_mag, max_mag) = random.choice(self.augmentations)
        magnitude = random.uniform(min_mag, max_mag)
        return op_fn(img, magnitude)
```

| Method | Hyperparameters | Search Cost | ImageNet Top-1 | Simplicity |
|---|---|---|---|---|
| AutoAugment | 30 policy parameters | 15,000 GPU-hours | 77.6% | ★☆☆☆☆ |
| Fast AutoAugment | 30 policy parameters | 3.5 GPU-hours | 77.6% | ★★☆☆☆ |
| RandAugment | 2 (N, M) | Grid search | 77.6% | ★★★★☆ |
| TrivialAugment | 0 | None | 77.7% | ★★★★★ |
For new projects, TrivialAugment is an excellent default choice. It requires no tuning, matches state-of-the-art performance, and eliminates augmentation hyperparameter search from your workflow. Only consider alternatives if TrivialAugment underperforms on your specific domain.
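In practice you rarely need to implement these yourself: recent torchvision releases ship `transforms.AutoAugment`, `transforms.RandAugment`, and `transforms.TrivialAugmentWide`. A typical pipeline:

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.TrivialAugmentWide(),  # no hyperparameters to set
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```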
While general-purpose policies work well for natural images, specialized domains often require carefully designed or searched policies that respect domain-specific constraints.
**Medical imaging.** Constraints:

- Pixel intensity often carries diagnostic meaning, so inversion, solarization, and strong color shifts can destroy the signal.
- Anatomy has a canonical orientation; flips and large rotations can change clinical interpretation (left vs. right structures).

Appropriate augmentations:

- Small rotations and translations
- Elastic deformation (simulates natural tissue variability)
- Mild intensity shifts, noise, and blur (simulate acquisition differences)
**Satellite and aerial imagery.** Constraints:

- Ground sample distance (scale) is physically meaningful, so aggressive random resizing can create implausible object sizes.
- Spectral bands beyond RGB may not tolerate standard color transforms.

Appropriate augmentations:

- Arbitrary rotations and flips (overhead views have no canonical orientation)
- Random crops at fixed scale
- Mild brightness and contrast changes (simulate atmospheric and lighting variation)
When standard policies don't transfer, domain-specific search is worthwhile:
1. Define a custom operation pool that respects domain constraints (remove unsafe transforms, add domain-specific ones)
2. Run an efficient search (a RandAugment-style grid search or Fast AutoAugment) over that pool
3. Validate the learned policy on held-out domain data before adopting it
```python
import random
from typing import Callable, List, Tuple

import numpy as np
from PIL import Image, ImageFilter


class DomainSpecificRandAugment:
    """
    RandAugment with a customizable operation pool for
    domain-specific applications.

    Allows defining custom operations and excluding inappropriate
    standard operations.
    """

    def __init__(
        self,
        n: int = 2,
        m: int = 10,
        operations: List[Tuple[str, Callable, Tuple[float, float]]] = None
    ):
        """
        Parameters
        ----------
        n : int
            Number of operations to apply
        m : int
            Global magnitude (0-30 scale)
        operations : list
            Custom operation pool. Each tuple contains:
            (name, function, (min_magnitude, max_magnitude))
            If None, uses default RandAugment operations
        """
        self.n = n
        self.m = m
        self.operations = operations or self._default_operations()

    def _default_operations(self):
        """Default RandAugment operation pool."""
        # Placeholder: supply `operations` explicitly, or wire in the
        # pool from the full RandAugment implementation above
        return []

    def __call__(self, img: Image.Image) -> Image.Image:
        """Apply domain-specific RandAugment."""
        ops = random.choices(self.operations, k=self.n)
        for name, op_fn, (min_mag, max_mag) in ops:
            if min_mag < 0:
                # Symmetric operation: scale into [0, max] and randomize sign
                magnitude = (self.m / 30) * max_mag
                if random.random() < 0.5:
                    magnitude = -magnitude
            else:
                # One-sided operation: interpolate from weak to strong
                magnitude = (self.m / 30) * (max_mag - min_mag) + min_mag
            img = op_fn(img, magnitude)
        return img


# Example: Medical Imaging Policy
def create_medical_randaugment(n: int = 2, m: int = 7):
    """
    Create a RandAugment policy appropriate for medical imaging.

    Excludes color changes that might affect diagnosis.
    Includes elastic deformation for tissue variability.
    """
    from scipy.ndimage import gaussian_filter, map_coordinates

    def elastic_deformation(img: Image.Image, magnitude: float) -> Image.Image:
        """Apply elastic deformation appropriate for medical images."""
        img_array = np.array(img)
        alpha = magnitude * 100  # Displacement intensity
        sigma = magnitude * 5    # Smoothness
        shape = img_array.shape[:2]

        # Smooth random displacement fields
        dx = gaussian_filter((np.random.rand(*shape) * 2 - 1), sigma) * alpha
        dy = gaussian_filter((np.random.rand(*shape) * 2 - 1), sigma) * alpha

        x, y = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]))
        indices = [np.reshape(y + dy, (-1,)), np.reshape(x + dx, (-1,))]

        if img_array.ndim == 2:  # grayscale
            result = map_coordinates(
                img_array, indices, order=1, mode='reflect'
            ).reshape(shape)
        else:
            result = np.zeros_like(img_array)
            for c in range(img_array.shape[2]):
                result[:, :, c] = map_coordinates(
                    img_array[:, :, c], indices, order=1, mode='reflect'
                ).reshape(shape)

        return Image.fromarray(result.astype(np.uint8))

    def intensity_shift(img: Image.Image, magnitude: float) -> Image.Image:
        """Shift all pixel intensities by `magnitude` units
        (simulates exposure variations)."""
        img_array = np.array(img).astype(np.float32)
        img_array = np.clip(img_array + magnitude, 0, 255)
        return Image.fromarray(img_array.astype(np.uint8))

    MEDICAL_OPERATIONS = [
        # Allowed geometric transforms (symmetric ranges: sign randomized)
        ("Rotate", lambda img, v: img.rotate(v, fillcolor=0), (-15, 15)),
        ("TranslateX", lambda img, v: img.transform(
            img.size, Image.AFFINE,
            (1, 0, int(v * img.size[0]), 0, 1, 0)
        ), (-0.05, 0.05)),
        ("TranslateY", lambda img, v: img.transform(
            img.size, Image.AFFINE,
            (1, 0, 0, 0, 1, int(v * img.size[1]))
        ), (-0.05, 0.05)),
        # Medical-specific
        ("ElasticDeform", elastic_deformation, (0.1, 0.5)),
        ("IntensityShift", intensity_shift, (-15, 15)),
        # Safe photometric
        ("GaussianNoise", lambda img, v: img, (0, 1)),  # placeholder - no-op until implemented
        ("GaussianBlur", lambda img, v: img.filter(
            ImageFilter.GaussianBlur(radius=v)
        ), (0, 2)),
    ]

    return DomainSpecificRandAugment(n=n, m=m, operations=MEDICAL_OPERATIONS)
```

We've traversed the landscape of learned augmentation—from expensive reinforcement learning search through surprisingly effective zero-hyperparameter approaches.
What's Next:
We've explored training-time augmentation extensively. Now we'll examine Test-Time Augmentation (TTA)—where augmentations are applied at inference to improve prediction robustness and uncertainty estimation. TTA provides nearly free accuracy gains by aggregating predictions across augmented views of test inputs.
You now understand the evolution of learned augmentation from expensive AutoAugment through practical TrivialAugment. For most applications, TrivialAugment or RandAugment provides an excellent starting point without hyperparameter tuning overhead.