If there is one lesson from the contrastive learning revolution, it is this: data augmentation matters more than almost any other design choice. SimCLR's ablation studies showed that the right augmentation strategy can improve performance by 10-15%, far exceeding the impact of architectural changes.
This isn't coincidental—augmentation is the mechanism through which we define positive pairs, and positive pairs define what the representation should capture. Augmentation is not a data preprocessing trick; it is the specification of what invariances you want your model to learn.
By the end of this page, you will understand: (1) Why augmentation is uniquely important for contrastive learning, (2) The principles of effective augmentation design, (3) Common augmentation strategies and their effects, (4) How to compose augmentations for maximum impact, and (5) Domain-specific augmentation considerations.
In supervised learning, augmentation provides regularization and implicit data expansion. In contrastive learning, augmentation does something fundamentally different: it defines the learning task itself.
Without augmentation, contrastive learning has no positive pairs (excluding multi-view or temporal setups). The entire self-supervised signal comes from the relationship between augmented views:
$$\text{Learning signal} = f(\text{view}_1, \text{view}_2)$$
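Concretely, a contrastive data pipeline just applies the same stochastic augmentation twice to each image to produce a positive pair. A minimal sketch (the wrapper name and the specific transforms here are illustrative choices, not a fixed recipe):

```python
import torchvision.transforms as T

class TwoCropsTransform:
    """Apply the same stochastic transform twice to one image to get a positive pair."""
    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        view_1 = self.base_transform(x)  # one random draw of crop/flip/... parameters
        view_2 = self.base_transform(x)  # an independent draw -> a different view
        return view_1, view_2

# Example usage with a simple pipeline; every call re-samples the random parameters
base = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor()])
two_crops = TwoCropsTransform(base)
# view_1, view_2 = two_crops(pil_image)  # both views come from the same underlying image
```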
The augmentations determine which variations the model must learn to ignore and, by extension, which features the representation is forced to keep. SimCLR's ablation study quantified how much these choices matter relative to other design decisions:
| Component Changed | Accuracy Change | Relative Impact |
|---|---|---|
| Add color jittering | +11.1% | Largest single factor |
| Add random crop (vs. resize) | +7.4% | Second largest |
| Add Gaussian blur | +1.5% | Moderate |
| MLP projection head (vs. linear) | +5.2% | Significant |
| 2x wider ResNet | +3.1% | Moderate |
| 4x longer training | +2.8% | Moderate |
Color jittering alone provides more improvement than doubling model width or quadrupling training time. This finding fundamentally changed how the field thinks about self-supervised learning.
Augmentation controls a critical tradeoff:
Strong augmentation: the two views differ substantially, forcing the model to learn deeper invariances, but pushing too far can destroy the semantic content the views are supposed to share.
Weak augmentation: the views stay very similar, so the task can be solved with low-level shortcuts (color statistics, texture matching) and the representation may never capture semantics.
When starting a contrastive learning project, invest more time in augmentation strategy than architecture selection. A well-designed augmentation pipeline with a standard ResNet-50 will typically outperform a poorly-designed one with a more powerful backbone.
Let's examine the most important augmentations for image contrastive learning and understand their effects.
Random resized cropping is the single most important augmentation. It produces views that show different regions of the same image at different scales, so matching them requires features that survive changes in framing and scale rather than relying on exact pixel layout.
```python
import torchvision.transforms as T

# SimCLR default: aggressive cropping
simclr_crop = T.RandomResizedCrop(
    size=224,
    scale=(0.08, 1.0),   # Can crop down to 8% of image area
    ratio=(0.75, 1.33),  # Aspect ratio variation
    interpolation=T.InterpolationMode.BICUBIC,
)

# Conservative alternative (less aggressive)
conservative_crop = T.RandomResizedCrop(
    size=224,
    scale=(0.5, 1.0),    # Minimum 50% of image area
    ratio=(0.9, 1.1),    # Near-square aspect ratio
)

# Scale parameter effect:
# scale=(0.08, 1.0): May capture small patches - very hard positives
# scale=(0.2, 1.0):  Moderate patches - balanced difficulty
# scale=(0.5, 1.0):  Large patches - easier positives
```

Color jittering prevents the model from using color histograms as shortcuts. Without it, the model can distinguish images purely by color statistics, never learning semantic features.
Components: brightness, contrast, saturation, and hue, each controlled by a strength parameter and applied with a given probability. Published methods differ noticeably in how strong they set these:
| Method | Brightness | Contrast | Saturation | Hue | Jitter Prob |
|---|---|---|---|---|---|
| SimCLR | 0.8 | 0.8 | 0.8 | 0.2 | 0.8 |
| MoCo v2 | 0.4 | 0.4 | 0.4 | 0.1 | 0.8 |
| BYOL | 0.4 | 0.4 | 0.2 | 0.1 | 0.8 |
| SwAV | 0.8 | 0.8 | 0.8 | 0.2 | 0.8 |
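Translated into code, the SimCLR row above corresponds to something like the following (a sketch; the 0.8 apply probability is the table's last column, and the grayscale step is the augmentation discussed next):

```python
import torchvision.transforms as T

# SimCLR-strength color jittering, applied with probability 0.8
simclr_color = T.Compose([
    T.RandomApply(
        [T.ColorJitter(brightness=0.8, contrast=0.8, saturation=0.8, hue=0.2)],
        p=0.8,
    ),
    T.RandomGrayscale(p=0.2),  # random grayscale, covered below
])
```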
With some probability (typically 20%), the image is converted to grayscale. This further prevents color-based shortcuts and encourages learning of shape and texture.
Blurs the image with a Gaussian kernel, applied with 50% probability in SimCLR. This removes high-frequency texture detail, so views cannot be matched on fine texture alone and the model is pushed toward shapes and more global structure.
Simple left-right flip with 50% probability. Provides orientation invariance for most objects (though not for text or asymmetric objects like clocks).
SimCLR found that without color jittering, models learn to distinguish images by color histogram—a trivial solution. Adding grayscale increases the importance of color jitter by making color-based shortcuts even less reliable. The combination is crucial.
Individual augmentations are important, but their composition determines final effectiveness. The order and combination of augmentations creates a distribution of transformations that defines the positive pair distribution.
Augmentations should be applied in a principled order: geometric transforms first (crop, flip), then color transforms (jitter, grayscale), then blur and other photometric noise, and finally tensor conversion and normalization.

This order ensures that the more expensive photometric operations run on the already-cropped image, that PIL-based operations such as Gaussian blur happen before the image is converted to a tensor, and that normalization is applied last, on tensor values. The reference pipeline below follows exactly this structure.
```python
import torchvision.transforms as T
from PIL import ImageFilter, ImageOps
import random


class GaussianBlur:
    """Gaussian blur with a randomly sampled sigma, applied to a PIL image."""

    def __init__(self, sigma=(0.1, 2.0)):
        self.sigma = sigma

    def __call__(self, x):
        sigma = random.uniform(self.sigma[0], self.sigma[1])
        return x.filter(ImageFilter.GaussianBlur(radius=sigma))


class Solarize:
    """Invert pixel values above a threshold (used by some pipelines, e.g. BYOL-style); not enabled below."""

    def __init__(self, threshold=128):
        self.threshold = threshold

    def __call__(self, x):
        return ImageOps.solarize(x, self.threshold)


def get_contrastive_augmentation(strength='strong'):
    """
    Get augmentation pipeline with configurable strength.

    Args:
        strength: 'weak', 'medium', or 'strong'

    Returns:
        Composed augmentation transform
    """
    # Strength configurations
    configs = {
        'weak': {
            'crop_scale': (0.5, 1.0),
            'color_strength': 0.4,
            'blur_prob': 0.0,
            'gray_prob': 0.1,
        },
        'medium': {
            'crop_scale': (0.2, 1.0),
            'color_strength': 0.6,
            'blur_prob': 0.3,
            'gray_prob': 0.2,
        },
        'strong': {
            'crop_scale': (0.08, 1.0),
            'color_strength': 0.8,
            'blur_prob': 0.5,
            'gray_prob': 0.2,
        },
    }
    cfg = configs[strength]
    s = cfg['color_strength']

    transform = T.Compose([
        # 1. Geometric transforms
        T.RandomResizedCrop(224, scale=cfg['crop_scale']),
        T.RandomHorizontalFlip(p=0.5),

        # 2. Color transforms
        T.RandomApply([
            T.ColorJitter(
                brightness=0.8 * s,
                contrast=0.8 * s,
                saturation=0.8 * s,
                hue=0.2 * s,
            )
        ], p=0.8),
        T.RandomGrayscale(p=cfg['gray_prob']),

        # 3. Blur/noise
        T.RandomApply([GaussianBlur()], p=cfg['blur_prob']),

        # 4. Normalization
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])

    return transform
```

SwAV introduced an influential multi-crop strategy: rather than exactly two views per image, it samples two standard-resolution global crops plus several additional low-resolution local crops, and all of them are treated as views of the same image.
Benefits: many more views per image at modest additional compute (the extra crops are low resolution), and explicit pressure for local crops to map close to the global views of the same image, which encourages local-to-global correspondence.
This strategy has been adopted by many subsequent methods (DINO, BYOL variants) and consistently improves performance.
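A minimal multi-crop sketch in this style (the crop sizes and counts, two 224-pixel global crops plus six 96-pixel local crops, follow common defaults and are assumptions here rather than a prescription):

```python
import torchvision.transforms as T

class MultiCropTransform:
    """Return several views per image: a few large global crops plus many small local crops."""
    def __init__(self, n_global=2, n_local=6):
        self.global_transform = T.Compose([
            T.RandomResizedCrop(224, scale=(0.4, 1.0)),  # large, high-resolution crops
            T.RandomHorizontalFlip(),
            T.ToTensor(),
        ])
        self.local_transform = T.Compose([
            T.RandomResizedCrop(96, scale=(0.05, 0.4)),  # small, low-resolution crops
            T.RandomHorizontalFlip(),
            T.ToTensor(),
        ])
        self.n_global = n_global
        self.n_local = n_local

    def __call__(self, x):
        crops = [self.global_transform(x) for _ in range(self.n_global)]
        crops += [self.local_transform(x) for _ in range(self.n_local)]
        return crops  # all crops are treated as views of the same image
```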
Contrastive learning extends beyond images, but each domain requires domain-specific augmentation design.
| Strategy | Description | Considerations |
|---|---|---|
| Back-translation | Translate to another language and back | High quality; computationally expensive |
| Synonym replacement | Replace words with synonyms | Fast; may change meaning subtly |
| Word deletion | Randomly remove words | Simple; can remove important content |
| Sentence reordering | Shuffle sentence order in document | For longer texts; preserves content |
| Dropout noise | Encode the same sentence twice with different dropout masks | Implicit; used in SimCSE |
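As one concrete example from the table, random word deletion takes only a few lines (a sketch; the 10% deletion probability is an arbitrary choice):

```python
import random

def random_word_deletion(text, p_delete=0.1):
    """Drop each word independently with probability p_delete, keeping at least one word."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if random.random() > p_delete]
    return " ".join(kept) if kept else random.choice(words)
```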
SimCSE's Key Insight: Simply passing the same sentence through the encoder twice with different dropout masks creates positive pairs. This "free" augmentation achieves state-of-the-art sentence embeddings.
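A minimal sketch of the mechanism with a toy encoder (the encoder here is a stand-in; SimCSE uses a pretrained Transformer, but the trick is identical: two forward passes in train mode sample two different dropout masks):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder containing dropout; any encoder with dropout layers behaves the same way
encoder = nn.Sequential(
    nn.Linear(768, 768),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(768, 256),
)
encoder.train()  # keep dropout active

x = torch.randn(32, 768)   # a batch of 32 (already embedded) sentences
z1 = encoder(x)            # first pass: one dropout mask
z2 = encoder(x)            # second pass: a different mask -> the positive view

# The similarity between the two encodings of the same sentence is the positive logit
pos_sim = F.cosine_similarity(z1, z2, dim=-1)
```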
Generic augmentation strategies rarely transfer across domains. Medical imaging has different requirements than natural images. Molecule graphs differ from social networks. Always consult domain experts and validate that augmentations preserve semantics relevant to your task.
When applying contrastive learning to specialized domains, augmentation must be carefully adapted.
Challenges:
Recommendations:
Considerations:
Considerations:
Considerations:
Recent work has explored learned and adaptive augmentation strategies.
Instead of relying on hand-designed augmentation policies, some work learns them directly, for example by searching over augmentation parameters or by optimizing views so that they retain task-relevant information while discarding as much nuisance information as possible.
Challenges: the search space of augmentation policies is large, the right invariances depend on the (often unknown) downstream task, and the search itself adds substantial compute on top of already expensive pre-training.
Start with easier augmentations (for example, a larger minimum crop scale and milder color jitter) and gradually increase their difficulty as training progresses.
This curriculum can stabilize training and improve final performance.
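One simple way to implement such a curriculum (a sketch that reuses the get_contrastive_augmentation helper defined earlier; the epoch thresholds are arbitrary assumptions):

```python
def augmentation_for_epoch(epoch, total_epochs):
    """Select an augmentation strength based on training progress (easy -> hard)."""
    progress = epoch / max(total_epochs, 1)
    if progress < 0.2:
        strength = 'weak'    # early training: easy positives stabilize optimization
    elif progress < 0.5:
        strength = 'medium'
    else:
        strength = 'strong'  # later training: hard positives sharpen invariances
    return get_contrastive_augmentation(strength)
```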
Use neural networks to generate augmentations, for instance generative models that synthesize alternative views of an input or learned perturbations applied in feature space.
These approaches can create more diverse training signal but add complexity.
Advanced augmentation techniques provide marginal gains in most settings. Start with the standard SimCLR pipeline, validate it works for your domain, then consider advanced techniques only if you've exhausted simpler improvements.
Data augmentation in contrastive learning is not a data preprocessing step—it is the specification of what you want the model to learn. Every augmentation choice encodes an assumption about what variations are semantically irrelevant.
You have completed Module 5: Contrastive Learning. You now understand InfoNCE loss, SimCLR and MoCo frameworks, positive/negative pair dynamics, and the critical role of data augmentation. These principles form the foundation for modern self-supervised visual representation learning and transfer broadly to other modalities.