A single neuron deep in a CNN produces a scalar activation. But what input region influenced that number? Does it respond to a single pixel, a small patch, or the entire image?
This question—about the spatial extent of input dependency—leads to one of the most important concepts in CNN design: the receptive field.
Consider ImageNet classification. A network must decide if an image contains a dog. But dogs can be tiny (a 50-pixel Chihuahua in a park scene) or large (a Great Dane filling the frame). For the classification neuron to 'see' dogs of all sizes, its receptive field must encompass large image regions. Yet the first conv layer only sees 3×3 patches. How does local processing build global understanding?
The answer lies in how receptive fields grow through network depth—a hierarchical expansion that enables CNNs to detect patterns at multiple scales while maintaining computational efficiency.
This page provides a comprehensive treatment of receptive fields. You will understand the formal definition and calculation formulas, how receptive fields grow through layers, the effective receptive field concept, design principles for matching receptive fields to tasks, and common pitfalls in architecture design.
Formal Definition:
The receptive field of a neuron (or feature map position) is the region of the input image that can influence the neuron's activation. Changes outside this region have zero effect on the neuron's output.
Mathematical Formulation:
For a neuron at position (i, j) in layer L, its receptive field is the set of input pixels (x, y) such that:
$$\frac{\partial a^{(L)}_{i,j}}{\partial x_{x,y}} \not\equiv 0$$
where a^(L)_{i,j} is the activation at position (i,j) in layer L.
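This definition can be checked empirically with automatic differentiation: backpropagate from a single activation and look at where the input gradient is nonzero. A minimal sketch using PyTorch autograd (a single 3×3 convolution; the sizes are arbitrary):

import torch
import torch.nn as nn

# One 3x3 conv, stride 1, no padding: each output depends on a 3x3 input patch
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
x = torch.randn(1, 1, 7, 7, requires_grad=True)
y = conv(x)  # shape [1, 1, 5, 5]

# Backprop from the single activation at output position (0, 0)
y[0, 0, 0, 0].backward()

# Nonzero gradients mark the receptive field: the top-left 3x3 patch
print((x.grad[0, 0] != 0).int())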
Single Layer Receptive Field:
For a single conv layer with kernel size k × k:
Kernel 3×3, Stride 1:
Input (5×5):              Output (3×3):
┌───────────┐             ┌─────────────┐
│ ▓ ▓ ▓ ○ ○ │             │ y₀₀ y₀₁ y₀₂ │
│ ▓ ▓ ▓ ○ ○ │             │ y₁₀ y₁₁ y₁₂ │
│ ▓ ▓ ▓ ○ ○ │             │ y₂₀ y₂₁ y₂₂ │
│ ○ ○ ○ ○ ○ │             └─────────────┘
│ ○ ○ ○ ○ ○ │
└───────────┘

▓ = Receptive field of y₀₀ (3×3 region)
The 'theoretical' receptive field is the maximum possible region of influence. The 'effective' receptive field considers that pixels at the center contribute more than those at edges. We'll explore this distinction later—it has major implications for architecture design.
Receptive Field with Stride:
When stride s > 1, the convolution subsamples the output. The receptive field size of each output position doesn't change, but the spacing between neighboring receptive fields increases to s input pixels.
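A small helper makes this concrete by mapping an output index to its input range for a single layer (the function name and the no-padding assumption are illustrative):

def rf_input_range(i, kernel_size, stride, dilation=1):
    """Input pixel range [start, end] covered by output position i
    of a single conv layer (no padding)."""
    start = i * stride
    end = start + (kernel_size - 1) * dilation
    return start, end

# 3x3 kernel, stride 2: each RF is still 3 pixels wide, but neighboring
# outputs are centered 2 input pixels apart
for i in range(3):
    print(i, rf_input_range(i, kernel_size=3, stride=2))
# 0 (0, 2)
# 1 (2, 4)
# 2 (4, 6)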
Receptive Field with Dilation:
Dilated (atrous) convolution inserts gaps between kernel elements:
$$Y[i,j] = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} K[m,n] \cdot X[i + m \cdot d, j + n \cdot d]$$
where d is the dilation rate. For a k×k kernel with dilation d:
$$\text{Receptive Field (single layer)} = k + (k-1)(d-1) = (k-1) \cdot d + 1$$
| Kernel | Dilation | RF Size |
|---|---|---|
| 3×3 | 1 | 3×3 |
| 3×3 | 2 | 5×5 |
| 3×3 | 4 | 9×9 |
| 3×3 | 8 | 17×17 |
Dilation expands the receptive field without increasing parameters.
def single_layer_receptive_field(kernel_size, dilation=1):
    """
    Compute receptive field for a single conv layer.

    Args:
        kernel_size: Size of convolution kernel (assumed square)
        dilation: Dilation rate (1 for standard conv)

    Returns:
        Receptive field size
    """
    return (kernel_size - 1) * dilation + 1

# Examples
print("Single Layer Receptive Fields:")
print("-" * 40)
for k in [3, 5, 7]:
    for d in [1, 2, 4]:
        rf = single_layer_receptive_field(k, d)
        print(f"Kernel {k}×{k}, Dilation {d}: RF = {rf}×{rf}")

# Output:
# Kernel 3×3, Dilation 1: RF = 3×3
# Kernel 3×3, Dilation 2: RF = 5×5
# Kernel 3×3, Dilation 4: RF = 9×9
# Kernel 5×5, Dilation 1: RF = 5×5
# Kernel 5×5, Dilation 2: RF = 9×9
# Kernel 5×5, Dilation 4: RF = 17×17
# Kernel 7×7, Dilation 1: RF = 7×7
# Kernel 7×7, Dilation 2: RF = 13×13
# Kernel 7×7, Dilation 4: RF = 25×25

The true power of receptive fields emerges when we stack multiple layers. Each layer's receptive field 'sees' a patch of the previous layer's feature map, which itself has a receptive field into the original input. This creates a compound receptive field that grows with depth.
Recursive Receptive Field Formula:
For a network with L layers, each with kernel size kₗ, stride sₗ, and dilation dₗ, the receptive field at layer L is:
$$R_L = R_{L-1} + (k_L - 1) \cdot d_L \cdot \prod_{i=1}^{L-1} s_i$$
where R₀ = 1 (the input is its own receptive field).
The product of strides $\prod s_i$ is the cumulative stride or jump—how many input pixels correspond to moving one position in layer L.
Simplified Formula (all strides = 1):
When all strides are 1 and all dilations are 1:
$$R_L = 1 + \sum_{l=1}^{L} (k_l - 1)$$
For L identical layers with kernel size k: $$R_L = 1 + L(k-1)$$ For example, ten stacked 3×3 convolutions give R₁₀ = 1 + 10·2 = 21.
With stride 1, the receptive field grows linearly with depth. Each stride-2 layer doubles the jump, so every subsequent layer contributes twice as many input pixels, and the RF grows geometrically in the number of downsampling stages. This is why pooling and strided convolutions are so important for building large receptive fields efficiently.
Example: VGG-style Network
Consider 3×3 convolutions with stride 1 and 2×2 max pooling (stride 2):
| Layer | Type | Kernel | Stride | Cumulative Stride | RF Size |
|---|---|---|---|---|---|
| Input | - | - | - | 1 | 1 |
| Conv1 | Conv | 3 | 1 | 1 | 3 |
| Conv2 | Conv | 3 | 1 | 1 | 5 |
| Pool1 | Pool | 2 | 2 | 2 | 6 |
| Conv3 | Conv | 3 | 1 | 2 | 10 |
| Conv4 | Conv | 3 | 1 | 2 | 14 |
| Pool2 | Pool | 2 | 2 | 4 | 16 |
| Conv5 | Conv | 3 | 1 | 4 | 24 |
| Conv6 | Conv | 3 | 1 | 4 | 32 |
Notice how pooling accelerates RF growth: after Pool1, each conv layer adds 4 pixels to RF instead of 2.
Visualization:
Layer:  Input → Conv3×3 → Conv3×3 → Pool2×2 → Conv3×3 → Conv3×3 → Pool2×2
RF:       1   →    3    →    5    →    6    →   10    →   14    →   16
                 (+2)      (+2)      (+1)      (+4)      (+4)      (+2)

After the second pool the jump is 4, so each further 3×3 conv adds (3-1)×4 = 8 pixels,
while the 2×2 pool itself only added (2-1)×2 = 2 pixels to the RF.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LayerConfig:
    name: str
    kernel_size: int
    stride: int
    dilation: int = 1

def compute_receptive_field(layers: List[LayerConfig]) -> List[Tuple[str, int, int, int]]:
    """
    Compute receptive field at each layer of a CNN.
    Returns list of (name, rf_size, stride, jump) tuples.
    """
    results = [("input", 1, 1, 1)]  # (name, rf, stride, jump)
    rf = 1    # Current receptive field
    jump = 1  # Cumulative stride (how many input pixels = one output step)

    for layer in layers:
        # RF growth from this layer
        rf_increase = (layer.kernel_size - 1) * layer.dilation * jump
        rf = rf + rf_increase
        # Update cumulative stride
        jump = jump * layer.stride
        results.append((layer.name, rf, layer.stride, jump))

    return results

# Example: VGG-like network
vgg_layers = [
    LayerConfig("conv1_1", 3, 1),
    LayerConfig("conv1_2", 3, 1),
    LayerConfig("pool1", 2, 2),
    LayerConfig("conv2_1", 3, 1),
    LayerConfig("conv2_2", 3, 1),
    LayerConfig("pool2", 2, 2),
    LayerConfig("conv3_1", 3, 1),
    LayerConfig("conv3_2", 3, 1),
    LayerConfig("conv3_3", 3, 1),
    LayerConfig("pool3", 2, 2),
    LayerConfig("conv4_1", 3, 1),
    LayerConfig("conv4_2", 3, 1),
    LayerConfig("conv4_3", 3, 1),
    LayerConfig("pool4", 2, 2),
    LayerConfig("conv5_1", 3, 1),
    LayerConfig("conv5_2", 3, 1),
    LayerConfig("conv5_3", 3, 1),
    LayerConfig("pool5", 2, 2),
]

print("VGG-style Receptive Field Analysis:")
print("-" * 60)
print(f"{'Layer':<12} {'RF Size':>10} {'Stride':>8} {'Jump':>8}")
print("-" * 60)

results = compute_receptive_field(vgg_layers)
for name, rf, stride, jump in results:
    print(f"{name:<12} {rf:>10} {stride:>8} {jump:>8}")

# Final RF for VGG-16-like: 212×212 on 224×224 input images
# This covers most of the image, enabling global reasoning

The theoretical receptive field tells us the maximum possible input region that can influence a neuron. But in practice, not all pixels within this region contribute equally. Pixels at the center have much more influence than those at the edges.
The Effective Receptive Field (ERF):
The effective receptive field weights each input pixel by its contribution to the output neuron's activation. It's computed as the magnitude of the gradient of the output with respect to each input pixel:
$$\text{ERF}[x,y] = \left|\frac{\partial a^{(L)}_{i,j}}{\partial x_{x,y}}\right|$$
Key Discovery (Luo et al., 2016):
For CNNs with random weights, the effective receptive field has a Gaussian distribution centered on the theoretical center. The effective RF (where most influence lies) occupies only a fraction of the theoretical RF.
Implications:
Many CNNs have theoretical receptive fields covering the entire input but effective receptive fields of only 20-30% of that size. This means peripheral regions of large objects may not influence detection, causing failures on large-scale patterns or objects near image boundaries.
Mathematical Analysis:
For a network with L conv layers of kernel size k and stride 1, the effective RF is approximately Gaussian with standard deviation:
$$\sigma_{\text{ERF}} \approx \sigma_0 \cdot \sqrt{L}$$
where σ₀ depends on the kernel size. For a 3×3 kernel with no nonlinearities:
$$\sigma_{\text{ERF}} \approx \frac{k-1}{2} \cdot \sqrt{\frac{L}{3}} \approx 0.58\sqrt{L}$$
The theoretical RF grows as L(k-1), but effective RF grows as √L. The ratio shrinks:
$$\frac{\text{Effective RF}}{\text{Theoretical RF}} \propto \frac{1}{\sqrt{L}}$$
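Plugging numbers into these approximations shows how quickly the ratio decays. A rough sketch using the formulas above (3×3 kernels, stride 1; treating ±2σ as the "effective" extent is an arbitrary but common choice, not part of the original analysis):

import math

k = 3
for L in [5, 10, 50, 100]:
    theoretical = 1 + L * (k - 1)               # linear growth
    sigma_erf = (k - 1) / 2 * math.sqrt(L / 3)  # Gaussian std of the ERF
    effective = 4 * sigma_erf                   # ~2 sigma on each side
    print(f"L={L:3d}  theoretical={theoretical:4d}  "
          f"effective~{effective:5.1f}  ratio~{effective/theoretical:.2f}")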
Strategies to Increase Effective RF:
Skip connections (which shorten gradient paths), larger kernels in early layers, dilation, and attention all raise the effective RF by strengthening gradient flow from distant input pixels. The code below measures the ERF empirically and compares a plain stack of convolutions against a residual network of the same depth.
import torch
import torch.nn as nn
import torch.nn.functional as F

def compute_effective_receptive_field(model, input_size=64):
    """
    Compute the effective receptive field of a CNN by measuring
    gradients from a central output pixel to all input pixels.
    """
    model.eval()

    # Create input requiring gradients
    x = torch.randn(1, 3, input_size, input_size, requires_grad=True)

    # Forward pass
    output = model(x)

    # Get the center output position
    _, _, oh, ow = output.shape
    center_h, center_w = oh // 2, ow // 2

    # Backprop from center output pixel
    target = torch.zeros_like(output)
    target[0, :, center_h, center_w] = 1.0
    output.backward(target)

    # Get gradient magnitude at input
    erf = x.grad.abs().sum(dim=1).squeeze()  # Sum over channels

    # Normalize
    erf = erf / erf.max()

    return erf.detach().numpy()

def analyze_erf_ratio(model, input_size=128):
    """
    Compare effective RF to theoretical RF.
    """
    erf = compute_effective_receptive_field(model, input_size)

    # Theoretical RF: where gradient is non-zero
    theoretical_rf_pixels = (erf > 1e-6).sum()

    # Effective RF: where gradient is significant (e.g., > 10% of max)
    effective_rf_pixels = (erf > 0.1).sum()

    # Also compute "half-width" of ERF
    center = input_size // 2
    row_profile = erf[center, :]
    half_max = row_profile.max() / 2
    half_width = (row_profile > half_max).sum()

    return {
        'theoretical_rf_pixels': int(theoretical_rf_pixels),
        'effective_rf_pixels': int(effective_rf_pixels),
        'ratio': effective_rf_pixels / theoretical_rf_pixels if theoretical_rf_pixels > 0 else 0,
        'half_width': int(half_width),
    }

# Example: Compare plain CNN vs ResNet-style
class PlainCNN(nn.Module):
    def __init__(self, num_layers=8):
        super().__init__()
        layers = [nn.Conv2d(3 if i == 0 else 64, 64, 3, padding=1)
                  for i in range(num_layers)]
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = F.relu(layer(x))
        return x

class ResNetStyle(nn.Module):
    def __init__(self, num_blocks=4):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(64, 64, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1),
            )
            for _ in range(num_blocks)
        ])

    def forward(self, x):
        x = F.relu(self.conv1(x))
        for block in self.blocks:
            x = F.relu(x + block(x))  # Skip connection!
        return x

# Analyze
print("Effective Receptive Field Analysis:")
print("-" * 50)

for name, model in [("Plain CNN (8 layers)", PlainCNN(8)),
                    ("ResNet-style (8 layers)", ResNetStyle(4))]:
    stats = analyze_erf_ratio(model)
    print(f"{name}:")
    print(f"  Theoretical RF pixels: {stats['theoretical_rf_pixels']}")
    print(f"  Effective RF pixels (>10%): {stats['effective_rf_pixels']}")
    print(f"  Ratio: {stats['ratio']:.2%}")
    print(f"  Half-width: {stats['half_width']}")
    print()

# ResNet-style typically has larger effective RF due to skip connections

Receptive field growth creates a hierarchical feature representation where shallow layers detect local patterns and deep layers integrate global context.
The Hierarchy:
| Depth | RF Size | Feature Examples |
|---|---|---|
| Layer 1 | 3×3 | Edges, color gradients |
| Layer 2 | 5×5 | Corners, simple textures |
| Layers 3-4 | 10-20px | Texture patterns, small parts |
| Layers 5-6 | 30-50px | Object parts (eyes, wheels) |
| Layers 7-8 | 70-100px | Object compositions |
| Deeper | 100-200px+ | Whole objects, scenes |
Why This Works:
Natural images have hierarchical structure: pixels group into edges, edges into textures and contours, textures into object parts, and parts into whole objects and scenes. CNN receptive fields naturally align with this hierarchy. Each layer's RF size determines which level of abstraction it can represent.
Match your architecture's receptive field to the scale of patterns you need to detect. For detecting 50-pixel objects, your final layer needs at least a 50-pixel RF. For scene understanding requiring 200-pixel context, design for 200+ pixel RF. Under-sizing RF limits what the network can learn.
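This rule of thumb can be checked mechanically with the recursive formula from earlier. A small sketch (a condensed version of the LayerConfig-based function above; the helper name and target value are illustrative):

def final_receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

# Example: does a small VGG-ish stack see a 50-pixel object?
net = [(3, 1, 1), (3, 1, 1), (2, 2, 1),   # conv, conv, pool
       (3, 1, 1), (3, 1, 1), (2, 2, 1),
       (3, 1, 1), (3, 1, 1)]
rf = final_receptive_field(net)
target = 50
print(f"RF = {rf}, target = {target}: {'OK' if rf >= target else 'too small'}")
# RF = 32, target = 50: too small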
Visualizing the Hierarchy:
Classic CNN visualization studies (Zeiler & Fergus, 2014) show how features evolve:
Layer 1: Gabor-like edges at various orientations
╱ ╲ │ ─ ╲╱ ╱╲
Layer 2: Corners, junctions, grid patterns
┌ ┐ └ ┘ ╳ #
Layer 3: Textures (fur, fabric, honeycomb)
▓▓▓ ░░░ ▒▒▒
Layer 4: Parts (eyes, wheels, legs)
👁 ⚙ 🦵
Layer 5: Object categories (faces, vehicles)
🐱 🚗 🏠
Receptive Field Determines Complexity:
A feature can only be as complex as its receptive field allows: a neuron with a 3×3 RF can at most encode an edge or a color gradient, while recognizing a face requires an RF large enough to cover eyes, nose, and mouth at once.
Receptive field is a critical architectural choice that must match the task requirements. Let's examine design strategies.
Strategy 1: Stacking Small Kernels
VGG popularized using multiple 3×3 layers instead of larger kernels: two stacked 3×3 convs cover a 5×5 RF, and three cover a 7×7 RF.
Advantages: the same RF costs fewer parameters (two 3×3 layers use 18C² weights versus 25C² for one 5×5), and the extra nonlinearity between layers increases representational power, as the parameter check below illustrates.
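A quick check in PyTorch (64 in/out channels chosen arbitrarily for illustration):

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

one_5x5 = nn.Conv2d(64, 64, 5, padding=2)
two_3x3 = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                        nn.ReLU(),
                        nn.Conv2d(64, 64, 3, padding=1))

# Both reach a 5x5 receptive field, but the stacked version is cheaper
print(f"one 5x5: {n_params(one_5x5):,} params")  # 64*64*25 + 64 = 102,464
print(f"two 3x3: {n_params(two_3x3):,} params")  # 2*(64*64*9 + 64) = 73,856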
Strategy 2: Dilated/Atrous Convolutions
Dilation expands RF without increasing parameters:
# Standard conv: RF = 3
nn.Conv2d(64, 64, 3, padding=1, dilation=1)
# Dilated conv: RF = 5
nn.Conv2d(64, 64, 3, padding=2, dilation=2)
# More dilated: RF = 9
nn.Conv2d(64, 64, 3, padding=4, dilation=4)
Used extensively in segmentation (DeepLab) and audio (WaveNet).
High dilation rates can cause 'gridding' artifacts—the dilated kernel only samples every d-th pixel, missing information in between. Solutions include using multiple dilation rates (pyramid pooling) or gradually increasing/decreasing dilation.
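One mitigation mentioned above is to run several dilation rates in parallel and merge them, so the union of sampling grids covers all pixels. A minimal sketch of the idea (not the exact DeepLab ASPP module; the class name and rates are illustrative):

import torch
import torch.nn as nn

class MultiRateBlock(nn.Module):
    """Parallel 3x3 convs at several dilation rates, concatenated and fused."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        ])
        # 1x1 conv fuses the concatenated branches back to `channels`
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 16, 16)
print(MultiRateBlock(32)(x).shape)  # torch.Size([1, 32, 16, 16])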
Strategy 3: Pooling and Strided Convolutions
Downsampling accelerates RF growth dramatically: every stride-2 pool or conv doubles the jump, so all subsequent layers contribute twice as many input pixels to the RF, as the comparison below shows.
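A sketch comparing eight 3×3 layers with and without interleaved stride-2 downsampling, using the recursive formula:

def rf_trace(layers):
    """layers: list of (kernel_size, stride) tuples."""
    rf, jump, trace = 1, 1, [1]
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        trace.append(rf)
    return trace

plain = [(3, 1)] * 8
strided = [(3, 1), (3, 2)] * 4  # every other layer downsamples

print("stride 1 only:", rf_trace(plain))    # [1, 3, 5, 7, 9, 11, 13, 15, 17]
print("with stride 2:", rf_trace(strided))  # [1, 3, 5, 9, 13, 21, 29, 45, 61]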
Strategy 4: Multi-Scale Processing
Process at multiple resolutions simultaneously:
Input Image (224×224)
│
├──▶ Branch 1: Full resolution, small RF
│
├──▶ Branch 2: 1/2 resolution, medium RF
│
└──▶ Branch 3: 1/4 resolution, large RF
│
▼
Merge branches → Multi-scale features
Used in Inception modules, Feature Pyramid Networks (FPN), U-Net.
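A toy version of the diagram above: each branch processes a downsampled copy of the input, and the results are upsampled back and merged. (A sketch of the general pattern, not any specific published module.)

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One conv per scale; lower resolutions give larger RF per layer
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)]
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for i, conv in enumerate(self.convs):
            scale = 2 ** i  # full, 1/2, 1/4 resolution
            xi = F.avg_pool2d(x, scale) if scale > 1 else x
            yi = conv(xi)
            outs.append(F.interpolate(yi, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return sum(outs)

print(MultiScaleBlock(16)(torch.randn(1, 16, 32, 32)).shape)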
Strategy 5: Global Context Modules
Add explicit global context without extreme depth:
# Squeeze-and-Excitation: global average pool → FC → reweight channels
class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        g = F.adaptive_avg_pool2d(x, 1)    # [B, C, 1, 1] global context
        g = self.fc(g.flatten(1))          # [B, C]
        g = g.unsqueeze(-1).unsqueeze(-1)  # [B, C, 1, 1]
        return x * torch.sigmoid(g)        # Reweight channels
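Assuming the torch/nn/F imports from the earlier snippets, the block is a drop-in channel-reweighting layer (the `reduction=16` default above is a conventional choice, not mandated by the text):

se = SEBlock(channels=64)
y = se(torch.randn(2, 64, 32, 32))  # same shape out: [2, 64, 32, 32]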
Strategy 6: Self-Attention
Attention computes all-pairs interactions: every position can attend to every other position, so the receptive field is global from the very first attention layer, at the cost of computation quadratic in the number of positions.
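A sketch of the idea using PyTorch's built-in multi-head attention over flattened spatial positions (an illustration of global RF, not a full Vision Transformer block; sizes are arbitrary):

import torch
import torch.nn as nn

B, C, H, W = 1, 64, 14, 14
x = torch.randn(B, C, H, W)

# Flatten the H*W positions into a sequence: every position attends to every
# other, so the receptive field is global after a single attention layer
seq = x.flatten(2).transpose(1, 2)  # [B, H*W, C]
attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
out, weights = attn(seq, seq, seq)  # weights: [B, H*W, H*W] all-pairs map
print(out.shape, weights.shape)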
| Strategy | Parameters | Computation | RF Growth | Use Case |
|---|---|---|---|---|
| Stacked 3×3 | Low | Moderate | Linear | General CNNs |
| Dilated convs | Low | Low | Fast (multiplicative) | Segmentation (DeepLab) |
| Strided conv/pool | Low | Reduces | Exponential | Classification, detection |
| Multi-scale | Moderate | Higher | Multiple simultaneous | FPN, Inception |
| Global pooling | Very low | Low | Instant global | Channel attention (SE) |
| Self-attention | High | Quadratic | Instant global | Vision Transformers |
Let's analyze how iconic architectures approach receptive field design.
ResNet:
ResNet-50 on 224×224 inputs builds its RF rapidly through an aggressive stem (a 7×7 stride-2 conv plus stride-2 pooling) followed by four stages, ending with a theoretical RF well beyond the 224-pixel input:
ResNet-50 RF calculation (sketch):
Conv1 (7×7, s=2): RF = 7, jump = 2
MaxPool (3×3, s=2): RF = 11, jump = 4
Stage 1 (3 bottleneck blocks): each 3×3 conv adds 2 × jump = 8 pixels to the RF
Stage 2 (stride 2 in first block): jump doubles to 8, doubling each later conv's contribution
...
EfficientNet:
EfficientNet balances depth, width, and resolution: its compound scaling rule grows all three together, so deeper variants gain RF while higher input resolution raises the RF needed to cover the image.
ConvNeXt:
ConvNeXt modernizes CNNs with insights from Transformers: it adopts large 7×7 depthwise kernels and fewer, wider stages, expanding the RF contributed by each layer.
Vision Transformers (ViTs) have global RF from layer 1 via self-attention. Their success prompted reconsideration of CNN RF design. ConvNeXt and other modern CNNs use larger kernels (7×7) and designs that maximize effective RF, closing the gap with attention-based models.
Semantic Segmentation Networks:
Segmentation requires dense prediction at full resolution while using large RF for context:
DeepLab (ASPP - Atrous Spatial Pyramid Pooling):
Parallel dilated convs with rates [1, 6, 12, 18]
Each captures different RF scale
Concat → 1×1 conv → Final prediction
U-Net:
Encoder: Downsample 4× → large RF at bottleneck
Decoder: Upsample with skip connections
Skips preserve high-resolution spatial info
Bottleneck provides global context
Object Detection Networks:
Detection must handle objects at multiple scales:
Feature Pyramid Network (FPN):
Backbone produces features at multiple scales:
P2 (1/4): Small RF, good for small objects
P3 (1/8): Medium RF, medium objects
P4 (1/16): Large RF, large objects
P5 (1/32): Very large RF, very large objects
Top-down pathway shares high-level features
Each detector head operates on appropriate RF for its object scale.
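A stripped-down sketch of the top-down merge (lateral 1×1 convs plus nearest-neighbor upsampling; real FPNs add smoothing convs and more levels, and the channel counts here are arbitrary):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Backbone features at strides 1/8, 1/16, 1/32 of a 224x224 input
c3, c4, c5 = (torch.randn(1, 256, 28, 28),
              torch.randn(1, 512, 14, 14),
              torch.randn(1, 1024, 7, 7))

# Lateral 1x1 convs project every level to a common channel width
lat3, lat4, lat5 = (nn.Conv2d(256, 256, 1), nn.Conv2d(512, 256, 1),
                    nn.Conv2d(1024, 256, 1))

p5 = lat5(c5)
p4 = lat4(c4) + F.interpolate(p5, scale_factor=2)  # share high-level context
p3 = lat3(c3) + F.interpolate(p4, scale_factor=2)
print(p3.shape, p4.shape, p5.shape)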
def analyze_resnet_rf():
    """
    Analyze receptive field of ResNet-50.
    """
    layers = [
        # (name, kernel, stride, dilation)
        ("conv1", 7, 2, 1),
        ("maxpool", 3, 2, 1),
        # Stage 1: 3 blocks, no downsampling in blocks
        ("res1_1_1", 1, 1, 1), ("res1_1_2", 3, 1, 1), ("res1_1_3", 1, 1, 1),
        ("res1_2_1", 1, 1, 1), ("res1_2_2", 3, 1, 1), ("res1_2_3", 1, 1, 1),
        ("res1_3_1", 1, 1, 1), ("res1_3_2", 3, 1, 1), ("res1_3_3", 1, 1, 1),
        # Stage 2: stride 2 in first block
        ("res2_1_1", 1, 1, 1), ("res2_1_2", 3, 2, 1), ("res2_1_3", 1, 1, 1),
        ("res2_2_1", 1, 1, 1), ("res2_2_2", 3, 1, 1), ("res2_2_3", 1, 1, 1),
        ("res2_3_1", 1, 1, 1), ("res2_3_2", 3, 1, 1), ("res2_3_3", 1, 1, 1),
        ("res2_4_1", 1, 1, 1), ("res2_4_2", 3, 1, 1), ("res2_4_3", 1, 1, 1),
        # Stage 3: stride 2 in first block
        ("res3_1_1", 1, 1, 1), ("res3_1_2", 3, 2, 1), ("res3_1_3", 1, 1, 1),
        ("res3_2_1", 1, 1, 1), ("res3_2_2", 3, 1, 1), ("res3_2_3", 1, 1, 1),
        ("res3_3_1", 1, 1, 1), ("res3_3_2", 3, 1, 1), ("res3_3_3", 1, 1, 1),
        ("res3_4_1", 1, 1, 1), ("res3_4_2", 3, 1, 1), ("res3_4_3", 1, 1, 1),
        ("res3_5_1", 1, 1, 1), ("res3_5_2", 3, 1, 1), ("res3_5_3", 1, 1, 1),
        ("res3_6_1", 1, 1, 1), ("res3_6_2", 3, 1, 1), ("res3_6_3", 1, 1, 1),
        # Stage 4: stride 2 in first block
        ("res4_1_1", 1, 1, 1), ("res4_1_2", 3, 2, 1), ("res4_1_3", 1, 1, 1),
        ("res4_2_1", 1, 1, 1), ("res4_2_2", 3, 1, 1), ("res4_2_3", 1, 1, 1),
        ("res4_3_1", 1, 1, 1), ("res4_3_2", 3, 1, 1), ("res4_3_3", 1, 1, 1),
    ]

    rf = 1
    jump = 1
    stage_rfs = {"input": 1}
    current_stage = "conv1"

    for name, k, s, d in layers:
        rf_increase = (k - 1) * d * jump
        rf = rf + rf_increase
        jump = jump * s

        # Record RF when entering a new stage (first 1×1 of its first block)
        if name.startswith("res") and name.endswith("_1_1"):
            stage = name[:4]  # e.g., "res2"
            if stage != current_stage:
                stage_rfs[f"Stage {stage[3]}"] = rf
                current_stage = stage

    stage_rfs["final"] = rf

    print("ResNet-50 Receptive Field by Stage:")
    print("-" * 40)
    for stage, rf_val in stage_rfs.items():
        print(f"{stage}: {rf_val}×{rf_val}")

    return rf, jump

final_rf, final_jump = analyze_resnet_rf()
print(f"\nFinal: RF = {final_rf}, Jump = {final_jump}")
print(f"On 224×224 input: RF covers entire image (RF > 224)")

Receptive field misconfiguration is a common source of CNN failures. Let's examine pitfalls and their solutions.
Pitfall 1: Insufficient RF for Large Objects
Symptom: Network fails on large objects or requires global context
Cause: Final layer RF smaller than target objects
Solution: Add depth, pooling, or dilated convolutions
Example: A segmentation network with 50-pixel RF on 512×512 images cannot correctly segment 200-pixel objects—it lacks the context to determine object boundaries at that scale.
Pitfall 2: Effective RF Much Smaller Than Theoretical
Symptom: Network behaves as if RF is ~30% of theoretical
Cause: Gradient attenuation through depth
Solution: Skip connections, attention, larger early kernels
Pitfall 3: Gridding from Dilated Convolutions
Symptom: Checkerboard artifacts in segmentation outputs
Cause: Large dilation rates skip intermediate pixels
Solution: Use multiple dilation rates, gradually increase/decrease dilation
A single RF cannot optimally match all object sizes. A detector for both 20-pixel and 200-pixel objects needs either multi-scale feature processing (FPN) or an RF that's effectively adaptive (attention-based methods).
Pitfall 4: Over-Aggressive Downsampling
Symptom: Poor performance on small objects or fine details
Cause: Too much pooling destroys high-resolution information
Solution: Fewer pooling stages, dilated convolutions instead of pooling, feature pyramids
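One common form of the "dilated convolutions instead of pooling" fix, used in DeepLab-style segmentation backbones, is to set a late stride-2 layer back to stride 1 and dilate the layers that follow, keeping RF growth while preserving resolution. A hedged sketch (channel counts and layer choices are illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 28, 28)

# Before: a stride-2 conv halves resolution while growing the RF quickly
down = nn.Conv2d(64, 64, 3, stride=2, padding=1)
print(down(x).shape)  # [1, 64, 14, 14], fine detail is lost

# After: stride 1 keeps full resolution; dilation in the following layer
# recovers the RF growth that the removed stride would have provided
keep = nn.Conv2d(64, 64, 3, stride=1, padding=1)
dilated = nn.Conv2d(64, 64, 3, stride=1, padding=2, dilation=2)
print(dilated(keep(x)).shape)  # [1, 64, 28, 28], resolution preserved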
Diagnostic Questions:
What's the smallest object/pattern I need to detect?
What's the largest context I need?
Does my task require dense prediction?
Is there multi-scale structure?
Receptive field is a fundamental concept that connects CNN architecture to task requirements. Understanding RF enables principled architecture design rather than trial-and-error.
Connection to Next Topic:
Receptive fields determine what local region a neuron 'sees'. But the output of a convolution layer isn't a single number—it's a feature map, a spatial array of activations. The next page explores feature maps: how they represent learned features, how multiple feature maps work together, and how to interpret what a CNN has learned.
You now understand receptive fields—the input regions that influence CNN neurons. You've learned calculation formulas, growth patterns, the effective vs theoretical distinction, design strategies, and common pitfalls. Next, we'll explore feature maps: the spatial representations that convolution produces.