Here's a fundamental truth that often surprises newcomers: When deep learning practitioners say 'convolution', they almost always mean cross-correlation.
Open any deep learning framework—PyTorch, TensorFlow, JAX—and examine the Conv2d operation. Despite its name, it implements cross-correlation, not true mathematical convolution. The kernel is not flipped before sliding across the input.
This isn't a bug or an oversight. It's a deliberate design choice with deep practical justification. Understanding why this choice was made—and why it typically doesn't matter for neural networks—illuminates something fundamental about how CNNs learn.
In this page, we rigorously define cross-correlation, compare it mathematically to convolution, and explain the conditions under which the distinction is irrelevant. By the end, you'll understand exactly what happens when you instantiate a 'convolutional' layer in your favorite framework.
By the end of this page, you will understand the mathematical definition of cross-correlation, clearly distinguish it from true convolution, know why deep learning uses cross-correlation despite calling it 'convolution', and recognize the cases where the distinction does matter.
1D Cross-Correlation:
For a discrete signal f[n] and kernel g[n], the cross-correlation (often denoted with ★ or ⊛ to distinguish from convolution's *) is:
$$(f \star g)[n] = \sum_{k=-\infty}^{\infty} f[k] \cdot g[k - n]$$
Or equivalently, with finite sequences:
$$(f \star g)[n] = \sum_{k=0}^{K-1} f[n + k] \cdot g[k]$$
The Crucial Difference:
Compare to convolution:
$$(f * g)[n] = \sum_{k=-\infty}^{\infty} f[k] \cdot g[n - k]$$
The difference is subtle in the formula but profound in interpretation. In convolution, the kernel is reversed as it slides; in cross-correlation, it maintains its original orientation.
2D Cross-Correlation:
For a 2D input I[m, n] and kernel K[i, j]:
$$(I \star K)[m, n] = \sum_{i=0}^{K_h-1} \sum_{j=0}^{K_w-1} I[m + i, n + j] \cdot K[i, j]$$
Here, the kernel slides across the image without rotation or flipping. Each output position is the dot product of the kernel with the corresponding image patch, aligned directly.
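To make the formula concrete, here is a minimal sketch that translates it directly into code (the function name cross_correlate_2d is ours; it assumes a single channel, 'valid' output size, and no stride or padding):

```python
import numpy as np

def cross_correlate_2d(I: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Direct translation of the 2D cross-correlation formula above ('valid' size)."""
    Kh, Kw = K.shape
    out_h = I.shape[0] - Kh + 1
    out_w = I.shape[1] - Kw + 1
    out = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            # Dot product of the kernel with the patch anchored at (m, n)
            out[m, n] = np.sum(I[m:m + Kh, n:n + Kw] * K)
    return out

I = np.arange(1.0, 10.0).reshape(3, 3)
K = np.array([[1.0, 0.0], [0.0, 2.0]])
print(cross_correlate_2d(I, K))  # matches scipy.signal.correlate2d(I, K, mode='valid')
```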
Relationship to Convolution:
Cross-correlation with kernel K is equivalent to convolution with the flipped kernel K':
$$f \star g = f * g'$$
where g'[k] = g[-k] in 1D, or K'[i, j] = K[K_h - 1 - i, K_w - 1 - j] in 2D (a 180° rotation).
This means any operation achievable with one is achievable with the other, just with a transformed kernel.
```python
import numpy as np
from scipy.signal import convolve2d, correlate2d


def demonstrate_convolution_vs_correlation():
    """
    Demonstrate the mathematical difference between convolution and correlation.
    """
    # Create a simple test image
    image = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ], dtype=float)

    # An asymmetric kernel (to see the flip effect clearly)
    kernel = np.array([
        [1, 0],
        [0, 2]
    ], dtype=float)

    # Flipped kernel (180° rotation)
    kernel_flipped = np.array([
        [2, 0],
        [0, 1]
    ], dtype=float)

    print("Image:")
    print(image)
    print("\nKernel:")
    print(kernel)
    print("\nFlipped Kernel (180° rotation):")
    print(kernel_flipped)

    # True convolution (scipy uses the mathematical definition)
    conv_result = convolve2d(image, kernel, mode='valid')

    # Cross-correlation (no flip)
    corr_result = correlate2d(image, kernel, mode='valid')

    # Convolution with flipped kernel = Cross-correlation with original
    conv_flipped = convolve2d(image, kernel_flipped, mode='valid')

    print("\nConvolution (with kernel flip):")
    print(conv_result)
    print("\nCross-correlation (no flip):")
    print(corr_result)
    print("\nConvolution with pre-flipped kernel:")
    print(conv_flipped)

    # Key observation: conv_flipped equals corr_result
    print("\nConv(image, flipped_kernel) == Corr(image, kernel):")
    print(np.allclose(conv_flipped, corr_result))


if __name__ == "__main__":
    demonstrate_convolution_vs_correlation()
```

For symmetric kernels (where K[i, j] = K[K_h - 1 - i, K_w - 1 - j]), convolution and cross-correlation produce identical results. Many classical image processing kernels (Gaussian, Laplacian) are symmetric. But learned CNN kernels are not constrained to symmetry, so the distinction could theoretically matter.
Given that convolution and cross-correlation are distinct operations, why did the deep learning community standardize on cross-correlation while calling it 'convolution'? There are several compelling reasons:
1. Gradient Computation Symmetry:
During backpropagation, the gradient with respect to the input involves correlating the output gradient with the kernel. If the forward pass uses convolution (with flip), the backward pass uses correlation (without flip), and vice versa.
By using cross-correlation in the forward pass, the mathematical structure of backpropagation becomes more symmetric and slightly more intuitive. The same kernel orientation appears in both input gradient and weight gradient computations.
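As a quick sanity check of this symmetry, the sketch below uses PyTorch's F.conv1d (which implements cross-correlation in the forward pass) and verifies that the resulting input gradient equals a full convolution of the upstream gradient with the unflipped kernel; the specific signal and kernel values are arbitrary:

```python
import numpy as np
import torch
import torch.nn.functional as F

# Sketch of the gradient symmetry: forward pass is cross-correlation, so the
# input gradient should be a FULL convolution of the upstream gradient with
# the (unflipped) kernel.
x = torch.arange(1.0, 7.0, requires_grad=True)   # signal of length 6
w = torch.tensor([1.0, 2.0, 3.0])                 # asymmetric kernel

y = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1)).squeeze()
y.sum().backward()                                # upstream gradient = all ones

upstream = np.ones(y.shape[0])
expected = np.convolve(upstream, w.numpy(), mode='full')

print(x.grad.numpy())                             # [1. 3. 6. 6. 5. 3.]
print(expected)                                   # [1. 3. 6. 6. 5. 3.]
print(np.allclose(x.grad.numpy(), expected))      # True
```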
2. Historical Momentum:
Early CNN implementations (dating to LeCun's 1989 work) used cross-correlation, possibly for implementation simplicity. As the field grew, this choice became entrenched. Changing conventions would break backward compatibility across decades of research.
3. The Learnability Argument (The Killer Insight):
This is the most important reason. Consider what happens during training: suppose some feature detector would minimize the loss, and suppose true convolution (with its flip) would apply it in the 'intended' orientation. Doesn't cross-correlation then apply it backwards?
But the kernel weights are learned!
Backpropagation doesn't know what the 'intended' kernel looks like. It simply adjusts kernel weights to minimize loss. If cross-correlation is used and a flipped kernel is needed, backpropagation will learn the flipped version directly.
In other words: whatever kernel true convolution would have learned, cross-correlation learns its 180° rotation, and vice versa.
The final effect on outputs is identical. The network learns whatever kernel orientation produces the correct output—the flip is absorbed into the learned weights.
Since kernels are learned (not hand-designed), the choice between convolution and cross-correlation is mathematically irrelevant to CNN performance. Whatever feature detector would be learned under convolution, its 180° rotated version is learned under cross-correlation. The network's capacity and final performance are identical.
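The learnability argument is easy to check empirically. The sketch below is a toy regression with arbitrary kernel values and hyperparameters: targets are generated by true convolution with a fixed kernel, a cross-correlation layer is trained on them, and the learned weights converge to the 180°-rotated kernel.

```python
import torch
import torch.nn.functional as F

# Toy experiment (sketch): targets come from TRUE convolution with a fixed
# asymmetric kernel; a cross-correlation layer trained on them learns the
# 180°-rotated kernel. Kernel values and hyperparameters are arbitrary.
torch.manual_seed(0)

true_kernel = torch.tensor([[1.0, -2.0], [3.0, 0.5]])
flipped = torch.flip(true_kernel, dims=[0, 1])      # 180° rotation

# True convolution == cross-correlation with the flipped kernel
images = torch.randn(256, 1, 6, 6)
targets = F.conv2d(images, flipped.view(1, 1, 2, 2))

# Learnable weights, applied via cross-correlation (F.conv2d does not flip)
weight = torch.randn(1, 1, 2, 2, requires_grad=True)
optimizer = torch.optim.SGD([weight], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    loss = ((F.conv2d(images, weight) - targets) ** 2).mean()
    loss.backward()
    optimizer.step()

print("Target kernel (convolution convention):\n", true_kernel)
print("Learned kernel (cross-correlation convention):\n", weight.detach().squeeze())
print("Learned ≈ 180° rotation of target:",
      torch.allclose(weight.detach().squeeze(), flipped, atol=1e-2))
```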
4. Implementation Simplicity:
Cross-correlation is slightly simpler to implement: there is no kernel-flipping step, and the loop indexing runs forward on both the input and the kernel (input[n + k] paired with kernel[k]) rather than requiring a reversed kernel index.
These micro-simplifications compound across millions of lines of deep learning code.
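The indexing difference is visible in a direct loop implementation. In this sketch (1D, 'valid' output, helper names are ours), the cross-correlation loop walks the input and kernel forward together, while the convolution loop has to reverse the kernel index:

```python
import numpy as np

def correlate1d(x, w):
    """Cross-correlation: input and kernel indexed forward together."""
    K = len(w)
    return np.array([sum(x[n + k] * w[k] for k in range(K))
                     for n in range(len(x) - K + 1)])

def convolve1d_valid(x, w):
    """True convolution: kernel index runs in reverse."""
    K = len(w)
    return np.array([sum(x[n + k] * w[K - 1 - k] for k in range(K))
                     for n in range(len(x) - K + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 2.0, 3.0])
print(correlate1d(x, w))       # [14. 20. 26.]
print(convolve1d_valid(x, w))  # [10. 16. 22.]
```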
While the learnability argument makes the practical distinction moot for CNNs, the mathematical properties differ significantly. Understanding these differences is valuable for theoretical analysis and for applications outside standard CNNs.
Properties of True Convolution: commutative, associative, distributive over addition, with the delta function δ as its identity; under the Fourier transform it becomes pointwise multiplication.
Properties of Cross-Correlation: distributive over addition, but neither commutative nor associative, and with no identity element; under the Fourier transform it becomes multiplication with a complex conjugate.
| Property | Convolution (*) | Cross-Correlation (★) | Impact in Deep Learning |
|---|---|---|---|
| Commutativity | ✓ f * g = g * f | ✗ f ★ g ≠ g ★ f | Rarely exploited in CNNs |
| Associativity | ✓ (f * g) * h = f * (g * h) | ✗ Not associative | Matters for kernel fusion analysis |
| Distributivity | ✓ Distributes over + | ✓ Distributes over + | Both support linear operations |
| Identity | ✓ δ is identity | ✗ No identity | No practical CNN impact |
| Fourier Relation | Multiplication | Multiplication with conjugate | FFT-based convolution uses true conv |
Loss of Associativity: A Deeper Look
The loss of associativity in cross-correlation has subtle implications for theoretical analysis:
With true convolution, stacking two convolutional layers with kernels K₁ and K₂ is theoretically equivalent to a single layer with kernel K₁ * K₂. This is because:
$$(f * K_1) * K_2 = f * (K_1 * K_2)$$
With cross-correlation, this equivalence doesn't hold in general:
$$(I \star K_1) \star K_2 \neq I \star (K_1 \star K_2)$$
However, in practice, this theoretical difference has minimal impact: real networks insert nonlinearities (ReLU, normalization) between layers, so stacked layers are not equivalent to a single linear filter under either convention, and learned kernels absorb whatever orientation is required.
If you're analyzing CNN expressivity or attempting to algebraically combine kernel effects (e.g., for network compression or theoretical capacity analysis), the non-associativity of cross-correlation can cause surprises. Always verify whether a paper uses true convolution or cross-correlation conventions.
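A quick numerical check makes the contrast concrete. This sketch (random 8×8 image and 3×3 kernels, scipy's 'full' mode) verifies that stacked true convolutions collapse into one combined kernel while stacked cross-correlations do not:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))

# Associativity of convolution: (I * K1) * K2 == I * (K1 * K2)
conv_stacked = convolve2d(convolve2d(image, k1, mode='full'), k2, mode='full')
conv_combined = convolve2d(image, convolve2d(k1, k2, mode='full'), mode='full')
print("Convolution associative:", np.allclose(conv_stacked, conv_combined))        # True

# Cross-correlation: (I ★ K1) ★ K2 != I ★ (K1 ★ K2) in general
corr_stacked = correlate2d(correlate2d(image, k1, mode='full'), k2, mode='full')
corr_combined = correlate2d(image, correlate2d(k1, k2, mode='full'), mode='full')
print("Cross-correlation associative:", np.allclose(corr_stacked, corr_combined))  # False
```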
Let's build geometric intuition for the difference between convolution and cross-correlation through visualization.
Convolution: Flip and Slide. The kernel is first rotated 180° (reversed in 1D), then slid across the input; each output is the dot product of the flipped kernel with the current window.
Cross-Correlation: Direct Slide. The kernel slides across the input in its original orientation; each output is the dot product of the kernel, as written, with the current window.
Visual Example:
Consider a 1D signal [a, b, c, d, e] and kernel [1, 2, 3].
Cross-Correlation at position 0: 1·a + 2·b + 3·c (the kernel is applied as written).
Convolution at position 0: 3·a + 2·b + 1·c (the kernel is applied reversed, as [3, 2, 1]).
The difference is how kernel weights align with signal positions.
```python
import numpy as np


def visual_comparison_1d():
    """
    Step-by-step trace comparing convolution and cross-correlation.
    """
    signal = np.array([1, 2, 3, 4, 5], dtype=float)
    kernel = np.array([1, 2, 3], dtype=float)
    kernel_flipped = kernel[::-1]  # [3, 2, 1]

    print("Signal:", signal)
    print("Kernel:", kernel)
    print("Flipped Kernel:", kernel_flipped)
    print()

    # Cross-correlation: direct alignment
    print("=== Cross-Correlation (no flip) ===")
    for pos in range(len(signal) - len(kernel) + 1):
        window = signal[pos:pos + len(kernel)]
        result = np.dot(window, kernel)
        print(f"Position {pos}: {window} • {kernel} = {result}")
    print()

    # Convolution: kernel is flipped
    print("=== Convolution (with flip) ===")
    for pos in range(len(signal) - len(kernel) + 1):
        window = signal[pos:pos + len(kernel)]
        result = np.dot(window, kernel_flipped)
        print(f"Position {pos}: {window} • {kernel_flipped} = {result}")
    print()

    print("Notice: Same positions, different results due to kernel orientation")


if __name__ == "__main__":
    visual_comparison_1d()
```

Directional Interpretation:
The kernel flip in convolution can be thought of as 'looking backward'—the kernel's first element (reading left-to-right) aligns with the signal's last element in the window (right-to-left).
Cross-correlation is 'looking forward': the kernel's first element aligns with the signal's first element in the window. Both conventions are internally consistent; they simply index the kernel against the window in opposite directions.
For symmetric kernels (palindromic pattern), both operations yield identical results because flipping doesn't change the kernel.
In Images (2D):
The 180° rotation in 2D means the kernel is flipped both vertically and horizontally: the top-left weight trades places with the bottom-right, and the top-right with the bottom-left.
For an asymmetric kernel, such as an edge detector that responds to dark-to-bright transitions in one direction, the 180° rotation reverses the polarity it responds to. With learned kernels, this just means the network learns whichever orientation works.
Cross-correlation is more intuitive for template matching: the kernel directly represents the pattern you're looking for. With convolution, you'd need to flip the template mentally. This is another reason cross-correlation became the deep learning standard—what you visualize is what gets applied.
Let's examine how major deep learning frameworks implement their 'convolution' operations and verify that they use cross-correlation.
PyTorch nn.Conv2d:
PyTorch's documentation explicitly states that Conv2d computes cross-correlation. From the docs:
This module supports TensorFloat32. The implementation uses cross-correlation...
The forward pass is: $$\text{output}[n, c_{out}, h, w] = \sum_{c_{in}} \sum_{i} \sum_{j} \text{input}[n, c_{in}, h+i, w+j] \times \text{weight}[c_{out}, c_{in}, i, j] + \text{bias}[c_{out}]$$
No flip is applied to the weight tensor.
TensorFlow/Keras Conv2D:
TensorFlow similarly implements cross-correlation:
This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input...
JAX/Flax:
JAX's lax.conv_general_dilated provides options for various convolution modes but defaults to cross-correlation semantics.
```python
import torch
import torch.nn as nn
import numpy as np


def verify_pytorch_uses_cross_correlation():
    """
    Empirically verify that PyTorch's Conv2d uses cross-correlation.
    """
    # Create a simple test case
    # Input: 1 batch, 1 channel, 3x3 image
    input_tensor = torch.tensor([
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]
    ]).reshape(1, 1, 3, 3)

    # Asymmetric kernel to make flip visible
    kernel = torch.tensor([
        [1.0, 2.0],
        [3.0, 4.0]
    ]).reshape(1, 1, 2, 2)

    # Create Conv2d layer and set weights manually
    conv = nn.Conv2d(1, 1, 2, bias=False)
    conv.weight.data = kernel

    # PyTorch result
    pytorch_result = conv(input_tensor).squeeze()

    # Manual cross-correlation (no flip)
    input_np = input_tensor.numpy().squeeze()
    kernel_np = kernel.numpy().squeeze()

    correlation_result = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            patch = input_np[i:i+2, j:j+2]
            correlation_result[i, j] = np.sum(patch * kernel_np)

    # Manual convolution (with 180° flip)
    kernel_flipped = kernel_np[::-1, ::-1]
    convolution_result = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            patch = input_np[i:i+2, j:j+2]
            convolution_result[i, j] = np.sum(patch * kernel_flipped)

    print("Input:")
    print(input_np)
    print("\nKernel:")
    print(kernel_np)
    print("\nPyTorch Conv2d result:")
    print(pytorch_result.detach().numpy())
    print("\nManual cross-correlation:")
    print(correlation_result)
    print("\nManual true convolution:")
    print(convolution_result)

    print("\n--- Verification ---")
    print(f"PyTorch matches cross-correlation: {np.allclose(pytorch_result.detach().numpy(), correlation_result)}")
    print(f"PyTorch matches true convolution: {np.allclose(pytorch_result.detach().numpy(), convolution_result)}")


if __name__ == "__main__":
    verify_pytorch_uses_cross_correlation()
```

The naming choice to call cross-correlation 'convolution' dates back to the earliest CNN papers. Changing terminology now would cause immense confusion across decades of literature, papers, and codebases. The deep learning community has collectively decided to live with the terminology overload.
While learnability makes the distinction irrelevant for most CNN training, there are specific scenarios where you must be aware of the difference:
1. Using Pre-defined Kernels:
If you're using hand-crafted kernels (e.g., Sobel, Laplacian, Gabor filters) within a CNN framework, remember that the framework applies cross-correlation. If these kernels were designed with true convolution in mind, you may need to flip them first.
2. FFT-Based Acceleration:
The convolution theorem (ℱ{f * g} = ℱ{f}·ℱ{g}) applies to true convolution, not cross-correlation. If you implement FFT-based 'convolution' for speed, you must either flip the kernel before transforming it, or take the complex conjugate of the kernel's spectrum (the correlation theorem), so that the frequency-domain pipeline reproduces the framework's cross-correlation.
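As a sketch of the second option, the snippet below computes a circular 1D cross-correlation by conjugating the kernel's spectrum and checks it against a direct summation; it assumes real inputs and wrap-around boundaries, which differ from the zero-padded 'valid' behavior of framework conv layers:

```python
import numpy as np

# Circular cross-correlation via FFT: F{f ⋆ g} = F{f} · conj(F{g})
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
kernel = np.array([1.0, 2.0, 3.0])

N = len(signal)
kernel_padded = np.zeros(N)
kernel_padded[:len(kernel)] = kernel

fft_corr = np.real(np.fft.ifft(np.fft.fft(signal) * np.conj(np.fft.fft(kernel_padded))))

# Direct circular cross-correlation: out[n] = Σ_k signal[(n + k) % N] · kernel[k]
direct = np.array([sum(signal[(n + k) % N] * kernel[k] for k in range(len(kernel)))
                   for n in range(N)])

print(np.allclose(fft_corr, direct))  # True
```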
3. Signal Processing Interoperability:
When interfacing with signal processing libraries (scipy, MATLAB's Signal Processing Toolbox), be aware that:
- scipy.signal.convolve uses true convolution
- scipy.signal.correlate uses cross-correlation

Mixing conventions without awareness causes bugs.
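A quick check of the two scipy conventions with an asymmetric kernel (arbitrary example values):

```python
import numpy as np
from scipy.signal import convolve, correlate

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([1.0, 2.0, 3.0])               # asymmetric on purpose

print(convolve(x, k, mode='valid'))          # true convolution (kernel flipped): [10. 16. 22.]
print(correlate(x, k, mode='valid'))         # cross-correlation (no flip):       [14. 20. 26.]
print(convolve(x, k[::-1], mode='valid'))    # flipping recovers the correlation result
```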
```python
import numpy as np


def sobel_kernel_example():
    """
    Demonstrate the importance of flipping when using pre-defined kernels.
    """
    # Standard Sobel kernel for vertical edge detection
    # Designed with convolution in mind: bright-on-right edges
    sobel_x = np.array([
        [-1, 0, 1],
        [-2, 0, 2],
        [-1, 0, 1]
    ], dtype=np.float32)

    print("Original Sobel-X kernel (designed for convolution):")
    print(sobel_x)

    # For use in PyTorch (which uses cross-correlation):
    # To get the same effect as convolution, flip the kernel
    sobel_x_for_pytorch = np.flip(np.flip(sobel_x, 0), 1)  # 180° rotation
    print("\nFlipped Sobel-X for PyTorch cross-correlation:")
    print(sobel_x_for_pytorch)

    # However, for symmetric kernels like Gaussian, no flip needed
    gaussian_3x3 = np.array([
        [1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]
    ], dtype=np.float32) / 16

    # 180° rotation of a symmetric kernel is the same
    gaussian_flipped = np.flip(np.flip(gaussian_3x3, 0), 1)

    print("\nGaussian kernel:")
    print(gaussian_3x3)
    print("\nFlipped Gaussian (identical):")
    print(gaussian_flipped)
    print("\nSymmetric kernels are flip-invariant:", np.allclose(gaussian_3x3, gaussian_flipped))


if __name__ == "__main__":
    sobel_kernel_example()
```

For standard CNN training, ignore the distinction—learned kernels adapt. For pre-defined kernels, test empirically: apply your kernel to a known input and verify the output matches expectations. For theoretical work, always specify which convention you're using.
In signal processing, cross-correlation has a distinct purpose that differs from convolution. Understanding this original context illuminates why the two operations are structurally similar but conceptually different.
Cross-Correlation as Similarity Measure:
In signal processing, cross-correlation measures how similar two signals are as one is shifted relative to the other:
$$R_{fg}[\tau] = \sum_{n} f[n] \cdot g[n + \tau]$$
The value R_{fg}[τ] tells us how well signal g aligns with signal f when g is shifted by τ samples.
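For example, this sketch recovers the delay between a signal and a shifted, slightly noisy copy of it by locating the peak of their cross-correlation (np.correlate in 'full' mode; the delay and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(200)

true_delay = 17
g = np.zeros_like(f)
g[true_delay:] = f[:-true_delay]             # g is f delayed by 17 samples
g += 0.05 * rng.standard_normal(len(g))      # plus a little noise

# Evaluate the correlation at every relative shift and find the peak
corr = np.correlate(g, f, mode='full')
lags = np.arange(-len(f) + 1, len(f))
print("Estimated delay:", lags[np.argmax(corr)])  # expected: 17
```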
Applications of Cross-Correlation: template matching in images, time-delay estimation in radar, sonar, and GPS, aligning audio recordings, and detecting periodicity in a signal via autocorrelation.
Convolution as System Response:
In contrast, convolution computes the output of a linear time-invariant system:
$$y[n] = (x * h)[n]$$
where x is the input and h is the system's impulse response.
The kernel flip in convolution reflects causality in physical systems: the current output is a weighted sum of past inputs, with weights given by the impulse response. Cross-correlation doesn't have this causal interpretation.
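A minimal illustration of the system-response view (an assumed exponentially decaying impulse response):

```python
import numpy as np

# Convolution as the response of a causal LTI system: the output at n mixes
# current and past inputs, weighted by the impulse response h.
h = 0.5 ** np.arange(5)        # impulse response: [1, 0.5, 0.25, ...]
x = np.zeros(20)
x[3] = 1.0                     # a single input impulse at n = 3

y = np.convolve(x, h)          # y[n] = Σ_k x[k] · h[n - k]
print(np.round(y[:10], 3))     # the response starts at n = 3 and decays
```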
| Aspect | Convolution | Cross-Correlation |
|---|---|---|
| Primary Use | System response computation | Similarity/alignment measurement |
| Physical Interpretation | Output of LTI system | Pattern matching score |
| Kernel/Filter | Impulse response | Template/reference signal |
| Symmetry | Commutative, associative | Neither (in general) |
| Causality | Naturally causal | Not inherently causal |
CNNs as Pattern Matchers:
The choice of cross-correlation in CNNs makes sense from the template matching perspective. Each kernel is a learned template, and cross-correlation measures how well each image patch matches the template.
The 'system response' interpretation of convolution is less natural for image classification—we're not computing what happens when an 'image signal' passes through a 'filter system'. We're detecting patterns.
So while the naming is confusing (calling cross-correlation 'convolution'), the operational choice aligns with the intuitive purpose of convolutional layers: pattern detection, not system simulation.
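A small sketch of this pattern-matching view: embed a tiny 'plus'-shaped template in a noisy image, cross-correlate, and the response peaks at the embedding location (template, noise level, and positions are arbitrary):

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(1)
template = np.array([[0.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [0.0, 1.0, 0.0]])       # a small "plus" pattern

image = 0.1 * rng.standard_normal((12, 12))  # low-amplitude background noise
image[5:8, 4:7] += template                  # embed the pattern at (5, 4)

# With cross-correlation the template is used as-is: no mental flip required
response = correlate2d(image, template, mode='valid')
peak = np.unravel_index(np.argmax(response), response.shape)
print("Strongest response at:", peak)        # expected: (5, 4)
```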
Signal processors and deep learners often talk past each other because of terminology differences. When reading interdisciplinary papers or implementing hybrid systems, always verify which operation is meant by 'convolution'.
We've thoroughly examined the distinction between convolution and cross-correlation, establishing exactly what deep learning frameworks do under the hood. Let's consolidate the key insights:

- What deep learning calls 'convolution' is cross-correlation: the kernel slides across the input without being flipped.
- The two operations differ only by a 180° rotation of the kernel, so either can express the other with a transformed kernel.
- Because CNN kernels are learned, the flip is absorbed into the weights; the choice does not affect capacity or performance.
- The distinction does matter for hand-crafted kernels, FFT-based implementations, theoretical analyses that rely on commutativity or associativity, and interoperability with signal processing libraries.
What's Next:
With the convolution/correlation distinction clarified, we turn to practical aspects of the operation. The next page covers stride and padding—the control parameters that determine how the kernel moves across the input and how boundaries are handled. These choices directly affect output spatial dimensions and computational cost.
You now understand precisely what 'convolution' means in deep learning contexts—it's cross-correlation, with learned kernels that absorb any flip. This knowledge prevents confusion when reading papers, debugging implementations, or interfacing with signal processing tools.