Why are PyTorch “convolutions” implemented as cross-correlations?

PyTorch convolutions are actually implemented as cross-correlations. This shouldn’t cause problems when training a convolution layer, since cross-correlation is just a flipped version of convolution (so the learned function is equally expressive), but it does become an issue when:

  1. trying to implement an actual convolution with the functional library
  2. trying to copy the weights of an actual convolution from another deep learning library
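To make point 1 concrete, here is a minimal sketch (not from the thread) showing that `F.conv2d` performs cross-correlation, and that a true discrete convolution can be recovered by flipping the kernel along both spatial axes with `torch.flip` first:

```python
import torch
import torch.nn.functional as F

# A deterministic input and an asymmetric kernel, so the flip matters.
x = torch.arange(25.0).reshape(1, 1, 5, 5)
k = torch.arange(9.0).reshape(1, 1, 3, 3)

# What F.conv2d actually computes: a cross-correlation (no kernel flip).
cross_corr = F.conv2d(x, k)

# A true discrete convolution: flip the kernel along both spatial
# dimensions before handing it to conv2d.
true_conv = F.conv2d(x, torch.flip(k, dims=[2, 3]))

print(torch.equal(cross_corr, true_conv))  # False: the flip changes the result
```

For a symmetric kernel the two would coincide, which is why the distinction is invisible in many textbook examples.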

The authors say the following in Deep Learning with PyTorch:

Convolution, or more precisely, discrete convolution¹

¹ There is a subtle difference between PyTorch’s convolution and mathematics’ convolution: one argument’s sign is flipped. If we were in a pedantic mood, we could call PyTorch’s convolutions discrete cross-correlations.
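The sign flip the footnote describes is easiest to see in one dimension. As an illustration (my own example, using NumPy, whose `convolve` flips the kernel while `correlate` does not):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])  # asymmetric, so the flip is visible

conv = np.convolve(x, k, mode="valid")   # true convolution (kernel flipped)
corr = np.correlate(x, k, mode="valid")  # cross-correlation (no flip)

print(conv)  # [2. 2.]
print(corr)  # [-2. -2.]

# Cross-correlating with the reversed kernel recovers the convolution.
print(np.allclose(conv, np.correlate(x, k[::-1], mode="valid")))  # True
```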

But they don’t explain why it was implemented like this. Is there a reason?

Maybe for efficiency reasons? Something similar to how PyTorch’s CrossEntropyLoss isn’t literally cross entropy but an analogous function that takes “logits” as inputs instead of raw probabilities (to avoid numerical instability)?
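For reference, the CrossEntropyLoss analogy in a small sketch: `F.cross_entropy` consumes raw logits and fuses log-softmax with negative log-likelihood internally, matching the naive softmax-then-log route for moderate values:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores, not probabilities
target = torch.tensor([0])

# F.cross_entropy (the functional form of CrossEntropyLoss) takes logits
# directly, fusing log-softmax with negative log-likelihood for stability.
loss = F.cross_entropy(logits, target)

# The naive route via softmax then log gives the same value here,
# but can underflow for extreme logits.
naive = -torch.log(F.softmax(logits, dim=1))[0, target[0]]

print(torch.allclose(loss, naive))  # True
```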

(Stack Overflow post)


Convolutions are implemented as cross-correlations for performance / efficiency, as you brought up.

From Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs: “The advantage of cross-correlation is that it avoids the additional step of flipping the filters to perform the convolutions.”
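As a sanity check of the “no flip needed” convention (my own example, assuming SciPy’s `correlate2d`/`convolve2d` definitions), one can confirm that `F.conv2d` agrees with SciPy’s cross-correlation, and with true convolution only after flipping the kernel:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import convolve2d, correlate2d

x = np.arange(25.0).reshape(5, 5)
k = np.arange(9.0).reshape(3, 3)

torch_out = F.conv2d(
    torch.from_numpy(x).reshape(1, 1, 5, 5),
    torch.from_numpy(k).reshape(1, 1, 3, 3),
).numpy().squeeze()

# PyTorch's "convolution" matches SciPy's cross-correlation...
print(np.allclose(torch_out, correlate2d(x, k, mode="valid")))  # True
# ...and matches SciPy's true convolution only with a doubly-flipped kernel.
print(np.allclose(torch_out, convolve2d(x, k[::-1, ::-1], mode="valid")))  # True
```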