# Why are PyTorch “convolutions” implemented as cross-correlations?

PyTorch convolutions are actually implemented as cross-correlations. This shouldn’t be an issue when training a convolution layer, since one is just a flipped version of the other (and hence the learned function is equally expressive), but it does become an issue when:

1. trying to implement an actual convolution with the `functional` library
2. trying to copy the weights of an actual convolution from another deep learning library
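For case 1, a true discrete convolution can be recovered from `torch.nn.functional.conv2d` by flipping the kernel along both spatial dimensions before applying it. A minimal sketch (the input and kernel values here are arbitrary, just for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 5, 5)  # (batch, channels, height, width)
w = torch.randn(1, 1, 3, 3)  # (out_channels, in_channels, kH, kW)

# F.conv2d slides the kernel over the input WITHOUT flipping it,
# i.e. it computes a cross-correlation.
xcorr = F.conv2d(x, w)

# A true (discrete) convolution: flip the kernel along both spatial
# dimensions (dims 2 and 3), then cross-correlate.
true_conv = F.conv2d(x, torch.flip(w, dims=[2, 3]))

# Sanity check against a naive convolution written as an explicit loop.
out = torch.zeros(1, 1, 3, 3)
w_flipped = torch.flip(w[0, 0], dims=[0, 1])
for i in range(3):
    for j in range(3):
        patch = x[0, 0, i:i + 3, j:j + 3]
        out[0, 0, i, j] = (patch * w_flipped).sum()

assert torch.allclose(out, true_conv, atol=1e-5)
```

The same trick works for copying weights from a library that uses true convolutions (case 2): flip the imported kernels once at load time, and PyTorch’s cross-correlation then computes the same function.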

The authors say the following in Deep Learning with PyTorch:

> Convolution, or more precisely, discrete convolution¹ […]
>
> 1. There is a subtle difference between PyTorch’s convolution and mathematics’ convolution: one argument’s sign is flipped. If we were in a pedantic mood, we could call PyTorch’s convolutions discrete cross-correlations.

But they don’t explain why it was implemented like this. Is there a reason?

Maybe for efficiency reasons, similar to how the PyTorch implementation of `CrossEntropyLoss` isn’t actually cross entropy but an analogous function taking “logits” as inputs instead of raw probabilities (to avoid numerical instability)?


Convolutions are implemented as cross-correlations for performance / efficiency, as you brought up.

From *Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs*:

> The advantage of cross-correlation is that it avoids the additional step of flipping the filters to perform the convolutions.
