PyTorch convolution and TensorFlow convolution giving different results

In TensorFlow:

import numpy as np
import tensorflow as tf
y = np.random.rand(1,100,100,1)
filterx = np.random.rand(5,5,1,1)
a = tf.nn.conv2d(
    y, filterx, [1,1,1,1], 'VALID')
In PyTorch, for the same input and weights:
x = torch.nn.Conv2d(1,1,5, bias = False)
filter = np.transpose(filterx, (2,3,1,0))
x.weight = torch.nn.Parameter(torch.from_numpy(filter))
z = np.transpose(y, (0,3,1,2))
l = x(torch.from_numpy(z))
l = l.detach().numpy()
l = np.transpose(l,(0,2,3,1))
Both l and a should hold the same output, but they don't. Why does convolution behave differently in TensorFlow and PyTorch?

When doing these things, I would recommend always using sizes that are not equal (kernel height different from width, input channels different from output channels). Then a shape mismatch, or simply comparing the shapes, tells you where the axes are in the wrong order.

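import torch
import tensorflow
import numpy

# Unequal sizes: 4x5 kernel, 1 input channel, 2 output channels
y = numpy.random.rand(1,100,100,1)
filterx = numpy.random.rand(4,5,1,2)
a = tensorflow.nn.conv2d(
    y, filterx, [1,1,1,1], 'VALID')
# TF 1.x graph mode; with TF 2.x eager execution, t = a.numpy() does the same
with tensorflow.Session() as sess:
  t = sess.run(a)

# The weight is replaced below, so declare the layer with the matching sizes
x = torch.nn.Conv2d(1,2,(4,5), bias = False)
# TF filter (H, W, Ci, Co) -> PyTorch weight (Co, Ci, H, W)
filter = numpy.transpose(filterx, (3,2,0,1))
x.weight = torch.nn.Parameter(torch.from_numpy(filter))
# Input NHWC -> NCHW
z = numpy.transpose(y, (0,3,1,2))
l = x(torch.from_numpy(z))
l = l.detach().numpy()
# Output NCHW -> NHWC, to compare against the TensorFlow result
l = numpy.transpose(l,(0,2,3,1))

print(abs(l - t).max())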


This gives a maximum absolute difference of 4e-15 or so, which is at the level of double-precision rounding error.
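To illustrate why unequal sizes help, here is a small NumPy-only sketch comparing the permutation from the question, (2,3,1,0), with the one used above, (3,2,0,1). The variable names are only for illustration. With the square 5x5x1x1 filter both permutations produce the same shape, so nothing complains, but the kernel ends up spatially transposed. With a 4x5x1x2 filter the wrong permutation shows up immediately in the shape.

```python
import numpy as np

# TensorFlow filter layout: (H, W, C_in, C_out)
square = np.random.rand(5, 5, 1, 1)
uneven = np.random.rand(4, 5, 1, 2)

wrong = (2, 3, 1, 0)  # permutation from the question: yields (C_in, C_out, W, H)
right = (3, 2, 0, 1)  # yields (C_out, C_in, H, W), the layout PyTorch expects

# Square filter: both permutations give the same shape, hiding the mistake
square_wrong = np.transpose(square, wrong)
square_right = np.transpose(square, right)
print(square_wrong.shape, square_right.shape)

# ...but the wrongly permuted kernel is the spatial transpose of the right one
kernel_is_transposed = np.array_equal(square_wrong[0, 0], square_right[0, 0].T)
print(kernel_is_transposed)

# Unequal sizes: the shapes now disagree, so the wrong order is visible at once
uneven_wrong = np.transpose(uneven, wrong)
uneven_right = np.transpose(uneven, right)
print(uneven_wrong.shape, uneven_right.shape)
```

In the unequal case, assigning the wrongly permuted array as the weight would not match a layer expecting 2 output channels and a 4x5 kernel, which is exactly the early warning one wants.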

Best regards



Good day @tom, I tried your approach and it actually keeps the dimensions when applying filters.

But I have to ask:

  • Could you explain how you chose these transpose operations?
  • Which logic are you following to move the axes?

I would appreciate it if you could explain or point me in the right direction on what is being done.


Basically, one can read up on how TensorFlow and PyTorch typically order their batches of images (TF: NHWC, PyTorch: NCHW) and weights (TF: HWCiCo, PyTorch: CoCiHW), where N is the batch, H and W are height and width, and Ci and Co are the input and output channels. Then one implements the permutation and fixes anything that went wrong.
When porting a complex model, like when @ptrblck and I ported StyleGAN, I usually find a couple of things where my first attempt is wrong. This can be tedious and error-prone when you don’t have a comparison point, which is why people would love first-class named tensors.
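As a rough sketch of the layout conventions above, the following NumPy-only example spells out the three permutations and checks them against a naive reference convolution written once per layout. The helper names (nhwc_to_nchw, nchw_to_nhwc, hwio_to_oihw, conv_nhwc, conv_nchw) are made up for this illustration and are not part of either library. The reference loops assume stride 1 and 'VALID' padding, and compute a cross-correlation, as both frameworks do.

```python
import numpy as np

def nhwc_to_nchw(x):
    # Images: TF (N, H, W, C) -> PyTorch (N, C, H, W)
    return np.transpose(x, (0, 3, 1, 2))

def nchw_to_nhwc(x):
    # Images: PyTorch (N, C, H, W) -> TF (N, H, W, C)
    return np.transpose(x, (0, 2, 3, 1))

def hwio_to_oihw(w):
    # Weights: TF (H, W, Ci, Co) -> PyTorch (Co, Ci, H, W)
    return np.transpose(w, (3, 2, 0, 1))

def conv_nhwc(x, w):
    # Naive stride-1 'VALID' cross-correlation in the TF layout
    n, h, wd, _ = x.shape
    kh, kw, _, co = w.shape
    out = np.zeros((n, h - kh + 1, wd - kw + 1, co))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw, :]  # (N, kh, kw, Ci)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv_nchw(x, w):
    # The same operation in the PyTorch layout
    n, _, h, wd = x.shape
    co, _, kh, kw = w.shape
    out = np.zeros((n, co, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, Ci, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

# Unequal sizes everywhere, so a wrong permutation would fail on shapes
rng = np.random.default_rng(0)
x_nhwc = rng.random((1, 8, 10, 3))
w_hwio = rng.random((2, 3, 3, 4))

out_tf_style = conv_nhwc(x_nhwc, w_hwio)
out_pt_style = nchw_to_nhwc(conv_nchw(nhwc_to_nchw(x_nhwc), hwio_to_oihw(w_hwio)))
max_diff = np.abs(out_tf_style - out_pt_style).max()
print(out_tf_style.shape, max_diff)
```

The two results should agree up to floating-point rounding, which gives the kind of comparison point mentioned above without needing both frameworks installed.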

Best regards


Thanks, it's very clear now. I wasn’t taking the weight order into account.