Different results with torch.nn.Conv2d and torch.nn.functional.conv2d

I just want to perform a simple convolution with a 3x3 kernel on a 3x3 image with padding=1. Both the kernel and the image have a single channel.

These are the image and the kernel (weights):

import torch
import torch.nn.functional as F

# Initialize image and filter (both shaped 1x1x3x3: batch, channel, H, W)
my_image = torch.Tensor([[0, 1, 0], [1, 1, 1], [0, 1, 0]]).unsqueeze(0).unsqueeze(0)
weights = torch.Tensor([[0, 1, 0], [0, 0, 0], [0, 0, 0]]).unsqueeze(0).unsqueeze(0)

Using torch.nn.Conv2d to apply the convolution, I have:

# First scenario
my_conv = torch.nn.Conv2d(1, 1, kernel_size=(3, 3), bias=False, padding=1)
my_conv.weight = torch.nn.Parameter(weights)
res1 = my_conv(my_image)

If I use torch.nn.functional.conv2d instead, I have:

# Second scenario
res2 = F.conv2d(weights, my_image, bias=None, padding=1)

The results I obtain are different:

res1 will be:
tensor([[[[0., 0., 0.],
          [0., 1., 0.],
          [1., 1., 1.]]]], grad_fn=<MkldnnConvolutionBackward>)

and res2:
tensor([[[[1., 1., 1.],
          [0., 1., 0.],
          [0., 0., 0.]]]])

What am I missing here? I appreciate all kinds of help.

You have mixed up the input and weight in the call to F.conv2d. Input goes first.
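
With the arguments in the right order, the functional call reproduces res1 (a minimal sketch reusing the my_image and weights defined above):

# Input goes first, then the weight
res2 = F.conv2d(my_image, weights, bias=None, padding=1)
# res2 now matches res1 (res1 additionally carries a grad_fn,
# because the module's weight is an nn.Parameter)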

If you’re comparing to pen and paper, note that the PyTorch convolutions (and just about everyone else’s) are correlations, i.e. no flipping of the axes before summation occurs.
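
If you do want a textbook convolution, one option is to flip the kernel along both spatial dimensions before the call; a small sketch under that assumption:

# conv2d computes cross-correlation, so flipping the kernel
# along both spatial axes yields a true convolution
flipped = torch.flip(weights, dims=[2, 3])
res_conv = F.conv2d(my_image, flipped, bias=None, padding=1)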

Best regards

Thomas

P.S.: This would have been easy to spot had you used different sizes for the weights and the inputs. (Edit: I should say I’m a bit prone to this type of error myself, and that is why I try to make a habit of keeping tensor dimensions as unique as possible. We all started confused at some point.)


@tom,

Got it. Thank you for your explanation.

They should give the same results:

import torch
import torch.nn.functional as F
import torch.nn as nn

filters = torch.randn(1, 1, 3, 3)
inputs = torch.randn(1, 1, 5, 5)
o1 = F.conv2d(inputs, filters, padding=0, bias=None)
print(o1)
# using nn.Conv2d
print(filters.size())
# flip the kernel along both spatial dims to turn
# cross-correlation into a true convolution
fc = torch.flip(filters, [2, 3])
print(fc.size())
# cross-correlation
conv = nn.Conv2d(1, 1, kernel_size=3, padding=0, bias=False)
conv.weight = nn.Parameter(filters)
o2 = conv(inputs)
print(o2)
# convolution
conv.weight = nn.Parameter(fc)
o3 = conv(inputs)
print(o3)

Out:

tensor([[[[ 1.9127,  0.5849, -2.8391],
          [-0.8041, -2.4248, -7.8349],
          [ 2.3626,  0.1699,  4.0892]]]])
torch.Size([1, 1, 3, 3])
torch.Size([1, 1, 3, 3])
tensor([[[[ 1.9127,  0.5849, -2.8391],
          [-0.8041, -2.4248, -7.8349],
          [ 2.3626,  0.1699,  4.0892]]]], grad_fn=<MkldnnConvolutionBackward>)
tensor([[[[-3.8473,  4.5131, -4.5969],
          [-0.7352,  2.9420, -4.2360],
          [ 0.4477, -0.3692, -1.1258]]]], grad_fn=<MkldnnConvolutionBackward>)

The third output is what you get if you want a true convolution (with the kernel flipped).


There should not be any difference in the output values, as torch.nn.Conv2d calls torch.nn.functional.conv2d under the hood to compute the convolution. That said, autograd only builds a computational graph when an input requires gradients: nn.Conv2d stores its weight as an nn.Parameter (which has requires_grad=True by default), which is why we see the reference to the backward object only in the nn.Conv2d case and not when calling functional.conv2d on plain tensors.
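
To see this, mark plain tensors as requiring gradients; the functional call then records a grad_fn as well (a small sketch with made-up shapes):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5, requires_grad=True)
w = torch.randn(1, 1, 3, 3, requires_grad=True)
out = F.conv2d(x, w, padding=1)
# grad_fn is set here because autograd tracks
# tensors that require gradients
print(out.grad_fn)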
