Output of a convolution layer

Hello there, how are you doing?

I’m studying convolutional networks, and to check my understanding I compared the output of a single convolutional layer with the result produced by the function “correlated2d” from the scipy library. Both results are the same, however, the Pytorch documentation describes the output of a 2D convolution as

image

so, because of the order of the cross-correlation operator described above, shouldn’t the result be a flipped version of what I got?

My code:

import numpy as np
import torch
import torch.nn as nn
import scipy


kernel = torch.from_numpy(
    np.array(
        [[[[ 1, 0, -1], 
           [ 0, 0,  0], 
           [-2, 0,  1]]]]
    )
).float()

input_data = torch.from_numpy(
    np.array(
        [[[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 1],
          [2, 3, 4, 5, 6],
          [7, 8, 9, 1, 2],
          [3, 4, 5, 6, 7]]]
    )
).float()

conv2d = nn.Conv2d(
    in_channels=1,
    out_channels=1,
    kernel_size=3,
    padding=0,
    stride=1,
    padding_mode="zeros",
    bias=False,
)
conv2d.weight = nn.Parameter(kernel)

print(conv2d(input_data))

print(scipy.signal.correlate2d(
    input_data.squeeze(0), 
    kernel.squeeze(0).squeeze(0), 
    mode="valid"
))

Results:

tensor([[[ -2.,  -3.,  -4.],
         [ -7., -17.,  -9.],
         [ -3.,  -4.,  -5.]]], grad_fn=<SqueezeBackward1>)


[[ -2.  -3.  -4.]
 [ -7. -17.  -9.]
 [ -3.  -4.  -5.]]

Expected result:

print(scipy.signal.correlate2d(
    kernel.squeeze(0).squeeze(0), 
    input_data.squeeze(0), 
    mode="valid"
))

array([[ -5.,  -4.,  -3.],
       [ -9., -17.,  -7.],
       [ -4.,  -3.,  -2.]], dtype=float32)

That indeed looks strange, but I cannot reproduce that behavior on my system. In order to run your script, I had to modify the scipy import slightly, but this is what I used:

import numpy as np
import torch
import torch.nn as nn
import scipy.signal

print(scipy.__version__)
print(torch.__version__)


kernel = torch.from_numpy(
    np.array(
        [[[[ 1, 0, -1],
           [ 0, 0,  0],
           [-2, 0,  1]]]]
    )
).float()

input_data = torch.from_numpy(
    np.array(
        [[[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 1],
          [2, 3, 4, 5, 6],
          [7, 8, 9, 1, 2],
          [3, 4, 5, 6, 7]]]
    )
).float()

conv2d = nn.Conv2d(
    in_channels=1,
    out_channels=1,
    kernel_size=3,
    padding=0,
    stride=1,
    padding_mode="zeros",
    bias=False,
)
conv2d.weight = nn.Parameter(kernel)

print(conv2d(input_data))

print(scipy.signal.correlate2d(
    input_data.squeeze(0),
    kernel.squeeze(0).squeeze(0),
    mode="valid"
))

I got:

1.6.3
1.14.0a0+410ce96
tensor([[[ -2.,  -3.,  -4.],
         [ -7., -17.,  -9.],
         [ -3.,  -4.,  -5.]]], grad_fn=<SqueezeBackward1>)
[[ -2.  -3.  -4.]
 [ -7. -17.  -9.]
 [ -3.  -4.  -5.]]