Conv2d with 4 pos args

LS

I’m trying to understand some code written by someone else.
One of the layers in the PyTorch neural network is:

self.conv1 = torch.nn.Conv2d(4, 32, 8, 4)

In the code I notice that the input to this layer is
torch.Size([1, 4, 84, 84])
and the output of this layer is
torch.Size([1, 32, 20, 20])

I read the PyTorch Conv2d documentation, but this confused me, since the signature appears to take only 3 positional arguments:
https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

whereas in the code I have there are 4 positional arguments.

I’d like to understand how Conv2d(4, 32, 8, 4) transforms a torch.Size([1, 4, 84, 84]) into a torch.Size([1, 32, 20, 20]).

Kind rgds

So, the parameters are the following:

  • in_channels = 4
  • out_channels = 32
  • kernel_size = 8
  • stride = 4

The input is of size (batch_size, n_channels, height, width). The batch dimension is 1 in both, so it can be ignored. The second dimension of the input is 4, which is in_channels, and the second dimension of the output is 32, which is out_channels.
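As a quick sanity check (a minimal sketch, using a random dummy input of the shape you quoted), you can run the layer and look at the shapes:

import torch

# Conv2d(4, 32, 8, 4): in_channels=4, out_channels=32, kernel_size=8, stride=4
conv1 = torch.nn.Conv2d(4, 32, 8, 4)
x = torch.randn(1, 4, 84, 84)   # (batch_size, in_channels, height, width)
y = conv1(x)
print(y.shape)                  # torch.Size([1, 32, 20, 20])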

Now, how (84, 84) turns into (20, 20) follows the formula described in the conv2d documentation:

h_out = floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

With h_in = 84, padding = 0, dilation = 1, kernel_size = 8 and stride = 4:

h_out = floor((84 + 2 * 0 - 1 * (8 - 1) - 1) / 4 + 1) = floor(76 / 4 + 1) = 20

w_out is computed the same way, so it is also 20.
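The same arithmetic as a tiny sketch in plain Python, with the default padding=0 and dilation=1 plugged in:

import math

h_in, padding, dilation, kernel_size, stride = 84, 0, 1, 8, 4
h_out = math.floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
print(h_out)  # 20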


Excellent, thanks a lot!

So, do I understand correctly that there are in total 4 * 32 * 8 * 8 kernel coefficients? I.e. one 8x8 kernel for each (input channel, output channel) combination?

I just realized that keyword arguments can be passed positionally as well. Just tested it, and indeed that’s the case. I didn’t know that!
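For completeness, a small sketch that checks both points: the weight tensor of Conv2d has shape (out_channels, in_channels, kernel_height, kernel_width), so here it holds 32 * 4 * 8 * 8 coefficients (plus 32 bias terms by default), and the positional call is equivalent to spelling out kernel_size and stride by name:

import torch

conv_pos = torch.nn.Conv2d(4, 32, 8, 4)
conv_kw = torch.nn.Conv2d(4, 32, kernel_size=8, stride=4)

print(conv_pos.weight.shape)   # torch.Size([32, 4, 8, 8]) -> one 8x8 kernel per (C_in, C_out) pair
print(conv_pos.weight.numel()) # 8192 = 4 * 32 * 8 * 8
print(conv_pos.bias.shape)     # torch.Size([32]) -> one bias per output channel
print(conv_pos.weight.shape == conv_kw.weight.shape)  # True, same layer configuration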
